Commit bb55d15

Updated readme
1 parent 12402f9 commit bb55d15

1 file changed

Lines changed: 32 additions & 25 deletions

README.md

````diff
@@ -20,42 +20,26 @@
 - Lemmatization and Stemming
 - Visualization with [topicwizard](https://github.com/x-tabdeveloping/topicwizard) 🖌️
 
+## New in version 0.12.0: Seeded topic modeling
 
-## New in version 0.11.0: Vectorizers Module
-
-You can now use a set of custom vectorizers for topic modeling over **phrases**, as well as **lemmata** and **stems**.
+You can now specify an aspect in KeyNMF from which you want to investigate your corpus by specifying a seed phrase.
 
 ```python
 from turftopic import KeyNMF
-from turftopic.vectorizers.spacy import NounPhraseCountVectorizer
 
-model = KeyNMF(
-    n_components=10,
-    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
-)
+model = KeyNMF(5, seed_phrase="Is the death penalty moral?")
 model.fit(corpus)
+
 model.print_topics()
 ```
 
 | Topic ID | Highest Ranking |
 | - | - |
-| | ... |
-| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
-| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
-| | ... |
-
-Turftopic now also comes with a **Chinese vectorizer** for easier use, as well as a generalist **multilingual vectorizer**.
-
-```python
-from turftopic.vectorizers.chinese import default_chinese_vectorizer
-from turftopic.vectorizers.spacy import TokenCountVectorizer
-
-chinese_vectorizer = default_chinese_vectorizer()
-arabic_vectorizer = TokenCountVectorizer("ar", remove_stopwords=True)
-danish_vectorizer = TokenCountVectorizer("da", remove_stopwords=True)
-...
-
-```
+| 0 | morality, moral, immoral, morals, objective, morally, animals, society, species, behavior |
+| 1 | armenian, armenians, genocide, armenia, turkish, turks, soviet, massacre, azerbaijan, kurdish |
+| 2 | murder, punishment, death, innocent, penalty, kill, crime, moral, criminals, executed |
+| 3 | gun, guns, firearms, crime, handgun, firearm, weapons, handguns, law, criminals |
+| 4 | jews, israeli, israel, god, jewish, christians, sin, christian, palestinians, christianity |
 
 
 ## Basics [(Documentation)](https://x-tabdeveloping.github.io/turftopic/)
````
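
The seed-phrase feature added in this commit is described only at the API level (`KeyNMF(5, seed_phrase=...)`). The sketch below makes the idea concrete. It is a minimal, stdlib-only illustration that assumes one plausible weighting scheme: each candidate word gets its cosine similarity to the document, multiplied by its similarity to the seed. This is not KeyNMF's actual implementation. The helpers `cosine` and `seeded_keywords` are hypothetical, and the 2-d toy vectors stand in for real sentence-encoder embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def seeded_keywords(doc_vec, word_vecs, seed_vec, top_n=3):
    """Rank words by (similarity to document) * (similarity to seed).

    Hypothetical weighting scheme for illustration only. Negative similarities
    are clipped to zero so every score is non-negative, which NMF requires
    of its input matrix.
    """
    scores = {}
    for word, vec in word_vecs.items():
        doc_sim = max(cosine(doc_vec, vec), 0.0)
        seed_sim = max(cosine(seed_vec, vec), 0.0)
        scores[word] = doc_sim * seed_sim
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Keep only words with a positive score, up to top_n of them.
    return [(word, score) for word, score in ranked[:top_n] if score > 0]


# Toy 2-d "embeddings"; axis 0 plays the role of the seed's aspect (morality).
seed = [1.0, 0.0]
document = [0.6, 0.8]
words = {
    "moral": [1.0, 0.1],
    "gun": [0.0, 1.0],
    "crime": [0.7, 0.7],
}

result = seeded_keywords(document, words, seed)
for word, score in result:
    print(f"{word}: {score:.3f}")
```

This prints `crime: 0.700` followed by `moral: 0.673`. `gun` is filtered out: it is fairly similar to the document (0.8) but orthogonal to the seed, which matches the stated intent of narrowing the aspect from which the corpus is investigated.
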
````diff
@@ -179,6 +163,29 @@ model.print_topics()
 | 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot |
 | | ... |
 
+### Vectorizers Module
+
+You can use a set of custom vectorizers for topic modeling over **phrases**, as well as **lemmata** and **stems**.
+
+```python
+from turftopic import KeyNMF
+from turftopic.vectorizers.spacy import NounPhraseCountVectorizer
+
+model = KeyNMF(
+    n_components=10,
+    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
+)
+model.fit(corpus)
+model.print_topics()
+```
+
+| Topic ID | Highest Ranking |
+| - | - |
+| | ... |
+| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
+| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
+| | ... |
+
 ### Visualization
 
 Turftopic does not come with built-in visualization utilities, [topicwizard](https://github.com/x-tabdeveloping/topicwizard), an interactive topic model visualization library, is compatible with all models from Turftopic.
````
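
The relocated "Vectorizers Module" section says topics can be modeled over phrases, lemmata or stems. In each case the vectorizer decides which units become the vocabulary that topics are described by. The stdlib-only sketch below illustrates that point by building a document-term count matrix twice: once over raw tokens, and once over stems. It is not turftopic's or spaCy's code. The stems come from `crude_stem`, a deliberately crude, hypothetical suffix stripper, and `tokens`, `stems` and `count_vectorize` are likewise illustrative names. A real pipeline would use a proper stemmer or lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude suffix list; real stemmers such as
# Porter or Snowball apply far more careful rules.
SUFFIXES = ("ing", "ers", "er", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def tokens(doc):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", doc.lower())


def stems(doc):
    """Tokenize, then reduce every token to its crude stem."""
    return [crude_stem(t) for t in tokens(doc)]


def count_vectorize(docs, analyzer):
    """Return (sorted vocabulary, document-term count matrix as a list of rows)."""
    counts = [Counter(analyzer(doc)) for doc in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


docs = ["Guns and gun owners", "The gun owner"]
raw_vocab, raw_matrix = count_vectorize(docs, tokens)
stem_vocab, stem_matrix = count_vectorize(docs, stems)
print(raw_vocab)
print(stem_vocab, stem_matrix)
```

Stemming shrinks the vocabulary from six terms to four (`and`, `gun`, `own`, `the`), and both counts of `gun` in the first document now land in one column. The crude rules also conflate `owner` with `own`, which illustrates why a careful stemmer or lemmatizer is preferable in practice.
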
