|
20 | 20 | - Lemmatization and Stemming |
21 | 21 | - Visualization with [topicwizard](https://github.com/x-tabdeveloping/topicwizard) 🖌️ |
22 | 22 |
|
| 23 | +## New in version 0.12.0: Seeded topic modeling |
23 | 24 |
|
24 | | -## New in version 0.11.0: Vectorizers Module |
25 | | - |
26 | | -You can now use a set of custom vectorizers for topic modeling over **phrases**, as well as **lemmata** and **stems**. |
| 25 | +You can now investigate your corpus from a particular angle with KeyNMF by specifying a seed phrase.
27 | 26 |
|
28 | 27 | ```python |
29 | 28 | from turftopic import KeyNMF |
30 | | -from turftopic.vectorizers.spacy import NounPhraseCountVectorizer |
31 | 29 |
|
32 | | -model = KeyNMF( |
33 | | - n_components=10, |
34 | | - vectorizer=NounPhraseCountVectorizer("en_core_web_sm"), |
35 | | -) |
| 30 | +model = KeyNMF(5, seed_phrase="Is the death penalty moral?") |
36 | 31 | model.fit(corpus) |
| 32 | + |
37 | 33 | model.print_topics() |
38 | 34 | ``` |
39 | 35 |
|
40 | 36 | | Topic ID | Highest Ranking | |
41 | 37 | | - | - | |
42 | | -| | ... | |
43 | | -| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism | |
44 | | -| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index | |
45 | | -| | ... | |
46 | | - |
47 | | -Turftopic now also comes with a **Chinese vectorizer** for easier use, as well as a generalist **multilingual vectorizer**. |
48 | | - |
49 | | -```python |
50 | | -from turftopic.vectorizers.chinese import default_chinese_vectorizer |
51 | | -from turftopic.vectorizers.spacy import TokenCountVectorizer |
52 | | - |
53 | | -chinese_vectorizer = default_chinese_vectorizer() |
54 | | -arabic_vectorizer = TokenCountVectorizer("ar", remove_stopwords=True) |
55 | | -danish_vectorizer = TokenCountVectorizer("da", remove_stopwords=True) |
56 | | -... |
57 | | - |
58 | | -``` |
| 38 | +| 0 | morality, moral, immoral, morals, objective, morally, animals, society, species, behavior | |
| 39 | +| 1 | armenian, armenians, genocide, armenia, turkish, turks, soviet, massacre, azerbaijan, kurdish | |
| 40 | +| 2 | murder, punishment, death, innocent, penalty, kill, crime, moral, criminals, executed | |
| 41 | +| 3 | gun, guns, firearms, crime, handgun, firearm, weapons, handguns, law, criminals | |
| 42 | +| 4 | jews, israeli, israel, god, jewish, christians, sin, christian, palestinians, christianity | |
59 | 43 |
|
60 | 44 |
|
61 | 45 | ## Basics [(Documentation)](https://x-tabdeveloping.github.io/turftopic/) |
@@ -179,6 +163,29 @@ model.print_topics() |
179 | 163 | | 3 | Storage Technologies | disk, drive, scsi, drives, disks, floppy, ide, dos, controller, boot | |
180 | 164 | | | ... | |
181 | 165 |
|
| 166 | +### Vectorizers Module |
| 167 | + |
| 168 | +You can use a set of custom vectorizers for topic modeling over **phrases**, as well as **lemmata** and **stems**. |
| 169 | + |
| 170 | +```python |
| 171 | +from turftopic import KeyNMF |
| 172 | +from turftopic.vectorizers.spacy import NounPhraseCountVectorizer |
| 173 | + |
| 174 | +model = KeyNMF( |
| 175 | + n_components=10, |
| 176 | + vectorizer=NounPhraseCountVectorizer("en_core_web_sm"), |
| 177 | +) |
| 178 | +model.fit(corpus) |
| 179 | +model.print_topics() |
| 180 | +``` |
| 181 | + |
| 182 | +| Topic ID | Highest Ranking | |
| 183 | +| - | - | |
| 184 | +| | ... | |
| 185 | +| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism | |
| 186 | +| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index | |
| 187 | +| | ... | |
| 188 | + |
182 | 189 | ### Visualization |
183 | 190 |
|
184 | 191 | Turftopic does not come with built-in visualization utilities, but [topicwizard](https://github.com/x-tabdeveloping/topicwizard), an interactive topic model visualization library, is compatible with all models from Turftopic.
|