v0.11.0
## New in version 0.11.0: Vectorizers Module
You can now use a set of custom vectorizers for topic modeling over phrases, lemmata, and stems.
```python
from turftopic import KeyNMF
from turftopic.vectorizers.spacy import NounPhraseCountVectorizer

model = KeyNMF(
    n_components=10,
    vectorizer=NounPhraseCountVectorizer("en_core_web_sm"),
)
model.fit(corpus)
model.print_topics()
```

| Topic ID | Highest Ranking |
|---|---|
| ... | |
| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
| ... | |
Turftopic now also ships with a default Chinese vectorizer, as well as a generalist multilingual vectorizer.
```python
from turftopic.vectorizers.chinese import default_chinese_vectorizer
from turftopic.vectorizers.spacy import TokenCountVectorizer

chinese_vectorizer = default_chinese_vectorizer()
arabic_vectorizer = TokenCountVectorizer("ar", remove_stopwords=True)
danish_vectorizer = TokenCountVectorizer("da", remove_stopwords=True)
...
```
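Conceptually, each of these vectorizers builds a document-term matrix over language-aware units (noun phrases, lemmata, or tokens with stopwords removed). As a rough illustration only, here is a minimal pure-Python sketch of the stopword-filtered counting step; the whitespace tokenizer and tiny stopword set are simplified stand-ins, not turftopic's actual implementation, which delegates tokenization to language-specific pipelines such as spaCy:

```python
from collections import Counter

# Hypothetical, simplified stopword list for illustration; real
# vectorizers use per-language stopword lists and proper tokenizers.
STOPWORDS = {"the", "a", "an", "is", "of"}

def count_tokens(document: str, remove_stopwords: bool = True) -> Counter:
    """Lowercase, split on whitespace, optionally drop stopwords, and count."""
    tokens = document.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return Counter(tokens)

counts = count_tokens("The fish symbol is a symbol of atheism")
# 'symbol' counted twice; 'the', 'is', 'a', 'of' dropped as stopwords
```

The real vectorizers additionally assemble these per-document counts into a sparse document-term matrix that the topic models then factorize.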