README.md: 26 additions & 31 deletions
@@ -16,21 +16,42 @@
 - Streamlined scikit-learn compatible API 🛠️
 - Easy topic interpretation 🔍
 - Automated topic naming with LLMs
+- Topic modeling with keyphrases :key:
+- Lemmatization and Stemming
 - Visualization with [topicwizard](https://github.com/x-tabdeveloping/topicwizard) 🖌️
 
 > This package is still work in progress and scientific papers on some of the novel methods are currently undergoing peer-review. If you use this package and you encounter any problem, let us know by opening relevant issues.
 
-## New in version 0.11.0: Chinese Topic Modeling :cn:
+## New in version 0.11.0: Vectorizers Module
 
-You can now readily apply Turftopic models to Chinese topic modeling thanks to newly added utilities.
+You can now use a set of custom vectorizers for topic modeling over **phrases**, as well as **lemmata** and **stems**.
 
-```bash
-pip install turftopic[jieba]
+```python
+from turftopic import KeyNMF
+from turftopic.vectorizers.spacy import NounPhraseCountVectorizer
+| 3 | fanaticism, theism, fanatism, all fanatism, theists, strong theism, strong atheism, fanatics, precisely some theists, all theism |
+| 4 | religion foundation darwin fish bumper stickers, darwin fish, atheism, 3d plastic fish, fish symbol, atheist books, atheist organizations, negative atheism, positive atheism, atheism index |
+|| ... |
+
+Turftopic now also comes with a Chinese vectorizer for easier use.
+
 ```python
 from turftopic import KeyNMF
-from turftopic.chinese import default_chinese_vectorizer
+from turftopic.vectorizers.chinese import default_chinese_vectorizer
 
 model = KeyNMF(10, vectorizer=default_chinese_vectorizer(), encoder="BAAI/bge-small-zh-v1.5")
 model.fit(corpus)
@@ -45,32 +66,6 @@ model.print_topics()
 | 3 | 股, 下跌, 上涨, 震荡, 板块, 大盘, 股指, 涨幅, 沪, 反弹 |
 || ... |
 
-### New in version 0.10.0: Datamapplot cluster visualization
-
-You can interactively explore clusters using `datamapplot` directly in Turftopic!
-You will first have to install `datamapplot` for this to work.
-
-```python
-from turftopic import ClusteringTopicModel
-from turftopic.namers import OpenAITopicNamer
-
-model = ClusteringTopicModel(feature_importance="centroid")
-model.fit(corpus)
-
-namer = OpenAITopicNamer("gpt-4o-mini")
-model.rename_topics(namer)
-
-fig = model.plot_clusters_datamapplot()
-fig.save("clusters_visualization.html")
-fig
-```
-
-> If you are not running Turftopic from a Jupyter notebook, make sure to call `fig.show()`. This will open up a new browser tab with the interactive figure.
[](https://colab.research.google.com/github/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb)