Commit f7d8403

Added NPMI to docs
1 parent 34df185 commit f7d8403

1 file changed

Lines changed: 9 additions & 0 deletions

docs/clustering.md

@@ -122,6 +122,7 @@ By and large there are two types of methods that can be used for importance esti
 | - | - | - | - |
 | `soft-c-tf-idf` *(default)* | Lexical | A c-tf-idf method that can interpret soft cluster assignments. | Can interpret soft cluster assignment in models like Gaussian Mixtures, less sensitive to stop words than vanilla c-tf-idf. |
 | `fighting-words` **(NEW)** | Lexical | Compute word importance based on cluster differences using the Fightin' Words algorithm by Monroe et al. | A theoretically motivated probabilistic model that was explicitly designed for discovering lexical differences in groups of text. See [Fightin' Words paper](https://languagelog.ldc.upenn.edu/myl/Monroe.pdf). |
+| `npmi` **(NEW)** | Lexical | Estimate term importance from mutual information between cluster labels and term occurrence. | Theoretically motivated, fast, and usually produces clean topics. |
 | `c-tf-idf` | Lexical | Compute how unique terms are in a cluster with a tf-idf style weighting scheme. This is the default in BERTopic. | Very fast, easy to understand and is not affected by cluster shape. |
 | `centroid` | Semantic | Word importance based on words' proximity to cluster centroid vectors. This is the default in Top2Vec. | Produces clean topics, easily interpretable. |
 | `linear` **(NEW, EXPERIMENTAL)** | Semantic | Project words onto the parameter vectors of a linear classifier (LDA). | Topic differences are measured in embedding space and are determined by predictive power, and are therefore accurate and clean. |
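Reviewer note on the new `npmi` row: for readers unfamiliar with the measure, the following is a minimal sketch of normalized pointwise mutual information between cluster labels and term occurrence. It is not taken from turftopic's source. The function name `npmi_importance`, the document-level (binary) occurrence counting, and the `eps` smoothing are all assumptions made for illustration; the library may compute the score differently.

```python
import numpy as np


def npmi_importance(doc_term: np.ndarray, labels: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Hypothetical helper: return an (n_clusters, n_terms) matrix of NPMI scores."""
    # Binary occurrence: does term t appear in document d at all?
    occurs = (doc_term > 0).astype(float)
    clusters = np.unique(labels)
    # One-hot cluster membership, shape (n_clusters, n_docs)
    membership = (labels[None, :] == clusters[:, None]).astype(float)
    n_docs = occurs.shape[0]
    p_joint = membership @ occurs / n_docs   # p(cluster, term)
    p_term = occurs.mean(axis=0)             # p(term)
    p_cluster = membership.mean(axis=1)      # p(cluster)
    # PMI = log p(c, t) - log(p(c) p(t)); eps avoids log(0)
    pmi = np.log(p_joint + eps) - np.log(np.outer(p_cluster, p_term) + eps)
    # Normalize by -log p(c, t) so scores fall roughly in [-1, 1]
    return pmi / -np.log(p_joint + eps)


# Toy corpus: 4 documents, 3 terms, 2 clusters
doc_term = np.array([[1, 0, 1], [2, 0, 1], [0, 1, 1], [0, 3, 1]])
labels = np.array([0, 0, 1, 1])
scores = npmi_importance(doc_term, labels)
# Term 0 occurs only in cluster 0 and term 1 only in cluster 1,
# while term 2 occurs in every document and carries no information.
print(scores.round(2))
```

Terms that occur exclusively in one cluster score close to 1 for that cluster, terms spread evenly across clusters score close to 0, and terms absent from a cluster score negative there.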
@@ -195,6 +196,14 @@ By and large there are two types of methods that can be used for importance esti
 model = ClusteringTopicModel(feature_importance="linear")
 ```

+=== "NPMI"
+
+```python
+from turftopic import ClusteringTopicModel
+
+model = ClusteringTopicModel(feature_importance="npmi")
+```
+


 You can also choose to recalculate term importances with a different method after fitting the model:
