docs/KeyNMF.md (+42 lines)

For a detailed tutorial on hierarchical modeling click [here](hierarchical.md).
## Cross-lingual KeyNMF
By default, KeyNMF has no cross-lingual capabilities, since only words that appear in a document can be assigned to it as keywords.
We do, however, provide a term-matching scheme that lets you match words across languages based on their cosine similarity in a multilingual embedding model.
This is done by:
1. Computing a similarity matrix over terms.
2. Checking which terms have a similarity above a given threshold (_0.9_ by default).
3. Building a graph from these connections, and finding graph components.
4. Adding up term importances, in every document, for terms that appear in the same component.
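
The four steps above can be sketched roughly as follows. This is an illustrative implementation with NumPy and SciPy, not KeyNMF's actual code; the function names (`match_terms`, `merge_importances`) and the dense document-term matrix are assumptions made for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def match_terms(term_embeddings: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Assign a graph-component label to each term (hypothetical helper)."""
    # 1. Similarity matrix over L2-normalized term embeddings (cosine similarity).
    norms = np.linalg.norm(term_embeddings, axis=1, keepdims=True)
    normalized = term_embeddings / norms
    similarity = normalized @ normalized.T
    # 2. Keep connections where similarity exceeds the threshold.
    adjacency = csr_matrix(similarity > threshold)
    # 3. Find connected components in the resulting term graph.
    _, labels = connected_components(adjacency, directed=False)
    return labels


def merge_importances(doc_term_matrix: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Sum term importances over components, per document (hypothetical helper)."""
    # 4. Terms in the same component share one merged column.
    n_components = labels.max() + 1
    merged = np.zeros((doc_term_matrix.shape[0], n_components))
    for term_idx, component in enumerate(labels):
        merged[:, component] += doc_term_matrix[:, term_idx]
    return merged
```

In practice the embeddings would come from a multilingual embedding model, so that translation pairs (e.g. an English and a German term for the same concept) land in the same component and their importances are pooled.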
```python
from datasets import load_dataset
from sklearn.feature_extraction.text import CountVectorizer
```
docs/model_overview.md (+1/-1 lines)

In general, the most balanced models are $S^3$, Clustering models with `centroid`…
| Model | :1234: Multiple Topics per Document | :hash: Detecting Number of Topics | :chart_with_upwards_trend: Dynamic Modeling | :evergreen_tree: Hierarchical Modeling | :star: Inference over New Documents | :globe_with_meridians: Cross-Lingual | :ocean: Online Fitting |
| --- | --- | --- | --- | --- | --- | --- | --- |