|
20 | 20 |
|
21 | 21 | > This package is still work in progress and scientific papers on some of the novel methods are currently undergoing peer-review. If you use this package and you encounter any problem, let us know by opening relevant issues. |
22 | 22 |
|
23 | | -### New in version 0.5.0 |
| 23 | +### New in version 0.6.0 |
24 | 24 |
|
25 | | -#### Hierarchical KeyNMF |
| 25 | +#### Prompting Embedding Models |
26 | 26 |
|
27 | | -You can now subdivide topics in KeyNMF at will. |
| 27 | +KeyNMF and clustering topic models can now make efficient use of asymmetric and instruction-finetuned embedding models.
| 28 | +Paired with a suitable embedding model, this can improve performance significantly.
28 | 29 |
|
29 | 30 | ```python |
30 | 31 | from turftopic import KeyNMF |
31 | | - |
32 | | -model = KeyNMF(2, top_n=15, random_state=42).fit(corpus) |
33 | | -model.hierarchy.divide_children(n_subtopics=3) |
34 | | -print(model.hierarchy) |
35 | | -``` |
36 | | - |
37 | | -``` |
38 | | -Root |
39 | | -├── windows, dos, os, disk, card, drivers, file, pc, files, microsoft |
40 | | -│ ├── 0.0: dos, file, disk, files, program, windows, disks, shareware, norton, memory |
41 | | -│ ├── 0.1: os, unix, windows, microsoft, apps, nt, ibm, ms, os2, platform |
42 | | -│ └── 0.2: card, drivers, monitor, driver, vga, ram, motherboard, cards, graphics, ati |
43 | | -└── 1: atheism, atheist, atheists, religion, christians, religious, belief, christian, god, beliefs |
44 | | -. ├── 1.0: atheism, alt, newsgroup, reading, faq, islam, questions, read, newsgroups, readers |
45 | | -. ├── 1.1: atheists, atheist, belief, theists, beliefs, religious, religion, agnostic, gods, religions |
46 | | -. └── 1.2: morality, bible, christian, christians, moral, christianity, biblical, immoral, god, religion |
| 32 | +from sentence_transformers import SentenceTransformer |
| 33 | + |
| 34 | +encoder = SentenceTransformer( |
| 35 | + "intfloat/multilingual-e5-large-instruct", |
| 36 | + prompts={ |
| 37 | + "query": "Instruct: Retrieve relevant keywords from the given document. Query: ",
| 38 | + "passage": "Passage: " |
| 39 | + }, |
| 40 | + # Make sure to set default prompt to query! |
| 41 | + default_prompt_name="query", |
| 42 | +) |
| 43 | +model = KeyNMF(10, encoder=encoder) |
47 | 44 | ``` |
48 | 45 |
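To make the role of the two prompts concrete, here is a minimal sketch of what prompting amounts to: the prompt string is prepended to each text before it is embedded. The helper `apply_prompt` and the sample texts are hypothetical and not part of Turftopic or sentence-transformers. The split shown (documents get the default `query` prompt, candidate keywords get the `passage` prompt) is an assumption inferred from the prompt wording above.

```python
# Hypothetical illustration only: mimics how a prompt is prepended to each
# input text before encoding. No embedding model is loaded here.
PROMPTS = {
    "query": "Instruct: Retrieve relevant keywords from the given document. Query: ",
    "passage": "Passage: ",
}
DEFAULT_PROMPT_NAME = "query"


def apply_prompt(texts, prompt_name=None):
    """Prepend the named prompt (or the default one) to every text."""
    prefix = PROMPTS[prompt_name or DEFAULT_PROMPT_NAME]
    return [prefix + text for text in texts]


documents = ["The new graphics driver fixes the VGA flicker."]
keywords = ["driver", "vga"]

# Assumption: documents use the default ("query") prompt,
# candidate keywords use the "passage" prompt.
prompted_documents = apply_prompt(documents)
prompted_keywords = apply_prompt(keywords, prompt_name="passage")

print(prompted_documents[0])
print(prompted_keywords)
```

Because the two sides receive different prefixes, an asymmetric model embeds a document and its candidate keywords differently. This is why the default prompt name matters when no prompt is passed explicitly.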
|
49 | | -#### FASTopic *(Experimental)* |
50 | | - |
51 | | -You can now use [FASTopic](https://github.com/BobXWu/FASTopic) inside Turftopic. |
52 | | - |
53 | | -```python |
54 | | -from turftopic import FASTopic |
55 | | - |
56 | | -model = FASTopic(10).fit(corpus) |
57 | | -model.print_topics() |
58 | | -``` |
59 | 46 |
|
60 | 47 | ## Basics [(Documentation)](https://x-tabdeveloping.github.io/turftopic/) |
61 | 48 | [Open in Colab](https://colab.research.google.com/github/x-tabdeveloping/turftopic/blob/main/examples/basic_example_20newsgroups.ipynb)