|
| 1 | +# Seeded Topic Modeling |
| 2 | + |
| 3 | +When investigating a set of documents, you might already have an idea of which aspects you would like to explore. |
| 4 | +Some models can account for this by taking seed phrases or words. |
| 5 | +In Turftopic, this is currently only possible with KeyNMF, but support will likely be extended in the future. |
| 6 | + |
| 7 | +In [KeyNMF](../keynmf.md), you can describe the aspect from which you want to investigate your corpus using a free-text seed phrase, |
| 8 | +which is then used to extract only topics that are relevant to your research question. |
| 9 | + |
| 10 | +In this example, we investigate the 20Newsgroups corpus from three different aspects, substituting each seed phrase shown in the tabs below for `<your seed phrase>`: |
| 11 | + |
| 12 | +```python |
| 13 | +from sklearn.datasets import fetch_20newsgroups |
| 14 | + |
| 15 | +from turftopic import KeyNMF |
| 16 | + |
| 17 | +corpus = fetch_20newsgroups( |
| 18 | + subset="all", |
| 19 | + remove=("headers", "footers", "quotes"), |
| 20 | +).data |
| 21 | + |
| 22 | +model = KeyNMF(5, seed_phrase="<your seed phrase>")  # 5 topics, steered by the seed phrase |
| 23 | +model.fit(corpus) |
| 24 | + |
| 25 | +model.print_topics() |
| 26 | +``` |
| 27 | + |
| 28 | + |
| 29 | +=== "`'Is the death penalty moral?'`" |
| 30 | + |
| 31 | + | Topic ID | Highest Ranking | |
| 32 | + | - | - | |
| 33 | + | 0 | morality, moral, immoral, morals, objective, morally, animals, society, species, behavior | |
| 34 | + | 1 | armenian, armenians, genocide, armenia, turkish, turks, soviet, massacre, azerbaijan, kurdish | |
| 35 | + | 2 | murder, punishment, death, innocent, penalty, kill, crime, moral, criminals, executed | |
| 36 | + | 3 | gun, guns, firearms, crime, handgun, firearm, weapons, handguns, law, criminals | |
| 37 | + | 4 | jews, israeli, israel, god, jewish, christians, sin, christian, palestinians, christianity | |
| 38 | + |
| 39 | +=== "`'Evidence for the existence of god'`" |
| 40 | + |
| 41 | + | Topic ID | Highest Ranking | |
| 42 | + | - | - | |
| 43 | + | 0 | atheist, atheists, religion, religious, theists, beliefs, christianity, christian, religions, agnostic | |
| 44 | + | 1 | bible, christians, christian, christianity, church, scripture, religion, jesus, faith, biblical | |
| 45 | + | 2 | god, existence, exist, exists, universe, creation, argument, creator, believe, life | |
| 46 | + | 3 | believe, faith, belief, evidence, blindly, believing, gods, believed, beliefs, convince | |
| 47 | + | 4 | atheism, atheists, agnosticism, belief, arguments, believe, existence, alt, believing, argument | |
| 48 | + |
| 49 | +=== "`'Operating system kernels'`" |
| 50 | + |
| 51 | + | Topic ID | Highest Ranking | |
| 52 | + | - | - | |
| 53 | + | 0 | windows, dos, os, microsoft, ms, apps, pc, nt, file, shareware | |
| 54 | + | 1 | ram, motherboard, card, monitor, memory, cpu, vga, mhz, bios, intel | |
| 55 | + | 2 | unix, os, linux, intel, systems, programming, applications, compiler, software, platform | |
| 56 | + | 3 | disk, scsi, disks, drive, floppy, drives, dos, controller, cd, boot | |
| 57 | + | 4 | software, mac, hardware, ibm, graphics, apple, computer, pc, modem, program | |
| 58 | + |
| 59 | + |
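The three result tabs above each correspond to one seed phrase. As a minimal sketch, assuming you want to reproduce all three runs in one go, the snippet below fits a separate model per phrase. `fit_seeded_models` and `run_example` are hypothetical helpers written for illustration and are not part of Turftopic; only `KeyNMF(5, seed_phrase=...)`, `fit` and `print_topics` are taken from the example above.

```python
# Seed phrases taken from the titles of the result tabs above.
SEED_PHRASES = [
    "Is the death penalty moral?",
    "Evidence for the existence of god",
    "Operating system kernels",
]


def fit_seeded_models(corpus, seed_phrases=SEED_PHRASES, n_topics=5):
    """Fit one KeyNMF model per seed phrase (hypothetical helper)."""
    # Imported lazily so this module loads even without Turftopic installed.
    from turftopic import KeyNMF

    models = {}
    for phrase in seed_phrases:
        # Same constructor call as in the example above, one model per phrase.
        model = KeyNMF(n_topics, seed_phrase=phrase)
        model.fit(corpus)
        models[phrase] = model
    return models


def run_example():
    """Load 20Newsgroups and print the topics for every seed phrase."""
    from sklearn.datasets import fetch_20newsgroups

    corpus = fetch_20newsgroups(
        subset="all",
        remove=("headers", "footers", "quotes"),
    ).data
    for phrase, model in fit_seeded_models(corpus).items():
        print(f"Seed phrase: {phrase!r}")
        model.print_topics()
```

Calling `run_example()` downloads the dataset and fits three models, so it may take a while. Each model is fitted from scratch here, which keeps the sketch simple at the cost of repeating work across seed phrases.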