Commit daea236

Updated docs
1 parent ac496c9 commit daea236

3 files changed

Lines changed: 15 additions & 20 deletions

docs/KeyNMF.md

Lines changed: 6 additions & 0 deletions
@@ -30,6 +30,12 @@ Topics in this matrix are then discovered using Non-negative Matrix Factorizatio
 Essentially the model tries to discover underlying dimensions/factors along which most of the variance in term importance
 can be explained.

+### _(Optional)_ 3. Dynamic Modeling
+
+KeyNMF is also capable of modeling topics over time.
+This happens by fitting a KeyNMF model first on the entire corpus, then
+fitting individual topic-term matrices using coordinate descent based on the document-topic and document-term matrices in the given time slices.
+
 ## Considerations

 ### Strengths
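Note on the added "Dynamic Modeling" section: the per-slice step it describes can be pictured with a small NumPy sketch. This is not Turftopic's implementation. The helper name `refit_topic_term`, the synthetic data, and the hand-written coordinate descent loop are all assumptions made for illustration. The document-topic matrix `W` stands in for the output of a model fitted on the whole corpus and is held fixed, so only the topic-term matrix of each time slice is re-estimated.

```python
import numpy as np


def refit_topic_term(X, W, n_sweeps=100, eps=1e-12):
    """Hypothetical helper: non-negative least squares for H in X ~ W @ H,
    solved by cyclic coordinate descent over the rows of H with W held fixed."""
    n_topics = W.shape[1]
    H = np.zeros((n_topics, X.shape[1]))
    gram = W.T @ W    # (n_topics, n_topics)
    cross = W.T @ X   # (n_topics, n_terms)
    for _ in range(n_sweeps):
        for k in range(n_topics):
            # Half-gradient of ||X - W @ H||^2 with respect to row k of H.
            grad = gram[k] @ H - cross[k]
            # Exact minimiser for row k given the other rows, clipped at zero.
            H[k] = np.maximum(0.0, H[k] - grad / (gram[k, k] + eps))
    return H


rng = np.random.default_rng(0)
n_docs, n_terms, n_topics, n_slices = 60, 12, 3, 2

# Stand-in for the document-topic matrix of a model fitted on the whole corpus.
W = rng.random((n_docs, n_topics))
# Hypothetical time-slice label for every document.
slice_labels = np.arange(n_docs) % n_slices
# Synthetic per-slice topic-term matrices and the document-term matrix they imply.
H_true = rng.random((n_slices, n_topics, n_terms)) + 0.1
X = np.vstack([W[i] @ H_true[slice_labels[i]] for i in range(n_docs)])

# One topic-term matrix per time slice, using only that slice's documents.
H_dynamic = np.stack(
    [refit_topic_term(X[slice_labels == t], W[slice_labels == t]) for t in range(n_slices)]
)
print(H_dynamic.shape)  # (n_slices, n_topics, n_terms)
```

Because the synthetic `X` is generated exactly from `W` and `H_true`, the refitted matrices should land close to `H_true`. On real data the fit would only be approximate.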

docs/dynamic.md

Lines changed: 8 additions & 19 deletions
@@ -4,41 +4,30 @@ If you want to examine the evolution of topics over time, you will need a dynami

 > Note that regular static models can also be used to study the evolution of topics and information dynamics, but they can't capture changes in the topics themselves.

-## Theory
+## Models

-A number of different conceptualizations can be used to study evolving topics in corpora, for instance:
-
-1. One can imagine topic representations to be governed by a Brownian Markov Process (random walk), in such a case the evolution is part of the model itself.
-In layman's terms you describe the evolution of topics directly in your generative model by expecting the topic representations to be sampled from Gaussian noise around the last time step.
-Sometimes researchers will also refer to such models as _state-space_ approaches.
-This is the approach that the original [DTM paper](https://mimno.infosci.cornell.edu/info6150/readings/dynamic_topic_models.pdf) utilizes.
-Along with [this paper](https://arxiv.org/pdf/1709.00025.pdf) on Dynamic NMF.
-2. You can fit one underlying statistical model over the entire corpus, and then do post-hoc term importance estimation per time slice.
-This is [what BERTopic does](https://maartengr.github.io/BERTopic/getting_started/topicsovertime/topicsovertime.html).
-3. You can fit one model per time slice, and then use some aggregation procedure to merge the models.
-This approach is used in the Dynamic NMF in [this paper](https://www.cambridge.org/core/journals/political-analysis/article/exploring-the-political-agenda-of-the-european-parliament-using-a-dynamic-topic-modeling-approach/BBC7751778E4542C7C6C69E6BF954E4B).
-
-Developing such approaches takes a lot of time and effort, and we have plans to add dynamic modeling capabilities to all models in Turftopic.
-For now only models of the second kind are on our list of things to do, and dynamic topic modeling has been implemented for GMM, and will soon be implemented for Clustering Topic Models.
-For more theoretical background, see the page on [GMM](GMM.md).
+In Turftopic you can currently use three different topic models for modeling topics over time:
+1. [ClusteringTopicModel](clustering.md), where an overall model is fitted on the whole corpus, and then term importances are estimated over time slices.
+2. [GMM](GMM.md), similarly to clustering models, term importances are reestimated per time slice
+3. [KeyNMF](KeyNMF.md), an overall decomposition is done, then using coordinate descent, topic-term-matrices are recalculated based on document-topic importances in the given time slice.

 ## Usage

 Dynamic topic models in Turftopic have a unified interface.
 To fit a dynamic topic model you will need a corpus, that has been annotated with timestamps.
 The timestamps need to be Python `datetime` objects, but pandas `Timestamp` object are also supported.

-Models that have dynamic modeling capabilities (currently, `GMM` and `ClusteringTopicModel`) have a `fit_transform_dynamic()` method, that fits the model on the corpus over time.
+Models that have dynamic modeling capabilities (`KeyNMF`, `GMM` and `ClusteringTopicModel`) have a `fit_transform_dynamic()` method, that fits the model on the corpus over time.

 ```python
 from datetime import datetime

-from turftopic import GMM
+from turftopic import KeyNMF

 corpus: list[str] = [...]
 timestamps: list[datetime] = [...]

-model = GMM(5)
+model = KeyNMF(5)
 document_topic_matrix = model.fit_transform_dynamic(corpus, timestamps=timestamps)
 ```
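Note on the usage snippet in this file: the `timestamps` list it expects could be prepared along these lines. The sample dates, the toy corpus, and the helper `equal_width_bins` are hypothetical. The equal-width binning is only an assumed illustration of what a "time slice" might be, since the docs in this commit do not say how Turftopic forms its slices.

```python
from datetime import datetime

import numpy as np

# Hypothetical raw data: one ISO date string per document.
corpus = ["first document", "second document", "third document", "fourth document"]
raw_dates = ["2021-01-15", "2021-06-30", "2022-03-01", "2023-11-20"]

# The docs ask for Python datetime objects, one per document.
timestamps = [datetime.fromisoformat(d) for d in raw_dates]
if len(timestamps) != len(corpus):
    raise ValueError("every document needs exactly one timestamp")


def equal_width_bins(timestamps, n_bins):
    """Hypothetical helper: assign each timestamp to one of n_bins equal-width slices."""
    seconds = np.array([t.timestamp() for t in timestamps])
    edges = np.linspace(seconds.min(), seconds.max(), n_bins + 1)
    # Digitizing against the interior edges gives labels in 0..n_bins-1.
    return np.digitize(seconds, edges[1:-1])


slice_labels = equal_width_bins(timestamps, n_bins=2)
print(slice_labels)
```

Checking that `corpus` and `timestamps` have the same length before fitting avoids misaligned documents and dates.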

docs/model_overview.md

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ Here is an opinionated guide for common use cases:
 ### 1. When in doubt **use KeyNMF**.

 When you can't make an informed decision about which model is optimal for your use case, or you just want to get your hands dirty with topic modeling,
-KeyNMF is the best option.
+KeyNMF is by far the best option.
 It is very stable, gives high quality topics, and is incredibly robust to noise.
 It is also the closest to classical topic models and thus conforms to your intuition about topic modeling.
