`prepare_topic_data()` not only fits the model (if it is not fitted already), but also saves other aspects of topic inference, making it easier to use this object for pretty printing and visualizing your models (see [Model Interpretation](model_interpretation.md)).
```python
topic_data = model.prepare_topic_data(corpus)
# Print to see what attributes you can access.
print(topic_data)
```
```
TopicData
├── corpus (1000)
├── vocab (1746,)
├── document_term_matrix (1000, 1746)
├── topic_term_matrix (10, 1746)
├── document_topic_matrix (1000, 10)
├── document_representation (1000, 384)
├── transform
├── topic_names (10)
├── has_negative_side
└── hierarchy
```
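For matrix-factorization-style models, these shapes fit together: the document-topic matrix times the topic-term matrix has the same shape as the document-term matrix. A quick sanity check of the shapes with numpy (sizes copied from the printout above; this is an illustration, not turftopic code):

```python
import numpy as np

# Shapes taken from the printed TopicData above.
n_docs, n_topics, n_vocab = 1000, 10, 1746

document_topic_matrix = np.zeros((n_docs, n_topics))
topic_term_matrix = np.zeros((n_topics, n_vocab))

# For factorization-style models their product approximates the
# document-term matrix, so the shapes must line up:
reconstruction = document_topic_matrix @ topic_term_matrix
print(reconstruction.shape)  # (1000, 1746)
```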
See [Using TopicData](topic_data.md) for more detail.
### Precomputing Embeddings
To cut down on cost and computational load when fitting multiple models in a row, you might want to encode the documents before fitting a model.
Encoding the corpus is the heaviest part of the process, and you can spare yourself a lot of time by only doing it once.

Some models also have to encode the vocabulary; this cannot be done before inference, as these models learn the vocabulary itself from the corpus.

The `fit()` method of all models takes an `embeddings` argument, which allows you to pass a precomputed embedding matrix along to fitting.
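The encode-once, fit-many workflow might look like this sketch (the `encode()` function below is a purely illustrative stand-in; in practice you would call something like a SentenceTransformer's `encode()` once and reuse the result):

```python
import numpy as np

def encode(corpus):
    """Illustrative stand-in for a real sentence encoder
    (e.g. a SentenceTransformer); returns one vector per document."""
    rng = np.random.default_rng(42)
    return rng.standard_normal((len(corpus), 384))

corpus = ["first document", "second document", "third document"]

# Encode the corpus once, up front...
embeddings = encode(corpus)

# ...then reuse the same matrix for every model you fit, e.g.:
#   model_a.fit(corpus, embeddings=embeddings)
#   model_b.fit(corpus, embeddings=embeddings)
print(embeddings.shape)  # (3, 384)
```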
The easiest way you can investigate topics in your fitted model is to use the built-in pretty printing utilities, that you can call on every fitted model or [`TopicData`](topic_data.md) object.
!!! quote "Interpret your models with topic tables"
=== "Relevant Words"
### Web App
The easiest way to investigate any topic model interactively is to use the topicwizard web app.
You can launch the app either using a [`TopicData`](topic_data.md) or a model object and a representative sample of documents.
=== "With `TopicData`"
### Figures
You can also produce individual interactive figures using the [Figures API in topicwizard](https://x-tabdeveloping.github.io/topicwizard/figures.html).
Almost all figures in the Figures API can be called on the `figures` submodule of any [`TopicData`](topic_data.md) object.
!!! quote "Interpret your models using interactive figures"
While Turftopic provides a fully sklearn-compatible interface for training and using topic models, this is not always optimal, especially when you have to visualize models, or save more information about inference than would be practical to keep in a `model` object.
We have thus added an abstraction borrowed from [topicwizard](https://github.com/x-tabdeveloping/topicwizard) called `TopicData`.
## Producing `TopicData`
Every model has methods with which you can produce this object:
`TopicData` is a dict-like object, and for all intents and purposes can be used as a Python dictionary, but for convenience you can also access its attributes with the dot syntax:
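This dual access pattern can be illustrated with a minimal stand-in (the real `TopicData` class comes from turftopic; this toy subclass of `dict` only mimics the behaviour):

```python
# Toy stand-in mimicking TopicData's dict-like and attribute access;
# the real class is provided by turftopic.
class AttrDict(dict):
    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError as err:
            raise AttributeError(key) from err

topic_data = AttrDict(vocab=["topic", "model"], topic_names=["0_topic_model"])

# Both forms retrieve the same value:
print(topic_data["vocab"])  # ['topic', 'model']
print(topic_data.vocab)     # ['topic', 'model']
```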
Much like with models, you can pretty-print information about topic models based on the `TopicData` object, but, since it contains more information on inference than the model object itself, you sometimes have to pass fewer parameters than if you called the same method on the model: