Commit a8b511b: Updated documentation for S3
1 parent c774c20

1 file changed: docs/s3.md (50 additions & 27 deletions)

Semantic Signal Separation tries to recover dimensions/axes along which most of the semantic variation in the corpus can be explained.
A topic in $S^3$ is an axis of semantics in the corpus.
This makes the model able to recover more nuanced topical content in documents, but is not optimal when you expect topics to be groupings of documents.

<figure>
<img src="../images/s3_math_correct.png" width="60%" style="margin-left: auto;margin-right: auto;">
<figcaption> Schematic overview of S³ </figcaption>
</figure>

$S^3$ is one of the fastest topic models out there, even rivalling vanilla NMF when not accounting for embedding time.
It also typically produces very high-quality topics, and our evaluations indicate that it performs significantly better when no preprocessing is applied to texts.
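
For orientation, here is a minimal usage sketch. The 20 Newsgroups loader is only an illustrative stand-in corpus, and `print_topics()` is assumed to be available as in the rest of the turftopic docs:

```python
from sklearn.datasets import fetch_20newsgroups
from turftopic import SemanticSignalSeparation

# Illustrative stand-in corpus; any list of raw document strings works.
corpus = fetch_20newsgroups(
    subset="all", remove=("headers", "footers", "quotes")
).data

model = SemanticSignalSeparation(n_components=10)
model.fit(corpus)
model.print_topics()
```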

## How does $S^3$ work?

### Step 1: Document-embedding Decomposition

The first step is to decompose the embedding matrix using ICA; this step discovers the underlying semantic axes as latent independent components in the embeddings.

??? info "See formula"

    - Let the encodings of documents in the corpus be $X$.
    - Decompose $X$ using FastICA: $X = AS$, where $A$ is the mixing matrix and $S$ is the document-topic matrix.
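
Conceptually, this step can be sketched with scikit-learn's `FastICA`. The random `embeddings` array below is a placeholder assumption, not turftopic's internal code:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Placeholder document encodings X: (n_documents, embedding_dim).
embeddings = np.random.default_rng(42).normal(size=(1000, 384))

ica = FastICA(n_components=10)
# scikit-learn's convention is X ~ S @ A.T, with S the document-topic matrix.
doc_topic_matrix = ica.fit_transform(embeddings)  # S: (n_documents, n_components)
mixing_matrix = ica.mixing_                       # A: (embedding_dim, n_components)
```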

### Step 2: Term Importance Estimation

Term importances for each topic are calculated by encoding the entire vocabulary of the corpus using the same embedding model,
then recovering the strength of each latent component in the word-embedding matrix.
The strength of the components in the words is interpreted as the words' importance in a given topic.

<figure>
<img src="../images/s3_term_importance.png" width="45%" style="margin-left: auto;margin-right: auto;">
<figcaption> Visual representation of term importance approaches in S³ </figcaption>
</figure>

??? info "See formula"

    - Let the matrix of word encodings be $V$.
    - Calculate the pseudo-inverse of the mixing matrix, $C = A^{+}$, where $C$ is the _unmixing matrix_.
    - Project word embeddings onto the semantic axes by multiplying them with the unmixing matrix: $W = VC^T$. $W^T$ is then the topic-term matrix (`model.components_`).
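
Continuing the hypothetical arrays from the decomposition sketch above (again a sketch, not turftopic's internals), the projection looks like this:

```python
import numpy as np

# Placeholder vocabulary encodings V: (n_words, embedding_dim),
# produced by the same encoder as the document embeddings.
vocab_embeddings = np.random.default_rng(0).normal(size=(5000, 384))

unmixing_matrix = np.linalg.pinv(mixing_matrix)          # C = A^+: (n_components, embedding_dim)
word_projections = vocab_embeddings @ unmixing_matrix.T  # W = V C^T: (n_words, n_components)
topic_term_matrix = word_projections.T                   # W^T, analogous to model.components_
```
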
There are three distinct methods to calculate term importances from word projections:

!!! quote "Choose a word importance method"

    === "Axial"

        ```python
        from turftopic import SemanticSignalSeparation

        model = SemanticSignalSeparation(n_components=10, feature_importance="axial")
        ```

        Axial word importances are defined as the words' positions on the semantic axes.
        This approach selects highly relevant words for topic descriptions, but topic descriptions might share words if a word scores high on multiple axes.

        The importance of word $j$ for topic $t$ is: $\beta_{tj} = W_{jt}$

    === "Angular"

        ```python
        from turftopic import SemanticSignalSeparation

        model = SemanticSignalSeparation(n_components=10, feature_importance="angular")
        ```

        Angular word importances are calculated by taking the cosine of the angle between projected word vectors and semantic axes.
        This makes axis descriptions very distinct and specific to the given axis, but they might include words that are not as relevant in the corpus.

        $\beta_{tj} = \cos(\Theta) = \frac{W_{jt}}{||W_j||}$

    === "Combined (default)"

        ```python
        from turftopic import SemanticSignalSeparation

        model = SemanticSignalSeparation(n_components=10, feature_importance="combined")
        ```

        Combined word importance is a combination of axial and angular term importance,
        and is recommended as it balances the two approaches' strengths and weaknesses.

        $\beta_{tj} = \frac{(W_{jt})^3}{||W_j||}$
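
To make the three formulas concrete, here is a small NumPy sketch reusing the hypothetical `word_projections` array from above; turftopic derives these internally from the `feature_importance` parameter:

```python
import numpy as np

W = word_projections                                  # (n_words, n_components)
row_norms = np.linalg.norm(W, axis=1, keepdims=True)  # ||W_j|| for each word j

axial = W                    # beta_tj = W_jt
angular = W / row_norms      # beta_tj = W_jt / ||W_j||
combined = W**3 / row_norms  # beta_tj = (W_jt)^3 / ||W_j||

# Transposing any of these gives an (n_components, n_words) topic-term matrix.
```
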
## Dynamic Topic Modeling

$S^3$ can also be used as a dynamic topic model.
Temporally changing components are found using the following steps:
