Commit a8b511b: Updated documentation for S3
1 parent c774c20

1 file changed: docs/s3.md (50 additions & 27 deletions)

Semantic Signal Separation tries to recover dimensions/axes along which most of the semantic variation in the corpus can be explained.
A topic in $S^3$ is an axis of semantics in the corpus.
This makes the model able to recover more nuanced topical content in documents, but is not optimal when you expect topics to be groupings of documents.

<figure>
<img src="../images/s3_math_correct.png" width="60%" style="margin-left: auto;margin-right: auto;">
<figcaption> Schematic overview of S³ </figcaption>
</figure>

$S^3$ is one of the fastest topic models out there, even rivalling vanilla NMF when not accounting for embedding time.
It also typically produces very high-quality topics, and our evaluations indicate that it performs significantly better when no preprocessing is applied to texts.
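
For orientation, here is a minimal usage sketch. The 20 Newsgroups loader is only an illustrative stand-in corpus, and `print_topics()` is assumed to be available as in the rest of the turftopic docs:

```python
from sklearn.datasets import fetch_20newsgroups
from turftopic import SemanticSignalSeparation

# Illustrative stand-in corpus; any list of raw document strings works.
corpus = fetch_20newsgroups(
    subset="all", remove=("headers", "footers", "quotes")
).data

model = SemanticSignalSeparation(n_components=10)
model.fit(corpus)
model.print_topics()
```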

## How does $S^3$ work?

### Step 1: Document-embedding Decomposition

The first step is to decompose the embedding matrix using ICA; this step discovers the underlying semantic axes as latent independent components in the embeddings.

??? info "See formula"

    - Let the encodings of documents in the corpus be $X$.
    - Decompose $X$ using FastICA: $X = AS$, where $A$ is the mixing matrix and $S$ is the document-topic matrix.
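
Conceptually, this step can be sketched with scikit-learn's `FastICA`. The random `embeddings` array below is a placeholder assumption, not turftopic's internal code:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Placeholder document encodings X: (n_documents, embedding_dim).
embeddings = np.random.default_rng(42).normal(size=(1000, 384))

ica = FastICA(n_components=10)
# scikit-learn's convention is X ~ S @ A.T, with S the document-topic matrix.
doc_topic_matrix = ica.fit_transform(embeddings)  # S: (n_documents, n_components)
mixing_matrix = ica.mixing_                       # A: (embedding_dim, n_components)
```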

### Step 2: Term Importance Estimation

Term importances for each topic are calculated by encoding the entire vocabulary of the corpus using the same embedding model,
then recovering the strength of each latent component in the word-embedding matrix.
The strength of the components in the words is interpreted as the words' importance in a given topic.

<figure>
<img src="../images/s3_term_importance.png" width="45%" style="margin-left: auto;margin-right: auto;">
<figcaption> Visual representation of term importance approaches in S³ </figcaption>
</figure>

??? info "See formula"

    - Let the matrix of word encodings be $V$.
    - Calculate the pseudo-inverse of the mixing matrix, $C = A^{+}$, where $C$ is the _unmixing matrix_.
    - Project word embeddings onto the semantic axes by multiplying them with the unmixing matrix: $W = VC^T$. $W^T$ is then the topic-term matrix (`model.components_`).
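
Continuing the hypothetical arrays from the decomposition sketch above (again a sketch, not turftopic's internals), the projection looks like this:

```python
import numpy as np

# Placeholder vocabulary encodings V: (n_words, embedding_dim),
# produced by the same encoder as the document embeddings.
vocab_embeddings = np.random.default_rng(0).normal(size=(5000, 384))

unmixing_matrix = np.linalg.pinv(mixing_matrix)          # C = A^+: (n_components, embedding_dim)
word_projections = vocab_embeddings @ unmixing_matrix.T  # W = V C^T: (n_words, n_components)
topic_term_matrix = word_projections.T                   # W^T, analogous to model.components_
```
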
There are three distinct methods to calculate term importances from word projections:

!!! quote "Choose a word importance method"

    === "Axial"

        ```python
        from turftopic import SemanticSignalSeparation

        model = SemanticSignalSeparation(n_components=10, feature_importance="axial")
        ```

        Axial word importances are defined as the words' positions on the semantic axes.
        This approach selects highly relevant words for topic descriptions, but topic descriptions might share words if a word scores high on multiple axes.

        The importance of word $j$ for topic $t$ is: $\beta_{tj} = W_{jt}$

    === "Angular"

        ```python
        from turftopic import SemanticSignalSeparation

        model = SemanticSignalSeparation(n_components=10, feature_importance="angular")
        ```

        Angular word importances are calculated by taking the cosine of the angle between projected word vectors and semantic axes.
        This makes axis descriptions very distinct and specific to the given axis, but they might include words that are not as relevant in the corpus.

        $\beta_{tj} = \cos(\Theta) = \frac{W_{jt}}{||W_j||}$

    === "Combined (default)"

        ```python
        from turftopic import SemanticSignalSeparation

        model = SemanticSignalSeparation(n_components=10, feature_importance="combined")
        ```

        Combined word importance is a combination of axial and angular term importance,
        and is recommended as it balances the two approaches' strengths and weaknesses.

        $\beta_{tj} = \frac{(W_{jt})^3}{||W_j||}$
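
To make the three formulas concrete, here is a small NumPy sketch reusing the hypothetical `word_projections` array from above; turftopic derives these internally from the `feature_importance` parameter:

```python
import numpy as np

W = word_projections                                  # (n_words, n_components)
row_norms = np.linalg.norm(W, axis=1, keepdims=True)  # ||W_j|| for each word j

axial = W                    # beta_tj = W_jt
angular = W / row_norms      # beta_tj = W_jt / ||W_j||
combined = W**3 / row_norms  # beta_tj = (W_jt)^3 / ||W_j||

# Transposing any of these gives an (n_components, n_words) topic-term matrix.
```
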
## Dynamic Topic Modeling

$S^3$ can also be used as a dynamic topic model.
Temporally changing components are found using the following steps:
