Commit e5c9858

Added CVP to docs
1 parent e19f174 commit e5c9858

2 files changed

Lines changed: 79 additions & 0 deletions

docs/cvp.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
# Concept Vector Projection

Concept Vector Projection is an embedding-based method for extracting continuous sentiment (or other) scores from free-text documents.

<figure>
  <img src="../images/cvp.png" title="" style="width:1050px;padding:0px;border:none;">
  <figcaption> Figure 1: Schematic Overview of Concept Vector Projection.<br> <i>Figure from Lyngbæk et al. (2025)</i> </figcaption>
</figure>

The method rests on the idea that one can construct a _concept vector_ by encoding positive and negative _seed phrases_ with a transformer, averaging the embeddings within each group, and taking the difference of the two mean vectors.
We can then project other documents' embeddings onto a concept vector by taking their dot product with it, which gives a continuous score for how strongly each document relates to the concept.

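The description above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not Turftopic's implementation: the helper names `concept_vector` and `project` are hypothetical, the 2-D arrays are toy stand-ins for transformer embeddings, and the sketch uses a plain dot product without any normalization, since the text does not say whether the library normalizes.

```python
import numpy as np


def concept_vector(positive_emb, negative_emb):
    """Difference between the mean positive and mean negative seed embedding."""
    return positive_emb.mean(axis=0) - negative_emb.mean(axis=0)


def project(doc_emb, concept):
    """Dot product of every document embedding with the concept vector."""
    return doc_emb @ concept


# Toy 2-D embeddings standing in for encoded seed phrases (assumption)
positive_emb = np.array([[1.0, 0.0], [0.8, 0.2]])
negative_emb = np.array([[-1.0, 0.0], [-0.8, 0.2]])

# Mean positive is [0.9, 0.1], mean negative is [-0.9, 0.1]
concept = concept_vector(positive_emb, negative_emb)

# Toy document embeddings: one leaning positive, one leaning negative
doc_emb = np.array([[0.5, 0.5], [-0.5, 0.5]])
scores = project(doc_emb, concept)
print(scores)
```

With these toy values the first document lands on the positive side of the concept axis and the second on the negative side, while the second embedding dimension, which the seed groups share, cancels out of the concept vector.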
## Usage

### Single Concept

When projecting onto a single concept, specify the seeds as a tuple of two lists: the positive phrases, then the negative phrases.

```python
from turftopic import ConceptVectorProjection

# Seed phrases that exemplify the positive end of the concept
positive = [
    "I love this product",
    "This is absolutely lovely",
    "My daughter is going to adore this"
]
# Seed phrases that exemplify the negative end of the concept
negative = [
    "This product is not at all as advertised, I'm very displeased",
    "I hate this",
    "What a horrible way to deal with people"
]
cvp = ConceptVectorProjection(seeds=(positive, negative))

# One row per document, one column per concept
test_documents = ["My cute little doggy", "Few this is digusting"]
doc_concept_matrix = cvp.transform(test_documents)
print(doc_concept_matrix)
```

```python
[[0.24265897]
 [0.01709663]]
```

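The output above has one row per document and a single column for the single concept. As a minimal sketch of how such a matrix might be consumed, the snippet below ranks the documents by their score. The values are copied from the output above rather than recomputed, and the ranking step is an illustration, not part of Turftopic.

```python
import numpy as np

# Documents and scores copied from the example above
test_documents = ["My cute little doggy", "Few this is digusting"]
doc_concept_matrix = np.array([[0.24265897], [0.01709663]])

# Take the only concept column and sort documents from highest to lowest score
scores = doc_concept_matrix[:, 0]
order = np.argsort(-scores)
ranked = [(test_documents[i], float(scores[i])) for i in order]

for doc, score in ranked:
    print(f"{score:.3f}  {doc}")
```

Both scores in this example are positive, so the relative ordering of documents is a safer thing to read off than the sign of an individual score.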
### Multiple Concepts

When projecting documents onto multiple concepts at once, you need to specify a name and seeds for each concept.
Internally this is handled with an `OrderedDict`, which you can either construct yourself or let Turftopic build from a list of `(name, seeds)` pairs:

```python
import pandas as pd
from collections import OrderedDict

from turftopic import ConceptVectorProjection

# Each concept gets a (positive phrases, negative phrases) tuple
cuteness_seeds = (
    ["Absolutely adorable", "I love how he dances with his little feet"],
    ["What a big slob of an abomination", "A suspicious old man sat next to me on the bus today"],
)
bullish_seeds = (
    ["We are going to the moon", "This stock will prove an incredible investment"],
    ["I will short the hell out of them", "Uber stocks drop 7% in value after down-time."],
)

# Either specify it like this:
seeds = [("cuteness", cuteness_seeds), ("bullish", bullish_seeds)]
# or as an OrderedDict:
seeds = OrderedDict([("cuteness", cuteness_seeds), ("bullish", bullish_seeds)])
cvp = ConceptVectorProjection(seeds=seeds)

test_documents = ["What an awesome investment", "Tiny beautiful kitty-cat"]
doc_concept_matrix = cvp.transform(test_documents)
# Label each column with the name of its concept
concept_df = pd.DataFrame(doc_concept_matrix, columns=cvp.get_feature_names_out())
print(concept_df)
```

```python
   cuteness   bullish
0  0.085957  0.288779
1  0.269454  0.009495
```

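To make the relationship between the accepted seed formats concrete, here is one way they could be brought into a single `OrderedDict`. This is a sketch under assumptions and not Turftopic's actual code: `normalize_seeds` and `_is_phrase_list` are hypothetical helpers, and the `"default"` name given to a lone `(positive, negative)` tuple is a placeholder invented for this example.

```python
from collections import OrderedDict
from collections.abc import Mapping


def _is_phrase_list(obj):
    """True if obj is a list or tuple made up only of strings."""
    return isinstance(obj, (list, tuple)) and all(isinstance(p, str) for p in obj)


def normalize_seeds(seeds, default_name="default"):
    """Hypothetical helper: map every seed format to an OrderedDict.

    The default_name placeholder is an assumption, not a Turftopic convention.
    """
    if isinstance(seeds, Mapping):
        # Already a name -> (positive, negative) mapping
        return OrderedDict(seeds)
    if len(seeds) == 2 and _is_phrase_list(seeds[0]) and _is_phrase_list(seeds[1]):
        # A lone (positive, negative) tuple describes a single concept
        return OrderedDict([(default_name, tuple(seeds))])
    # Otherwise assume an iterable of (name, (positive, negative)) pairs
    return OrderedDict(seeds)


cuteness_seeds = (["Absolutely adorable"], ["What a big slob of an abomination"])
bullish_seeds = (["We are going to the moon"], ["I will short the hell out of them"])

as_pairs = [("cuteness", cuteness_seeds), ("bullish", bullish_seeds)]
as_ordered_dict = OrderedDict(as_pairs)

print(list(normalize_seeds(as_pairs)))
print(list(normalize_seeds(cuteness_seeds)))
```

Keeping the mapping ordered matters because the column order of the output matrix has to line up with the concept names.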
## API Reference

::: turftopic.models.cvp.ConceptVectorProjection

docs/images/cvp.png

93.8 KB