Commit 2347d50

Added citations to C-Top2Vec
1 parent 898185f commit 2347d50

1 file changed

Lines changed: 41 additions & 0 deletions

docs/c_top2vec.md

@@ -37,6 +37,47 @@ doc_topic_matrix = model.fit_transform(corpus)
 model.print_topics()
 ```
 
+## Citation
+
+Please cite Angelov and Inkpen (2024) and Turftopic when using C-Top2Vec in publications:
+
+```bibtex
+@article{
+Kardos2025,
+title = {Turftopic: Topic Modelling with Contextual Representations from Sentence Transformers},
+doi = {10.21105/joss.08183},
+url = {https://doi.org/10.21105/joss.08183},
+year = {2025},
+publisher = {The Open Journal},
+volume = {10},
+number = {111},
+pages = {8183},
+author = {Kardos, Márton and Enevoldsen, Kenneth C. and Kostkan, Jan and Kristensen-McLachlan, Ross Deans and Rocca, Roberta},
+journal = {Journal of Open Source Software}
+}
+
+@inproceedings{angelov-inkpen-2024-topic,
+title = "Topic Modeling: Contextual Token Embeddings Are All You Need",
+author = "Angelov, Dimo and
+Inkpen, Diana",
+editor = "Al-Onaizan, Yaser and
+Bansal, Mohit and
+Chen, Yun-Nung",
+booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
+month = nov,
+year = "2024",
+address = "Miami, Florida, USA",
+publisher = "Association for Computational Linguistics",
+url = "https://aclanthology.org/2024.findings-emnlp.790/",
+doi = "10.18653/v1/2024.findings-emnlp.790",
+pages = "13528--13539",
+abstract = "The goal of topic modeling is to find meaningful topics that capture the information present in a collection of documents. The main challenges of topic modeling are finding the optimal number of topics, labeling the topics, segmenting documents by topic, and evaluating topic model performance. Current neural approaches have tackled some of these problems but none have been able to solve all of them. We introduce a novel topic modeling approach, Contextual-Top2Vec, which uses document contextual token embeddings, it creates hierarchical topics, finds topic spans within documents and labels topics with phrases rather than just words. We propose the use of BERTScore to evaluate topic coherence and to evaluate how informative topics are of the underlying documents. Our model outperforms the current state-of-the-art models on a comprehensive set of topic model evaluation metrics."
+}
+
+```
+
+
+
 ## API Reference
 
 ::: turftopic.models.cluster.CTop2Vec
