Commit 030a08f

Updated paper with suggestions
1 parent f6b8f4c commit 030a08f

2 files changed

Lines changed: 19 additions & 18 deletions

paper.md

Lines changed: 19 additions & 18 deletions
@@ -24,7 +24,7 @@ authors:
2424
affiliations:
2525
- name: Center for Humanities Computing, Aarhus University, Denmark
2626
index: 1
27-
- name: Interactive Minds Center, Aarhus University, Denmark
27+
- name: Interacting Minds Center, Aarhus University, Denmark
2828
index: 2
2929
- name: Department of Linguistics, Cognitive Science, and Semiotics, Aarhus University, Denmark
3030
index: 3
@@ -34,10 +34,10 @@ bibliography: paper.bib
3434

3535
# Summary
3636

37-
Turftopic is a topic modelling library including a number of recent topic models that go beyond bag-of-words and can understand text in context, utilizing representations from transformers.
38-
Turftopic focuses on ease-of-use, providing a unified, interface for a number of different modern topic models, and boasting both model-specific and model-agnostic interpretation and visualization utilities.
39-
The user is afforded great flexibility in model choice and customization, but the library comes with reasonable defaults, not to overwhelm first-time users with a plethora of choices.
40-
In addition, Turftopic allows you to model topics, as they change over time, learning themes from streams of texts, finding hierarchical topics, and multilingual usage.
37+
Turftopic is a topic modelling library including a number of recent topic models that go beyond bag-of-words models and can understand text in context, utilizing representations from transformers.
38+
Turftopic focuses on ease of use, providing a unified interface for a number of different modern topic models, and boasting both model-specific and model-agnostic interpretation and visualization utilities.
39+
While the user is afforded great flexibility in model choice and customization, the library comes with reasonable defaults, so as not to needlessly overwhelm first-time users.
40+
In addition, Turftopic allows the user to: a) model topics as they change over time, b) learn topics online from a stream of texts, c) find hierarchical structure in topics, and d) learn topics in multilingual texts and corpora.
4141
Users can utilize the power of large language models (LLMs) to give human-readable names to topics.
4242
Turftopic also comes with built-in utilities for generating topic descriptions based on key-phrases or lemmas rather than individual words.
4343

@@ -47,38 +47,39 @@ Turftopic also comes with built-in utilities for generating topic descriptions b
4747

4848
While a number of software packages have been developed for contextual topic modelling in recent years, including BERTopic [@bertopic_paper], Top2Vec [@top2vec], and CTM [@ctm], these packages each include implementations of only one or two topic models, and most of the utilities they provide are model-specific. This has resulted in the unfortunate situation that practitioners need to switch between different libraries and adapt to their particularities in both interface and functionality.
4949
Some attempts have been made at creating unified packages for modern topic models, including STREAM [@stream] and TopMost [@topmost].
50-
These packages, however have a focus on neural models and topic model evaluation, have abstract and highly specialized interfaces, and do not include some popular topic models.
51-
Additionally, while model interpretation is an incredibly important aspect of topic modelling, the interpretation utilities provided in these libraries are fairly limited, especially in comparison with model-specific packages, like BERTopic.
50+
These packages, however, have a focus on neural models and topic model evaluation, have abstract and highly specialized interfaces, and do not include some popular topic models.
51+
Additionally, while model interpretation is a fundamental aspect of topic modelling, the interpretation utilities provided in these libraries are fairly limited, especially in comparison with model-specific packages, like BERTopic.
5252

5353
Turftopic unifies state-of-the-art contextual topic models under a superset of the `scikit-learn` [@scikit-learn] API, which users are likely already familiar with, and can be readily included in `scikit-learn` workflows and pipelines.
54-
We focused on making Turftopic first and foremost an easy-to-use library, that does not necessitate expert knowledge or excessive amounts of code to get started with, but gives great flexibility to power users.
55-
Furthermore, included an extensive suite of pretty-printing and visualization utilities that aid users in interpreting their results.
56-
The library also includes three topic models, which to our knowledge only have implementations in Turftopic, these are: KeyNMF [@keynmf], S^3^ [@s3], and GMM.
54+
We focused on making Turftopic first and foremost an easy-to-use library that does not necessitate expert knowledge or excessive amounts of code to get started with, but gives great flexibility to power users.
55+
Furthermore, we included an extensive suite of pretty-printing and visualization utilities that aid users in interpreting their results.
56+
The library also includes three topic models which, to our knowledge, only have implementations in Turftopic: KeyNMF [@keynmf], S^3^ [@s3], and GMM, a Gaussian mixture model of document representations with a soft-c-tf-idf term weighting scheme.
5757

5858
# Functionality
5959

6060
Turftopic includes a wide array of contextual topic models from the literature, including:
6161
FASTopic [@fastopic]; clustering models, such as BERTopic [@bertopic_paper] and Top2Vec [@top2vec]; auto-encoding topic models, like CombinedTM [@ctm] and ZeroShotTM [@zeroshot_tm]; KeyNMF [@keynmf]; Semantic Signal Separation [@s3]; and GMM.
62-
We believe these models to be representative of the state of the art in contextual topic modelling and intend to expand on them in the future.
62+
At the time of writing, these models are representative of the state of the art in contextual topic modelling, and we intend to expand on them in the future.
6363

6464
![Components of a Topic Modelling Pipeline in Turftopic](https://x-tabdeveloping.github.io/turftopic/images/topic_modeling_pipeline.png){width="800px"}
6565

66-
Each model in Turftopic has an *encoder* component, which is used for producing continuous document-representations, and a *vectorizer* component, which extracts term counts in each documents, thereby dictating which terms will be considered in topics.
66+
Each model in Turftopic has an *encoder* component, which is used for producing continuous document representations [@sentence_transformers], and a *vectorizer* component, which extracts term counts in each document, thereby dictating which terms will be considered in topics.
6767
The user has full control over what components should be used at different stages of the topic modelling process, thereby having fine-grained influence on the nature and quality of topics.
6868

69-
The library comes loaded with a lot of utilities to help users interpret their results, including *pretty printing* utilities for exploring topics, *interactive visualizations* partially powered by the `topicwizard` [@topicwizard] Python package, and *automated topic naming* with LLMs.
69+
The library comes loaded with numerous utilities to help users interpret their results, including *pretty printing* utilities for exploring topics, *interactive visualizations* partially powered by the `topicwizard` [@topicwizard] Python package, and *automated topic naming* with LLMs.
7070

71-
To accommodate a variety of use-cases, Turftopic can be used for dynamic topic modelling, where we expect topics to change over time, can be used for uncovering hierarchical structure in topics.
72-
Some models can also be fitted in an *online* fashion, where documents are accounted for as they come in by batches.
71+
To accommodate a variety of use cases, Turftopic can be used for *dynamic* topic modelling, where we expect topics to change over time.
72+
Turftopic is also capable of extracting topics at multiple levels of granularity, thereby uncovering *hierarchical* topic structures.
73+
Some models can also be fitted in an *online* fashion, where documents are accounted for as they come in batches.
7374
Turftopic also includes *seeded* topic modelling, where a seed phrase can be used to retrieve topics relevant to the specific research question.
7475

7576
# Use Cases
7677

77-
Topic models can be utilized in a number of research settings, including exploratory data analysis, discourse analysis of diverse domains, such as newspapers, social media or policy documents.
78-
Turftopic has already been utilized by @keynmf for analyzing information dynamics in Chinese Diaspora Media, and is currently being used in multiple ongoing research projects, including one analyzing discourse on the HPV vaccine in Denmark, and studying Danish golden-age literature.
78+
Topic modelling is a key tool for quantitative text analysis [@quantitative_text_analysis], and can be utilized in a number of research settings, including exploratory data analysis and discourse analysis of diverse domains, such as newspapers, social media, or policy documents.
79+
Turftopic has already been utilized by @keynmf for analyzing information dynamics in Chinese diaspora media, and is currently being used in multiple ongoing research projects, including one analyzing discourse on the HPV vaccine in Denmark and one studying Danish golden-age literature.
7980

8081
# Target Audience
8182

8283
We expect that Turftopic will prove useful to a diverse user base including computational researchers in digital humanities and social sciences, and industry NLP professionals.
83-
Due to ease of use, Turftopic is also an appropriate choice for educational purposes.
84+
Turftopic is also an appropriate choice for educational purposes, providing instructors with a single, user-friendly framework for students to explore and compare alternative topic modelling approaches.
8485

paper.pdf

2.15 KB
Binary file not shown.
