Hands-on tutorial on modeling political statements with a state-of-the-art transformer-based topic model
Topic modeling (i.e., identifying topics in a corpus of text data) has developed quickly since the Latent Dirichlet Allocation (LDA) model was published. This classic topic model, however, struggles to capture relationships between words because it relies on a bag-of-words representation. Recent embedding-based models such as Top2Vec and BERTopic address this drawback by leveraging pre-trained language models to generate topics.
In this article, we’ll use Maarten Grootendorst’s (2022) BERTopic to identify the terms that represent topics in political speech transcripts. BERTopic outperforms most traditional and modern topic models on standard topic modeling metrics across various corpora and has been adopted in industry, academia (Chagnon, 2024), and the public sector. We’ll explore in Python code (a minimal sketch of the workflow appears right after the list):
- how to effectively preprocess data
- how to create a bigram topic model
- how to explore the most frequent terms over time.
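To preview where we’re headed, here is a minimal sketch of that workflow, assuming a preprocessed list of speech transcripts `docs` and their years `timestamps` (both placeholders here); the bigram representation is supplied via scikit-learn’s `CountVectorizer` passed into BERTopic:

```python
from bertopic import BERTopic
from sklearn.feature_extraction.text import CountVectorizer

# Placeholders: in the article these will hold the preprocessed speech
# transcripts and the year of each speech from the Empoliticon dataset.
docs: list[str] = []        # preprocessed speech transcripts
timestamps: list[int] = []  # year of each speech

# Represent topics with bigrams instead of single words
vectorizer_model = CountVectorizer(ngram_range=(2, 2), stop_words="english")

# Fit BERTopic on the corpus
topic_model = BERTopic(vectorizer_model=vectorizer_model)
topics, probs = topic_model.fit_transform(docs)

# Most frequent topics and their representative bigrams
print(topic_model.get_topic_info().head())

# How topic frequencies evolve over time (recent BERTopic versions)
topics_over_time = topic_model.topics_over_time(docs, timestamps)
```

Each of these steps is explained in detail in the sections that follow.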
As an example dataset, we’ll use the Empoliticon: Political Speeches-Context & Emotion dataset, released under the…