Topic Modeling with Structured Priors for Text-Driven Science

Published on Jul 24, 2015
Michael J. Paul
Estimated H-index: 31
References (164)
Jul 1, 2015 in IJCNLP (International Joint Conference on Natural Language Processing)
#1 Viet-An Nguyen (UMD: University of Maryland, College Park), H-Index: 12
#2 Jordan Boyd-Graber (CU: University of Colorado Boulder), H-Index: 38
Last. Kristina C. Miler (UMD: University of Maryland, College Park), H-Index: 7
We introduce the Hierarchical Ideal Point Topic Model, which provides a rich picture of policy issues, framing, and voting behavior using a joint model of votes, bill text, and the language that legislators use when debating bills. We use this model to look at the relationship between Tea Party Republicans and “establishment” Republicans in the U.S. House of Representatives during the 112th Congress.
1 Capturing Political Polarization
Ideal-point models are one of the most widely used tools in c...
Source
Jun 5, 2015 in NAACL (North American Chapter of the Association for Computational Linguistics)
#1 Philip Resnik (UMD: University of Maryland, College Park), H-Index: 58
#2 William Armstrong (UMD: University of Maryland, College Park), H-Index: 2
Last. Jordan Boyd-Graber (UMD: University of Maryland, College Park), H-Index: 38
Topic models can yield insight into how depressed and non-depressed individuals use language differently. In this paper, we explore the use of supervised topic models in the analysis of linguistic signal for detecting depression, providing promising results using several models.
Source
Feb 2, 2015 in WSDM (Web Search and Data Mining)
#1 Michael Röder (Leipzig University), H-Index: 11
#2 Andreas Both, H-Index: 16
Last. Alexander Hinneburg (MLU: Martin Luther University of Halle-Wittenberg), H-Index: 16
Quantifying the coherence of a set of statements is a long-standing problem with many potential applications that has attracted researchers from different sciences. The special case of measuring coherence of topics has been recently studied to remedy the problem that topic models give no guarantee on the interpretability of their output. Several benchmark datasets were produced that record human judgements of the interpretability of topics. We are the first to propose a framework that allows to co...
Source
#1 Michael J. Paul (Johns Hopkins University), H-Index: 31
#2 Mark Dredze (Johns Hopkins University), H-Index: 64
We introduce SPRITE, a family of topic models that incorporates structure into model priors as a function of underlying components. The structured priors can be constrained to model topic hierarchies, factorizations, correlations, and supervision, allowing SPRITE to be tailored to particular settings. We demonstrate this flexibility by constructing a SPRITE-based model to jointly infer topic hierarchies and author perspective, which we apply to corpora of political debates and online reviews. We...
Source
Jan 1, 2015 in NAACL (North American Chapter of the Association for Computational Linguistics)
#1 Jason Chuang (UW: University of Washington), H-Index: 13
#2 Margaret E. Roberts (UCSD: University of California, San Diego), H-Index: 22
Last. Jeffrey Heer (UW: University of Washington), H-Index: 72
Content analysis, a widely-applied social science research method, is increasingly being supplemented by topic modeling. However, while the discourse on content analysis centers heavily on reproducibility, computer scientists often focus more on scalability and less on coding reliability, leading to growing skepticism on the usefulness of topic models for automated content analysis. In response, we introduce TopicCheck, an interactive tool for assessing topic model stability. Our contributions a...
Source
#1 Byron C. Wallace (Brown University), H-Index: 39
#2 Michael J. Paul (Johns Hopkins University), H-Index: 31
Last. Mark Dredze (Johns Hopkins University), H-Index: 64
Online physician reviews are a massive and potentially rich source of information capturing patient sentiment regarding healthcare. We analyze a corpus comprising nearly 60 000 such reviews with a state-of-the-art probabilistic model of text. We describe a probabilistic generative model that captures latent sentiment across aspects of care (e.g., interpersonal manner). We target specific aspects by leveraging a small set of manually annotated reviews. We perform regression analysis to assess whet...
Source
Collection and especially analysis of open-ended survey responses are relatively rare in the discipline and when conducted are almost exclusively done through human coding. We present an alternative, semiautomated approach, the structural topic model (STM) (Roberts, Stewart, and Airoldi 2013; Roberts et al. 2013), that draws on recent developments in machine learning based analysis of textual data. A crucial contribution of the method is that it incorporates information about the document, such ...
Source
Aug 24, 2014 in KDD (Knowledge Discovery and Data Mining)
#1 Aaron Q. Li (CMU: Carnegie Mellon University), H-Index: 2
#2 Amr Ahmed (Google), H-Index: 32
Last. Alexander J. Smola (CMU: Carnegie Mellon University), H-Index: 128
Inference in topic models typically involves a sampling step to associate latent variables with observations. Unfortunately the generative model loses sparsity as the amount of data increases, requiring $O(k)$ operations per word for $k$ topics. In this paper we propose an algorithm which scales linearly with the number of actually instantiated topics $k_d$ in the document. For large document collections and in structured hierarchical models $k_d \ll k$. This yields an order of magnitude speedup. Our metho...
Source
Aug 24, 2014 in KDD (Knowledge Discovery and Data Mining)
#1 Yupeng Gu (NU: Northeastern University), H-Index: 6
#2 Yizhou Sun (NU: Northeastern University), H-Index: 49
Last. Ting Chen (NU: Northeastern University), H-Index: 22
Ideal point estimation that estimates legislators' ideological positions and understands their voting behavior has attracted studies from political science and computer science. Typically, a legislator is assigned a global ideal point based on her voting or other social behavior. However, it is quite normal that people may have different positions on different policy dimensions. For example, some people may be more liberal on economic issues while more conservative on cultural issues. In this pa...
Source
#1 Michael J. Paul (Johns Hopkins University), H-Index: 31
#2 Mark Dredze (Johns Hopkins University), H-Index: 64
By aggregating self-reported health statuses across millions of users, we seek to characterize the variety of health information discussed in Twitter. We describe a topic modeling framework for discovering health topics in Twitter, a social media website. This is an exploratory approach with the goal of understanding what health topics are commonly discussed in social media. This paper describes in detail a statistical topic model created for this purpose, the Ailment Topic Aspect Model (ATAM), ...
Source
Cited By (1)