Workshop with H. Andrew Schwartz: "Automatic Content Analysis using the Differential Language Analysis ToolKit"

Qualitative data, such as essays, social media posts, and free-response survey questions, are rich sources of psychological, social, and behavioural information. Yet such information has traditionally been difficult to leverage at a large scale. Recent advances in computational linguistics and machine learning have produced automatic content analysis tools, which can now be applied in a wide range of settings.

The Differential Language Analysis ToolKit

DLATK (Differential Language Analysis ToolKit) is an end-to-end language analysis package, particularly suited to social media and social scientific research applications. It has been used in research published in more than 50 peer-reviewed papers across psychology, computer science, public health, medicine, and political science. Although the heart of DLATK is a Python library, it is typically used through a command-line interface and requires no programming. This tutorial will cover the fundamentals of automated content analysis using DLATK:

  1. Differential language analysis (producing linguistic insights into psychosocial phenomena)
  2. Predictive analytics (machine and statistical learning using text data)
  3. The ingredients of automatic content analysis
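
To give a feel for the first topic, here is a minimal sketch of the idea behind differential language analysis: correlate how often each word is used with an outcome of interest. It uses only the Python standard library and does not use DLATK or reproduce its interface; the function names, the texts, and the well-being scores are all made up for illustration.

```python
from collections import Counter
from math import sqrt


def relative_frequencies(text):
    """Return each word's share of the tokens in one document."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    return {word: n / len(tokens) for word, n in counts.items()}


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def differential_language(docs, outcomes):
    """Correlate every word's relative frequency with the outcome."""
    freqs = [relative_frequencies(d) for d in docs]
    vocab = set().union(*freqs)
    return {
        w: pearson([f.get(w, 0.0) for f in freqs], outcomes)
        for w in vocab
    }


# Toy, made-up data: four short texts and a hypothetical well-being score.
docs = [
    "happy happy day with friends",
    "happy walk with friends today",
    "tired and sad today",
    "sad sad tired day",
]
scores = [9.0, 8.0, 3.0, 2.0]

correlations = differential_language(docs, scores)
for word, r in sorted(correlations.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{word:8s} {r:+.2f}")
```

Words used more by high scorers come out with positive correlations and words used more by low scorers with negative ones. A real analysis would need far more text, proper tokenization, and significance testing before any word is interpreted.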

About the instructor
H. Andrew Schwartz is part of the faculty of the Computer Science Department and Institute for AI-Driven Discovery and Innovation at Stony Brook University, New York. He was previously Lead Research Scientist for the interdisciplinary “World Well-Being Project” at the University of Pennsylvania where he created the Differential Language Analysis ToolKit.


To register, please follow the link:


To participate in the tutorial you will need a laptop that can connect to the Internet (Windows, Mac, or Linux are all fine). The training venue provides a free Wi-Fi connection for the day, and power to keep your laptop charged.

During the tutorial, participants will connect to a server that already has the analysis software (DLATK) installed. After you register, you will receive instructions on how to access this server and run a basic test command. This test must be completed no later than 24 hours before the tutorial.

Desirable but non-essential expertise:

  • an entry-level understanding (or higher) of quantitative research methods
  • basic scripting (R, Python, or syntax/code in SPSS/SAS/STATA)


Recommended pre-reading

Differential Language Analysis ToolKit:



Kern, M. L., Park, G., Eichstaedt, J. C., Schwartz, H. A., Sap, M., Smith, L. K., & Ungar, L. H. (2016). Gaining insights from social media language: Methodologies and challenges. Psychological Methods, 21(4), 507-525.

Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political Analysis, 21(3), 267-297.

Schwartz, H. A., Giorgi, S., Sap, M., Crutchley, P., Ungar, L., & Eichstaedt, J. (2017). DLATK: Differential Language Analysis ToolKit. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing: System Demonstrations (pp. 55-60).

Schwartz, H. A., & Ungar, L. H. (2015). Data-driven content analysis of social media: a systematic overview of automated methods. The ANNALS of the American Academy of Political and Social Science, 659(1), 78-94.

About the event

2019-06-13 09:00 to 14:00
oscar.kjell [at]

Department of Psychology
Box 213, 221 00  LUND
Telephone: 046-222 00 00
webb [at] psy [dot] lu [dot] se