Using Topic Models to Identify Clients’ Functioning Levels and Alliance Ruptures in Psychotherapy
Computerized natural language processing (NLP) techniques can analyze psychotherapy sessions as texts, thus generating information about the therapy process and outcome and supporting the scaling up of psychotherapy research. We used topic modeling to identify topics discussed in psychotherapy sessions and explored (1) which topics best identified clients’ functioning and alliance ruptures and (2) whether changes in these topics were associated with changes in outcome. Transcripts of 873 sessions from 58 clients treated by 52 therapists were analyzed. Prior to each session, clients self-reported their functioning and symptom levels. After each session, therapists reported the extent of alliance rupture. Latent Dirichlet Allocation (LDA) was used to extract latent topics from the psychotherapy textual data. A Sparse Multinomial Logistic Regression (SMLR) model was then used to determine which topics best identified clients’ functioning levels and the occurrence of alliance ruptures in psychotherapy sessions. Finally, we used multilevel growth models to explore the associations between changes in topics and changes in outcome. Session-based processing yielded a list of semantic topics. The models identified the labels above chance, with 65% to 75% accuracy.
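The abstract does not state how the LDA model was fitted. Purely as an illustration, the sketch below uses collapsed Gibbs sampling, one common inference method for LDA. Each session transcript is treated as a bag of words, and the output is a topic-proportion vector for every session. The function name `lda_gibbs`, the hyperparameters (`alpha`, `beta`, the number of topics, the number of iterations), the whitespace tokenization, and the toy transcripts are all assumptions, not the authors' settings.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (hypothetical sketch; settings are assumed).

    docs: list of token lists, e.g. one list per session transcript.
    Returns (doc_topic, topic_word, vocab): per-document topic proportions,
    per-topic word distributions, and the vocabulary order.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    w_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), n_topics

    # Count tables: document-topic, topic-word, and per-topic totals.
    ndk = [[0] * K for _ in docs]
    nkw = [[0] * V for _ in range(K)]
    nk = [0] * K
    z = []  # current topic assignment of every token

    # Random initialization of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | other assignments), up to a constant.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Point estimates: theta (document-topic) and phi (topic-word).
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return doc_topic, topic_word, vocab


# Hypothetical toy "transcripts", one token list per session.
sessions = [
    "sleep tired work stress sleep".split(),
    "mother argument family mother".split(),
    "work stress boss sleep tired".split(),
    "family mother sister argument".split(),
]
theta, phi, vocab = lda_gibbs(sessions, n_topics=2)
for k, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -dist[v])[:3]
    print("topic", k, [vocab[v] for v in top])
```

The rows of `theta` are the per-session topic proportions. In a pipeline like the one described, these vectors would be the features passed to the downstream classifier.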
Introduction. Psychotherapy is based to a great extent on the content of exchanges between clients and therapists, which conveys important information about the participants’ modes of communication, mental states, thoughts, and feelings. Until recently, most psychotherapy research has relied on self-report measures or on human coders to quantify the information in psychotherapy sessions. These standardized subjective measures are the building blocks of psychotherapy research, and the process and outcome of treatment cannot be studied without them. However, these methods also have critical shortcomings, including their dependence on participants’ self-insight, participants’ willingness to complete questionnaires, and the restricted range of response options (for a review of the limitations of current research methods, see Kazdin, 2016). Furthermore, observational human coding is highly labor-intensive, which limits the amount of data that can be analyzed and thus curtails the generalizability of results (Hill & Lambert, 2004).
Discussion / Conclusion. Advanced machine learning techniques are relatively new to psychotherapy research, but emerging evidence suggests the value of integrating them with the traditional measures commonly applied to therapy (Dwyer, Falkai, & Koutsouleris, 2018). We used topic modeling, a data-driven machine learning technique that extracts latent topics from textual data. With it, we examined which topics best identify clients’ functioning and alliance ruptures in psychotherapy sessions and whether changes in these topics were associated with changes in treatment outcome. Topic modeling yielded semantically meaningful topics, which were then used to identify clients’ session-level functioning and alliance ruptures. Consistent with our first hypothesis, the sparse multinomial logistic regression (SMLR) models with topic-model features identified the labels above chance. Test accuracy was 65% for alliance ruptures and 75% for clients’ functioning.
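The sketch below shows, under stated assumptions, how per-session topic proportions could serve as features for a sparse multinomial logistic regression, and how test accuracy can be compared with a chance baseline. It is not the authors' implementation. Here SMLR is approximated as softmax regression with an L1 penalty, fitted by proximal gradient descent. The helper names (`fit_smlr`, `predict`, `accuracy`), the penalty `lam`, the learning rate, the majority-class definition of chance, and the toy data are all hypothetical.

```python
import math

# Hypothetical sketch: L1-penalized softmax regression standing in for SMLR.


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_threshold(x, t):
    # Proximal operator of t * |x|: shrinks a weight toward zero (sparsity).
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def fit_smlr(X, y, n_classes, lam=0.01, lr=0.5, n_iter=500):
    """Proximal gradient descent on mean cross-entropy + lam * sum|W|.

    X: feature vectors (e.g., per-session topic proportions).
    y: integer labels in range(n_classes). Intercepts b are not penalized.
    """
    n, p = len(X), len(X[0])
    W = [[0.0] * p for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(n_iter):
        gW = [[0.0] * p for _ in range(n_classes)]
        gb = [0.0] * n_classes
        for xi, yi in zip(X, y):
            probs = softmax([sum(w * x for w, x in zip(W[c], xi)) + b[c]
                             for c in range(n_classes)])
            for c in range(n_classes):
                # Gradient of the mean cross-entropy w.r.t. the class-c score.
                err = (probs[c] - (1.0 if c == yi else 0.0)) / n
                gb[c] += err
                for j in range(p):
                    gW[c][j] += err * xi[j]
        for c in range(n_classes):
            b[c] -= lr * gb[c]
            for j in range(p):
                # Gradient step followed by L1 shrinkage.
                W[c][j] = soft_threshold(W[c][j] - lr * gW[c][j], lr * lam)
    return W, b


def predict(W, b, X):
    return [max(range(len(b)),
                key=lambda c: sum(w * x for w, x in zip(W[c], xi)) + b[c])
            for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy, synthetic data: three topic proportions per session; label 1 = "rupture".
X_train = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.3, 0.6], [0.2, 0.2, 0.6]]
y_train = [1, 1, 0, 0]
X_test = [[0.65, 0.25, 0.1], [0.15, 0.25, 0.6]]
y_test = [1, 0]

W, b = fit_smlr(X_train, y_train, n_classes=2)
acc = accuracy(y_test, predict(W, b, X_test))
chance = max(y_test.count(c) for c in set(y_test)) / len(y_test)
print(f"test accuracy {acc:.2f} vs. majority-class chance {chance:.2f}")
```

The L1 penalty drives the weights of uninformative topics to exactly zero. That is how a sparse model can point to the topics that best identify each label. Raising `lam` yields sparser weight matrices at the cost of fit.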