What are the Goals of Distributional Semantics?

Paper · arXiv 2005.02982 · Published May 6, 2020
LLM Architecture · NLP and Linguistics · Philosophy and Subjectivity

Distributional semantic models have become a mainstay in NLP, providing useful features for downstream tasks. However, assessing long-term progress requires explicit long-term goals. In this paper, I take a broad linguistic perspective, looking at how well current models can deal with various semantic challenges. Given stark differences between models proposed in different subfields, a broad perspective is needed to see how we could integrate them. I conclude that, while linguistic insights can guide the design of model architectures, future progress will require balancing the often conflicting demands of linguistic expressiveness and computational tractability.

Introduction. In order to assess progress in any field, the goals need to be clear. In assessing progress in semantics, Koller (2016) contrasts “top-down” and “bottom-up” approaches: a top-down approach begins with an overarching goal, and tries to build a model to reach it; a bottom-up approach begins with existing models, and tries to extend them towards new goals. Like much of NLP, distributional semantics is largely bottom-up: the goals are usually to improve performance on particular tasks, or particular datasets. Aiming to improve NLP applications is of course a legitimate decision, but Koller points out a problem if there is no top-down goal: “Bottom-up theories are intrinsically unfalsifiable... We won’t know where distributional semantics is going until it has a top-down element”. This is contrasted against truth-conditional semantics, a traditional linguistic approach which is largely top-down: “truth-conditional semantics hasn’t reached its goal, but at least we knew what the goal was”.

Discussion / Conclusion. A common thread among all of the above sections is that reaching our semantic goals requires structure beyond representing meaning as a point in space. In particular, it seems desirable to represent the meaning of a word as a region of space or as a classifier, and to work with probability logic. However, there is a trade-off between expressiveness and learnability: the more structure we add, the more difficult it can be to work with our representations. To this end, there are promising neural architectures for working with structured data, such as dependency graphs (for example: Marcheggiani and Titov, 2017) or logical propositions (for example: Rocktäschel and Riedel, 2017; Minervini et al., 2018). To mitigate computationally expensive calculations in probabilistic models, there are promising new techniques such as amortised variational inference, used in the Variational Autoencoder (Kingma and Welling, 2014; Rezende et al., 2014; Titsias and Lázaro-Gredilla, 2014). My own recent work in this direction has been to develop the Pixie Autoencoder (Emerson, 2020a), and I look forward to seeing alternative approaches from other authors, as the field of distributional semantics continues to grow.
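
To make the contrast between a point and a classifier concrete, the following is a minimal sketch, not a model from the paper. It assumes a toy two-dimensional entity space and a logistic classifier per word; the class name `WordClassifier`, the weights, and the entity vectors are all invented for illustration. Each word assigns a probability of truth to each entity, so a word picks out a graded region of the space, and a hyponym such as "dog" can be read as a region lying inside that of "animal".

```python
import math


def sigmoid(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


class WordClassifier:
    """Hypothetical illustration: a word's meaning as a logistic
    classifier over entity vectors, rather than as a single point."""

    def __init__(self, weights, bias):
        self.weights = weights
        self.bias = bias

    def prob_true(self, entity):
        """Probability that the word is true of the given entity."""
        score = sum(w * x for w, x in zip(self.weights, entity)) + self.bias
        return sigmoid(score)


# Invented toy parameters: dimension 0 is roughly "animate",
# dimension 1 is roughly "canine". These are assumptions, not learned values.
animal = WordClassifier(weights=[4.0, 0.0], bias=-2.0)
dog = WordClassifier(weights=[4.0, 4.0], bias=-6.0)

# Invented toy entities in the same 2-d space.
entities = {
    "rex": [1.0, 1.0],      # an animate, canine entity
    "tibbles": [1.0, 0.0],  # an animate, non-canine entity
    "rock": [0.0, 0.0],     # an inanimate entity
}

for name, vec in entities.items():
    print(name, round(animal.prob_true(vec), 3), round(dog.prob_true(vec), 3))

# On these toy entities, "dog" is never more probable than "animal",
# which is one simple way to read hypernymy as region inclusion.
dog_inside_animal = all(
    dog.prob_true(v) <= animal.prob_true(v) for v in entities.values()
)
print(dog_inside_animal)
```

A single vector per word cannot express this kind of inclusion directly, which is one reason the extra structure is attractive despite the added cost of learning it.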