The advances in artificial intelligence have been astonishing in recent years, with new algorithms showing super-human performance for a wide number of tasks. An important reason for this development is the availability of large datasets and powerful computers, making it possible to train larger machine learning models with higher learning capacity. Artificial neural networks (ANNs) are machine learning models that have been of paramount importance to the development. ANNs are composed of layers of artificial neurons, each of which can only perform a simple computation, but when stacked together in deep architectures, they can be trained to approximate complicated non-linear functions. These models have achieved fantastic results in tasks on various data modalities such as audio, vision, and text. One reason for the success is the internal vector representations computed by the layers, each transforming their input into numerical feature vectors which are increasingly useful for the end task. A complete model is often trained at once (end-to-end learning), and the representations are optimized during training to solve the given task.
This thesis studies the representations computed using artificial neural networks that are trained on and applied to natural language. In paper I and II, we apply learned representations for words to improve the performance of multi-document summarization. In Paper III, we study the use of deep neural sequence models working on the raw character stream as input, and how this class of models can be used to detect medical terms in text (such as drugs, symptoms, and body parts). The system is evaluated on medical health records in Swedish. In paper IV, we propose a novel deep neural sequence model trained to transform words into inflected forms as demonstrated by analogies: “see" is to “sees" as “eat" is to what? The model outperforms previous rule-based approaches by a massive margin, and when inspecting the internal representations computed by this model, one can see that it learns to distinguish classes of transformations of word forms, without being explicitly told to do so. This is an effect from training the model to transform words while provided with the analogous words forms. In other cases, however, the training objective may not provide such cues for the learning algorithm. In Paper V, we study how to improve the way learned representations disentangle the underlying factors of variation in the data. This can be useful for unsupervised representation learning, such as using autoencoders for task agnostic representations or when the final use case is unknown.