Modern wide-coverage parsers work as well as they do because of a clever combination of supervised machine learning from small human-annotated corpora of syntactic structures supporting semantic interpretation with further unsupervised machine learning over vast corpora of unlabeled text, usually using neurally inspired algorithms sometimes called "Deep Learning". The latter component builds a hidden, essentially Markovian, sequence representation that acts as a powerful disambiguator, particularly of unseen events, such as words that were not exemplified in the small supervised corpus, and their likely compatibility with syntactic operations. Such parsers can process unseen text, such as that on the web, and assemble semantic representations at speeds several orders of magnitude faster than human reading. Such machine reading offers the possibility of fulfilling some of the oldest ambitions of computational natural language processing, such as question answering and the construction of large symbolic knowledge graphs, or semantic networks.
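As a minimal illustration of the kind of generalization such a learned sequence representation affords (not the model discussed in the talk), the following Python sketch backs off from a small supervised treebank to embedding-space neighbours when a word was never seen with a syntactic category; all data, names, and the back-off rule are hypothetical stand-ins.

```python
# Sketch: extending supervised category statistics to unseen words via
# embeddings. Embeddings here are random stand-ins; with real distributional
# embeddings, neighbours of "penned" would be verbs like "wrote".
import numpy as np

rng = np.random.default_rng(0)
embeddings = {w: rng.normal(size=8) for w in ["wrote", "authored", "penned", "ate"]}

# Toy supervised statistics: P(syntactic category | word) from a small treebank.
category_given_word = {
    "wrote":    {"(S\\NP)/NP": 0.9, "S\\NP": 0.1},
    "authored": {"(S\\NP)/NP": 0.8, "S\\NP": 0.2},
    "ate":      {"(S\\NP)/NP": 0.5, "S\\NP": 0.5},
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def category_distribution(word, k=2):
    """Return category probabilities for a word, backing off to its k
    nearest embedding neighbours when the word was unseen in the treebank."""
    if word in category_given_word:
        return category_given_word[word]
    seen = [w for w in category_given_word if w in embeddings]
    neighbours = sorted(seen, key=lambda w: -cosine(embeddings[word], embeddings[w]))[:k]
    merged = {}
    for w in neighbours:
        for cat, p in category_given_word[w].items():
            merged[cat] = merged.get(cat, 0.0) + p / k
    return merged

# "penned" never occurred in the supervised corpus; its neighbours supply
# a category distribution the parser can use for disambiguation.
print(category_distribution("penned"))
```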
This promise remains largely unfulfilled. The main obstacle to further progress lies in the theories of natural language semantics that such parsers use to assemble meaning representations, which are mostly inherited from linguistics and philosophical logic. Such semantic theories tend to tie meaning representation too closely to specific linguistic forms: for example, they fail to capture the fact that, if we are asked "Which author wrote 'Macbeth'?", then the phrase "Shakespeare's 'Macbeth'" in a text may imply the answer. Leaving such commonplace entailments to later open-ended inference via theorem-proving is not practical, and attempts since the early '70s to hand-build a less form-dependent computable semantics have invariably failed.
This paper argues for a different approach, according to which the dimensions of semantic relation are treated as hidden variables, to be discovered by machine learning from data. It will contrast two approaches to the problem. The first, like the parsing models mentioned earlier, is based on word embeddings, in which the meaning of a word is represented as a dimensionally reduced vector derived from the string contexts in which it has been seen in large amounts of unlabeled text, and semantic composition is represented by linear-algebraic operations. The second uses machine reading with wide-coverage parsers over large amounts of text to find type-consistent patterns of directional implication among traditional semantic relations across the same n-tuples of arguments: for example, that for many pairs of entities X of type "author" and Y of type "literary work", if we read about "X writing Y", we also read elsewhere about "X's Y". The original form-specific semantics in the parser is then replaced by a form-independent semantics, in which paraphrases are collapsed into a single relation label and directional entailments are represented by conjunction. The latter method has the advantage of being immediately compatible with the logical operators of traditional logical semantics, such as negation and quantification.
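To make the second approach concrete, here is a small Python sketch (toy triples, hypothetical relation names such as "write" and "poss") of how directional implication between relations might be scored from the overlap of their typed argument pairs, in the spirit of the "X writing Y" / "X's Y" example; it illustrates the idea, not the system described in the talk.

```python
# Sketch: scoring the directed entailment write(X, Y) |= poss(X, Y)
# from co-occurrence of relations over the same argument pairs,
# as machine-read from text by a wide-coverage parser.
from collections import defaultdict

# Relation instances (relation, arg1, arg2), with argument types
# assumed to be (author, literary_work).
triples = [
    ("write", "Shakespeare", "Macbeth"),
    ("write", "Shakespeare", "Hamlet"),
    ("write", "Austen", "Emma"),
    ("poss",  "Shakespeare", "Macbeth"),   # "Shakespeare's Macbeth"
    ("poss",  "Shakespeare", "Hamlet"),
    ("poss",  "Austen", "Emma"),
    ("poss",  "Dickens", "Oliver Twist"),  # possessive with no "write" seen
]

args_by_relation = defaultdict(set)
for rel, x, y in triples:
    args_by_relation[rel].add((x, y))

def directional_score(premise, conclusion):
    """Fraction of the premise's argument pairs also seen with the
    conclusion: high when premise entails conclusion, lower in reverse."""
    p, c = args_by_relation[premise], args_by_relation[conclusion]
    return len(p & c) / len(p)

print(directional_score("write", "poss"))  # 1.0: "X writes Y" suggests "X's Y"
print(directional_score("poss", "write"))  # 0.75: weaker in the other direction
```

With real machine-read data, it is this asymmetric inclusion of one relation's argument pairs in another's that licenses a directed entailment rather than a mere paraphrase.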
Despite the cognitive implausibility of training statistical models over billions of words of text, the paper will argue that parsing models built in this way constitute a computationally practical proxy for the use of common-sense knowledge in syntactic disambiguation during human sentence processing, of the kind demonstrated by psycholinguists. In contrast, it will also argue that the piecemeal assembly of word meaning by observation of implicative relations in language use, rather than via a similarly unstructured representation of string context, is consistent with observations about the course of child language acquisition.