Rating: 7.7/10.
Overall an okay but not superb book. The parts about pragmatics were the least familiar to me, but the writing there was weak: a lot of advanced concepts were introduced too quickly for me.
Ch2: What is Meaning?
One way to represent meaning is by assigning logical forms to sentences. Modal logic adds the ability to describe degrees of certainty like “probably”. Model theory specifies how the expressions of a formal system are interpreted, so every well-formed expression has a meaning. These representations are compositional: the meaning of a composite expression can be derived from the meanings of its sub-expressions.
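As a toy illustration of compositionality, here's a minimal Python sketch (the domain, predicates, and example sentence are all invented, not from the book): lexical meanings are functions, and the sentence meaning falls out of applying them to each other.

```python
# A tiny model: a domain of entities plus interpretations for predicates.
domain = {"alice", "bob", "rex"}
dogs = {"rex"}
barkers = {"rex", "bob"}

# Lexical meanings as functions (lambda-calculus style).
DOG = lambda x: x in dogs
BARKS = lambda x: x in barkers
EVERY = lambda restrictor: lambda scope: all(
    scope(x) for x in domain if restrictor(x)
)

# "Every dog barks": derived by composing the meanings of the parts.
print(EVERY(DOG)(BARKS))  # True -- the only dog in the model, rex, barks
```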
Many tasks in NLP depend on commonsense reasoning, for example recognizing textual entailment and coreference resolution. However, interpretations based on common sense are defeasible: if the sentences appeared in a different context, the answer might change. Finding the answer often requires inference: the exact words don’t match, but there is some commonsense connection.
Expressions often have meanings in context beyond their surface-level meanings. Saying “yes” in a discourse makes something a public commitment (a statement you are on record as believing in), and treating it this way is the only way to maintain discourse coherence. “Yes” is also an anaphoric expression, and if its referent is ambiguous, then the commitment is ambiguous too.
The meaning of a speech act can be divided into three levels. A locutionary act is the content of what one says; its meaning is the linguistic form, and is handled by compositional semantics. An illocutionary act is what you do by saying it (by saying X, you assert that X is true), so its meaning is the public commitment you are making. A perlocutionary act is how that public commitment changes the positions of the participants.
Speech acts may be labelled by their communicative status (was the act understood?) and the type of semantic content (eg: question, answer, statement-non-opinion). It’s important to include relations in the representation of speech acts, because to interpret an answer, you have to identify which question it answers.
Linguistic meaning includes things other than truth-conditional semantics. Evaluative adverbs like “hopefully” express the speaker’s attitude, and they can’t appear in questions or commands. They are words in English but may be morphemes in other languages. Japanese, for example, has plain vs polite morphology to express social distance.
Ambiguity exists on many levels, eg: word sense ambiguity, structural attachment (“… with a telescope”), semantic scope (“he must sell a car”), pragmatic (“have you emptied the dishwasher” — can be a question or an indirect request). A strictly pipelined approach is poor at resolving ambiguity, since we often use information from later in the pipeline to resolve earlier ambiguities.
We use language in combination with non-verbal actions. The simplest example is pointing at something while saying “what is this?”. Another is when an utterance references an action that somebody just performed, which is not a linguistic entity.
Formal lexical semantics often treats lexical items as atoms, and there are a lot of semantic relationships that this fails to capture. The semantics of a word includes its word sense, semantic roles, and connotation. Sometimes lexical semantics attaches to a group of words like “pick out” rather than a single word.
Ch4: Lexical Semantics: Senses
A word can have many senses, and WordNet keeps track of them. Similar words like English “ride” and Japanese “noru” can have similar meanings in most of their senses, but the set of sense distinctions isn’t exactly the same.
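As a quick illustration, WordNet’s sense inventory for “ride” can be listed with NLTK (this requires downloading the wordnet corpus first):

```python
# List WordNet's verb senses for "ride" (run nltk.download("wordnet") once).
from nltk.corpus import wordnet as wn

for synset in wn.synsets("ride", pos=wn.VERB):
    print(synset.name(), "-", synset.definition())
# One line per sense: riding an animal, travelling in a vehicle,
# teasing someone ("stop riding me"), and so on.
```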
Regular polysemy is polysemy that’s predictable, like between an animal and the meat of that animal. If the senses are related but not in a regular way, it’s still polysemy (just not regular polysemy). Constructional polysemy is regular polysemy + sense extension. Sense extension is when two senses are related but can’t cohabit the same lexical entry; violating this produces zeugma, eg: “the professor took his hat and his leave”. In generative lexicon theory, each lexical item has a qualia structure that determines its meaning when used in combination with other words.
Etymological origin is not enough to distinguish homonymy from polysemy, as some words are derived from the same source but speakers are unaware of the connection. Word senses change over time, following laws that can be quantified with NLP. Blocking is when a productive process is preempted by a sufficiently high-frequency word that means the same thing, eg: “pork” blocks “pig” from being used to mean the meat of a pig.
Distributional semantics characterizes the meaning of a word by the company it keeps. A large context window captures topical relatedness, while a small one captures something closer to synonymy. Vector semantics is better than WordNet for quantifying the difference between “cat” and “dog”, and for capturing how words are used by actual speakers rather than an idealized meaning in a formal hierarchy. However, it’s challenging to distinguish synonyms from antonyms in vector space.
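A minimal sketch of the idea (the toy corpus and window size are mine; co-occurrence counts plus cosine similarity is the standard setup, not something specific from the book):

```python
# Build co-occurrence vectors from a toy corpus and compare with cosine.
from collections import Counter
from math import sqrt

corpus = "the cat sat on the mat the dog sat on the rug".split()
WINDOW = 2  # small window: similarity leans towards synonym-like words

def context_vector(target):
    counts = Counter()
    for i, word in enumerate(corpus):
        if word == target:
            lo, hi = max(0, i - WINDOW), i + WINDOW + 1
            counts.update(w for w in corpus[lo:hi] if w != target)
    return counts

def cosine(u, v):
    dot = sum(u[w] * v[w] for w in u)
    norm = lambda x: sqrt(sum(c * c for c in x.values()))
    return dot / (norm(u) * norm(v))

print(cosine(context_vector("cat"), context_vector("dog")))
# "cat" and "dog" appear in near-identical contexts, so the score is high.
```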
Sense extensions often start out as metaphors and are conventionalized through use, so it’s hard to draw the line between the two. Metaphorical extensions vary a lot, but to some extent unacceptable extensions can be predicted from lexical properties. Transitive verbs often have lexeme-specific “default meanings” when used intransitively, eg: “drink” by default means drinking alcohol (though this is defeasible depending on context).
Ch5: Semantic Roles
Relational nouns are syntactically nouns but express relations between arguments rather than entities themselves. Semantic roles don’t always map consistently to syntactic roles. There’s no agreed-upon list of semantic roles, as they can vary in granularity: the most granular resource is FrameNet, the least granular is PropBank, and VerbNet is somewhere in between.
Selectional restrictions limit the arguments a verb can take (eg: animacy), but they can be violated in special situations or in metaphors. So this is a soft restriction, though in some languages animacy is a hard grammatical constraint.
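A toy sketch of the idea (the lexicon and the verb entry are invented): encode the restriction as a set of allowed argument types, and treat a violation as a signal of metaphor or oddness rather than a hard failure.

```python
# Selectional restrictions as soft type checks (toy lexicon).
ANIMATE = {"girl", "dog", "horse"}

RESTRICTIONS = {"admire": {"subject": ANIMATE}}  # "admire" wants an animate subject

def check(verb, role, filler):
    allowed = RESTRICTIONS.get(verb, {}).get(role)
    if allowed is None or filler in allowed:
        return "ok"
    return "violated -- literal reading is odd; maybe metaphor"

print(check("admire", "subject", "girl"))  # ok
print(check("admire", "subject", "rock"))  # violated (soft constraint)
```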
Arguments can be dropped in some situations in English, like imperative and passive constructions, but in some languages, dropping is a lot more common. Whether dropping is allowed is also a property of lexical items. Coreference resolution for dropped arguments is a hard NLP problem.
Ch6: Collocations and Multiword Expressions
Multiword Expressions (MWEs) are groups of words that together have a special meaning beyond those of the individual words. Mutual information is better than raw ngram frequency at identifying them, but still picks up some false positives. Like normal words, MWEs can be polysemous (eg: “pull up”, “make out”). MWEs differ in how much syntactic variation they allow and in how predictable their meanings are, and it’s difficult to predict which variations are allowed.
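A sketch of why mutual information beats raw frequency (all counts are invented for illustration): frequency alone favours “of the”, while pointwise mutual information corrects for how frequent the individual words are.

```python
# Scoring candidate MWEs by pointwise mutual information (toy counts).
from math import log2

N = 1_000_000  # total bigrams in a hypothetical corpus
counts = {
    ("of", "the"):      {"bigram": 2_000, "w1": 30_000, "w2": 60_000},
    ("kick", "bucket"): {"bigram": 20,    "w1": 100,    "w2": 50},
}

for (w1, w2), c in counts.items():
    pmi = log2((c["bigram"] / N) / ((c["w1"] / N) * (c["w2"] / N)))
    print(f"{w1} {w2}: freq={c['bigram']}, PMI={pmi:.2f}")
# "of the": freq=2000 but PMI≈0.15 -- frequent, yet barely above chance.
# "kick bucket": freq=20 but PMI≈11.97 -- rare, yet far above chance.
```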
Ch7: Compositional Semantics
Semantics can be written formally in predicate-argument structure: this allows you to use classical logic operations like wedge elimination to conclude A from A ∧ B. Sometimes you need quantifiers to express the meaning of words like “something”. There are lots of different formal systems to represent semantics.
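A small sketch using NLTK’s logic parser (the example sentences are mine; `Expression.fromstring` is NLTK’s entry point for parsing first-order formulas):

```python
from nltk.sem import Expression

read = Expression.fromstring

# "Kim ate something": an existential quantifier fills the object slot.
lf = read("exists x.(eat(kim, x))")
print(lf)  # exists x.eat(kim,x)

# Wedge elimination: from A & B, conclude A (here, the first conjunct).
conj = read("snore(kim) & sleep(sandy)")
print(conj.first)  # snore(kim)
```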
Coordinated constituents have a subtle ambiguity: “Alice and Bob smiled” has a distributive reading and can be broken down into “Alice smiled” and “Bob smiled”, but “Alice and Bob met” has a collective reading and can’t be broken into “Alice met” and “Bob met”. Some cases are ambiguous between the two readings, eg: “Kim often sang and danced”.
Quantifier scope ambiguities like “every student saw a teacher” usually don’t show up as syntactic ambiguities (CCG is one exception). Often the ambiguity is resolved by context, eg: following with “her name was Mrs. Smith” forces the single-teacher reading. It’s useful to resolve this ambiguity, eg: when translating into a language where you must choose one interpretation. In other cases, humans don’t resolve scope ambiguities at all unless there’s a need to.
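For concreteness, here are the two readings written out as first-order formulas (plain strings; the glosses are the standard analysis of this example):

```python
# "every student saw a teacher" -- two scopings of the quantifiers.
surface_scope = "∀x (student(x) → ∃y (teacher(y) ∧ saw(x, y)))"
# every student saw some teacher, possibly a different one each
inverse_scope = "∃y (teacher(y) ∧ ∀x (student(x) → saw(x, y)))"
# one specific teacher was seen by every student
# A follow-up like "her name was Mrs. Smith" needs a single specific
# teacher to refer to, so it forces the inverse-scope reading.
print(surface_scope)
print(inverse_scope)
```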
Word vectors can be averaged to represent a sentence, but that throws away word order information. Some people have tried combining word vectors with linguistic theories to achieve compositional semantics better than averaging.
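A tiny demonstration of the order problem (the 2-d vectors are invented): two sentences with opposite meanings average to exactly the same vector.

```python
# Averaging throws away word order: "dog bites man" vs "man bites dog".
vec = {"dog": (1.0, 0.0), "bites": (0.0, 1.0), "man": (0.5, 0.5)}

def average(sentence):
    vs = [vec[w] for w in sentence.split()]
    return tuple(sum(cs) / len(vs) for cs in zip(*vs))

print(average("dog bites man") == average("man bites dog"))  # True
```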
Ch8: Compositional Semantics beyond Predicate-Argument Structure
How should tense be represented? A simple way is to define past tense as “X happened some time before now”. But this isn’t how it’s used in natural language, eg: “Oh no, I left the oven on!”. It’s better to have a speech time (S), a reference time (R), and an event time (E), where the reference time is determined by context. Then past tense is “E, R < S” and present perfect is “E < R, S”. Situation aspect (or Aktionsart) describes the temporal properties of an event, which can be stative, activity, achievement, or accomplishment. Viewpoint aspect presents the event as perfective (completed) or imperfective (ongoing).
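The three-point analysis can be written down directly (the orderings for simple past and present perfect are from the text; past perfect is the standard Reichenbach completion of the scheme):

```python
# Reichenbach-style tense: orderings over speech time S, reference
# time R, and event time E.
TENSES = {
    "simple past":     "E = R < S",  # "I left the oven on" (R in the past)
    "present perfect": "E < R = S",  # "I have left the oven on" (R is now)
    "past perfect":    "E < R < S",  # "I had left the oven on"
}
for tense, order in TENSES.items():
    print(f"{tense:16s} {order}")
# Simple past and present perfect place E identically; they differ
# only in which reference time the discourse is about.
```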
Evidentials encode the source of knowledge for a statement; in English this isn’t a grammatical feature. Often there’s a distinction between firsthand and reported information, and one Papuan language has a six-way evidentiality distinction. Politeness is also often encoded grammatically, eg: the T-V distinction in French, and verbal inflections in Japanese and Korean. Politeness is sometimes helpful for resolving dropped arguments, since honorific speech is never used to refer to oneself.
Ch9: Beyond Sentences
There are a lot of approaches to annotating discourse, but typically there’s some notion of coherence, plus a representation of the discourse that contains the set of public commitments made by the speakers. Rhetorical Structure Theory (RST) has over 200 coherence relations. Coherence requirements constrain how a discourse can proceed and what anaphors can resolve to.
Lexical processing and discourse can interact. Some words like “annoy” lexically expect an explanation to follow, so a following sentence is interpreted as one even when there’s no commonsense prior, eg: “Mary is annoyed. Sandy ate soup last night”. Conversely, discourse structure can provide information to resolve lexical ambiguity.
Compositional semantics usually deals with one sentence at a time, but anaphors can reach across sentences, so it’s not correct to leave them as free variables. Sentences can also be logically equivalent yet differ in how they control anaphors. Some people have applied game theory to model discourse, but this is limited too: during a discourse, one speaker can introduce an option previously unknown to the other participant, which standard game theory doesn’t allow.
Ch10: Reference Resolution
Reference resolution differs from coreference resolution because the referent doesn’t have to be identical to something else in the text. It can be a complex event that isn’t any syntactic constituent, eg: “John kicked Bob. It hurt”. Chomsky’s Binding Theory determines which syntactic configurations permit coreference.
Parallelism and semantic roles also influence which antecedents are preferred. A referent introduced inside a modal context is inaccessible, unless the anaphor is inside a modal too. Discourse structure also affects the accessibility of referents: only the topic (or right frontier) of the discourse is available.
Ch11: Presupposition
A presupposition is a statement that must be true for the current one to be meaningful, eg: “Kim’s cousin is a student” presupposes “Kim has a cousin” and “there is someone called Kim”. In a discourse, the presupposed statement doesn’t have to already be part of the shared knowledge: presupposition accommodation allows the required knowledge to be added on the fly.
Words can be holes if they let through all presuppositions of the embedded clause: “I know that you stopped smoking” still presupposes that you once smoked. Plugs, on the other hand, block presuppositions, eg: “I told you to stop smoking”. Filters sometimes block and sometimes don’t, depending on whether a condition is met, eg: “If Kim used to smoke, then Kim stopped smoking” filters out the presupposition, because the antecedent already supplies it.
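A toy sketch of filtering in conditionals (Karttunen-style; the sentences are mine, and the “entailment” check is drastically simplified — real projection needs actual entailment reasoning, not string matching):

```python
# "If A then B": B's presuppositions project unless A already supplies them.
def project(antecedent_content, consequent_presups):
    return [p for p in consequent_presups if p not in antecedent_content]

# "If Kim used to smoke, then Kim stopped smoking."
print(project({"Kim used to smoke"}, ["Kim used to smoke"]))
# [] -- the presupposition is filtered out.

# "If Kim is tired, then Kim stopped smoking."
print(project({"Kim is tired"}, ["Kim used to smoke"]))
# ['Kim used to smoke'] -- it projects: the whole sentence presupposes it.
```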
Ch12: Information Status and Information Structure
The information status of a referent is a continuum of how much it’s already known in the discourse versus new. For example, “I saw a dog” vs “I saw this dog” vs “I saw the dog” vs “I saw it”. The implicational hierarchy runs in-focus <= activated <= familiar <= uniquely identifiable <= referential <= type identifiable, where each status implies the ones after it. English distinguishes all 6 levels of the hierarchy; other languages have fewer.
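A sketch of the implicational part (the statuses come from the text; the form-to-status mapping follows the usual Gundel et al. presentation of this hierarchy): each form signals its status and thereby entails every weaker status.

```python
# The Givenness Hierarchy, weakest status first.
HIERARCHY = ["type identifiable", "referential", "uniquely identifiable",
             "familiar", "activated", "in focus"]

FORM_TO_STATUS = {
    "a dog": "type identifiable",
    "this dog": "referential",  # indefinite 'this', as in "I saw this dog..."
    "the dog": "uniquely identifiable",
    "it": "in focus",
}

def implied_statuses(form):
    """A status implies all weaker statuses (everything earlier in the list)."""
    idx = HIERARCHY.index(FORM_TO_STATUS[form])
    return HIERARCHY[: idx + 1]

print(implied_statuses("the dog"))
# ['type identifiable', 'referential', 'uniquely identifiable']
```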
Information structure is about whether a proposition is common ground or something new (as opposed to information status, which is about referents). In English, this is usually encoded in stress and intonation. There are different ways of analyzing given vs new content: topic is what the sentence is about, focus is the piece of new information, and background is the rest. Others define the comment as what is said about a topic, the ground as the part of the sentence that’s not the focus, etc.
Different languages mark topic and focus differently, eg: with particles in Cantonese, or the preverbal focus position in Basque. Information structure doesn’t just emphasize different parts; it can affect truth conditions: “dogs must be carried on this escalator” can mean 3 different things depending on which word is stressed.
Ch13: Implicature and Dialogue
Implicature is implied meaning beyond what’s explicitly said. Conversational implicatures follow from Grice’s maxims and can be canceled by further context. They are derived by noticing that one of the maxims would seem to be violated unless some proposition p were true, and then assuming that p is true. Conventional implicatures are embedded in the meaning of a word, eg: “he is rich but honest” implies that rich people are not usually honest.
Some people have modelled implicature as agents participating in signaling games, perhaps with POMDPs. Others argue that coherence is sufficient, and reasoning about cognitive states isn’t necessary. Determining what was said is easier than determining whether the statement is true.
In courtroom dialogue, you can mislead by implicating something false, and later escape a perjury conviction by plausibly claiming you meant something else, since the implicature was never explicitly said. Thus it’s not always “safe” to treat implicatures as part of the public record.
The last chapter briefly surveys a lot of datasets and tools related to semantic and pragmatic processing, in multiple languages.