Rating: 8.3/10.
Ch1: Beginning Concepts
Language has finite rules and symbols but can generate infinitely many sentences. Can think of language as a system connecting signals (acoustic, or words on a page) to meaning; this is done through phonology, morphology, syntax, etc. Linguistic competence is the knowledge of a language’s lexicon and grammar; linguistic performance is the use of that knowledge to understand and communicate.
Ch2: The Nature of Linguistic Competence
There are many languages but they share a lot in common; this shared core is called Universal Grammar, and differences (eg: in word order) are parametric variations. All grammars have phonological, morphological, and syntactic components. At the speech-signal level are articulatory / acoustic phonetics, with place and manner of articulation, vowel height, etc.
Phonological component: phonemic inventory, minimal pairs, allophones, complementary distribution. Phonotactic constraints differentiate possible vs impossible non-words. Prosody includes timing (stress vs mora), tones, and intonation, and can affect how a sentence is interpreted syntactically. Morphological component: morphemes can be bound or free; affixes can be inflectional (for tense, gender, number) or derivational. [Review of syntax skipped].
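As an illustration of the phonotactics point, a minimal sketch in Python; the onset list and the letter-based segmentation are simplifying assumptions for illustration only, not a real description of English phonotactics:

```python
# Toy phonotactic check: is the initial consonant cluster of a non-word a
# legal English onset? The onset set below is a small illustrative subset.
LEGAL_ONSETS = {"", "b", "d", "f", "fl", "fr", "g", "gl", "gr", "k", "kl", "kr",
                "l", "m", "n", "p", "pl", "pr", "r", "s", "sl", "sp", "st",
                "str", "t", "tr", "w"}
VOWELS = set("aeiou")

def initial_onset(word: str) -> str:
    """Letters before the first vowel (a crude stand-in for the onset)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i]
    return word

def is_possible_nonword(word: str) -> bool:
    return initial_onset(word) in LEGAL_ONSETS

print(is_possible_nonword("floop"))  # True  -- phonotactically possible
print(is_possible_nonword("tlat"))   # False -- *tl- is not a legal onset
```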
Metalinguistic awareness is the ability to reason about linguistic objects, eg: realizing that a sentence is ambiguous. Different types of ambiguity are perceived differently, as revealed by experiments.
Ch3: The Biological Basis of Language
There is evidence that language is a species-specific, biological system. Animals have been taught symbols in sign language, but none have learned syntax. All languages have recursion (even Piraha), and a language constructed to violate UG was more difficult to learn. Children don’t need specific instruction to learn language, and even children who receive impoverished input acquire their language fully. Children of different cultures learn their languages at similar rates and make similar mistakes. There is a critical period, after which syntax can’t be fully acquired, as evidenced by Genie.
Two kinds of aphasia are related to brain lesions. In Broca’s aphasia, patients can produce individual words but with little or no syntax. In Wernicke’s aphasia, speech is syntactically fluent but meaningless. Prior to brain surgery, doctors use electrical currents to temporarily disable small parts of the brain to localize the language areas, so they can avoid damaging them.
Language is lateralized to the left hemisphere in most people; the Wada test temporarily disables one hemisphere at a time to determine which side language is lateralized to. Split-brain patients lose the ability to name an object when it is presented in the visual field that projects to the hemisphere without the language areas. In dichotic listening, a different sound is played in each ear, and for language stimuli there is a right-ear advantage.
EEGs measure brain activity via electrodes on the scalp. Event-related potentials (ERPs) are EEG signals time-locked to an event. The N400 is an ERP component appearing about 400ms after a stimulus and is triggered by semantically anomalous words; syntactic errors instead trigger an earlier negativity (around 150-300ms) and a later positivity (the P600, around 600ms). At least some aspects of language are genetic: members of a family with a mutation in the FOXP2 gene develop language disorders.
Ch4: The Acquisition of Language
The nativist view of language acquisition says that children have a biological apparatus for learning language (Chomsky calls it the language acquisition device, LAD). This is a set of innate priors that allows children to learn the right syntax from limited input. Caregivers don’t need to do anything special, like encouraging imitation or correcting mistakes. Mothers do use higher-pitched and slower prosody and speak semantically simpler sentences, which probably helps.
Infants begin to learn their language in the womb; the first thing they pick up is prosody, which lets them differentiate their mother’s language from other languages. To measure what infants notice, researchers measure their rate of sucking: they suck more when they’re paying attention. Initially they’re attuned to all phonetic contrasts, but by 12 months they have narrowed to their language’s phonemic inventory. They babble as a way of practicing phonemes.
At 12-18 months, they enter the holophrastic (one-word) stage. They often underextend or overextend word meanings, and during fast mapping they form an approximate meaning of a word from a single example, which may be slightly incorrect. Their vocabulary grows dramatically from 12 months to 6 years of age. They use a number of heuristics to help: the whole object assumption is that a word probably refers to a whole object (not a part or a property of the object). The mutual exclusivity assumption is that each object has only one name. Of course, these assumptions don’t hold for bilingual children.
During the preschool years, they gradually acquire morphology and syntax, and mean length of utterance (MLU) increases. The order of acquisition of different morphemes is relatively constant across children, and they tend to overgeneralize irregular patterns (“goed” instead of “went”). Complex sentences with relative clauses are produced around age 3.
Language ability continues to improve into the school years. Younger children fail to revise a greedy parse of sentences like “put the frog on the napkin into the box”. Children are also bad at taking the informational needs of the listener into account, so they might produce sentences with an ambiguous “he” and not realize the ambiguity. Metalinguistic awareness (eg: counting the number of phonemes in a word) is a good predictor of reading ability.
Ch5: The Speaker: Producing Speech
Production of a sentence begins with the speaker’s preverbal message; the speaker then retrieves the lexical items, builds the syntax, and outputs the phonological string. In bilinguals, both languages can be accessed at the same time, with code-switching between sentences or within a sentence. Code-switching is different from borrowing, which typically undergoes phonological adaptation to fit the borrowing language.
Lexical retrieval is the process of retrieving the form of a word. It happens at the same time as building the syntax tree, and happens very fast (an adult knows ~40k words and can speak at 1-5 words/second). In slips of the tongue, a word is replaced with a semantically or phonetically similar one, suggesting words are organized in “neighborhoods” in the mind. The tip-of-the-tongue phenomenon is when you can’t recall the word for something but can typically remember some aspects of it (stress pattern or initial phoneme).
Word exchange errors provide a window into our linguistic processing. The swap is nearly always between two words of the same word class, so the result is semantically odd but syntactically acceptable. Bound morphemes are not swapped (eg: “you ended up ordering” -> “you ordered up ending”), indicating there’s a level at which the syntax is constructed before the bound morphemes are attached. Plural attraction is when number agreement gets confused by a mismatching noun inserted between the subject and the verb, eg: “the bridge to the islands [close / closes] at seven”. Errors are more common when the intervening noun is plural than when it is singular.
Planning a complex sentence takes time: sentences with more complex or less common structures have longer initiation times. Syntactic priming is when, after hearing a particular syntactic structure, you’re more likely to produce that same structure. This has been observed across a bilingual’s two languages, suggesting that syntactic representations are shared between them.
The next step is emitting the phonemes. Phonological speech errors only swap similar phonemes, and the output never violates phonotactics. Prosody doesn’t get swapped when phonemes are swapped, indicating that the prosody is attached to the structure level, not the word level.
The source-filter model of vowel production treats the vocal folds as the source, whose output is shaped by the vocal tract (the filter). F0 varies between speakers, but for a given vowel F1 and F2 are fairly consistent. Coarticulation is the fact that articulatory properties don’t apply to individual phonemes but spread over several consecutive phonemes; eg: the formant transitions into the vowel differ between “ba” and “ga”.
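A small numerical sketch of the source-filter idea, assuming numpy/scipy and rough illustrative values for an /a/-like vowel (the F0, formant frequencies, and bandwidths are placeholders, not measurements):

```python
# Source-filter sketch: an impulse train at F0 (crude glottal source) is
# passed through two resonators (the vocal-tract filter) at the formants.
import numpy as np
from scipy.signal import lfilter

fs = 16_000                       # sample rate (Hz)
f0 = 120                          # source fundamental frequency (Hz)
n = int(0.5 * fs)                 # half a second of signal

source = np.zeros(n)              # glottal pulse train
source[::fs // f0] = 1.0

def resonator(x, freq, bw):
    """Two-pole resonator boosting energy near `freq` with bandwidth `bw`."""
    r = np.exp(-np.pi * bw / fs)
    theta = 2 * np.pi * freq / fs
    return lfilter([1.0], [1.0, -2 * r * np.cos(theta), r ** 2], x)

# Filter: formants roughly like an /a/-type vowel (illustrative values).
vowel = resonator(resonator(source, 700, 90), 1100, 110)
```

Swapping in different formant values changes which vowel comes out, while changing f0 only changes the pitch of the source, which is the point of the model.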
Ch6: The Hearer: Speech Perception and Lexical Access
Several things make speech perception nontrivial. Phonemes don’t appear in the signal sequentially, but in an overlapping way (parallel transmission). There is variability between speakers and within speakers depending on context, and there is environmental noise.
Phonetic properties like voice onset time (VOT) are perceived as categories rather than continuous variables. When you vary the VOT, there’s a sharp cross-over point at which the sound flips from ‘b’ to ‘p’. Some languages like Thai have a three-way VOT distinction instead of two-way, and English speakers find the extra contrast difficult to distinguish.
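A toy model of the identification curve; the 25 ms boundary and the logistic shape are assumptions chosen to illustrate the abrupt cross-over, not values from the book:

```python
# Toy identification function for a /b/-/p/ VOT continuum: a steep logistic
# yields near-categorical responses despite the input being continuous.
import math

BOUNDARY_MS = 25.0   # assumed cross-over point (illustrative)
SLOPE = 1.5          # steepness; larger = sharper category boundary

def p_heard_as_p(vot_ms):
    """Probability that a token with this VOT is identified as /p/."""
    return 1.0 / (1.0 + math.exp(-SLOPE * (vot_ms - BOUNDARY_MS)))

for vot in range(0, 61, 10):
    print(f"VOT {vot:2d} ms -> P(/p/) = {p_heard_as_p(vot):.2f}")
# The identification rate jumps abruptly from ~0 to ~1 near the boundary.
```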
Speech perception is constructive: perceived categories are a combination of many physical phenomena. The McGurk effect: when a video shows the mouth shape for “ga” but the audio is “ba”, the sound is perceived as “da”. Another illusion is phoneme restoration, where an “s” is deleted and replaced with a cough, and listeners still perceive the “s” as being there (this doesn’t work when it’s replaced with silence). Slips of the ear are when a message is misheard due to distraction or noise, like “laid him on the green” -> “Lady Mondegreen”.
Bottom-up information is the phonetic signal itself; top-down information is inference from context. When the bottom-up signal is unclear, we use top-down information to narrow down the word. In an experiment with {date, bait, gate}, when the initial consonant is clear, it’s always heard correctly (even if the sentence is semantically odd), but when the consonant is ambiguous, the listener fills in the word that makes more semantic sense. Orthography plays a role in phonological awareness: when Hebrew speakers are asked to delete the first phoneme of a word like “gut”, they often delete the vowel along with it.
A lexical decision task asks you to decide, as quickly as possible, whether a string is a word or a non-word. Impossible non-words (eg: “tlat”) are rejected faster than possible non-words (eg: “floop”). Higher-frequency words also have faster lexical decision times.
In priming experiments, a prime word is flashed very briefly (~50ms) before the target word, to see whether it influences lexical decision time. Priming can be semantic (nurse – doctor) or form-based (table – fable); both decrease decision time. Priming effects are also observable across languages in bilinguals.
In the cohort model of lexical access, words are accessed by phoneme prefix: as the word unfolds, the cohort of compatible candidates shrinks. Evidence for this is that the recognition point (measured via N400 signals) happens before the end of the word. Neighborhood density is the number of phonetically similar words close to a word, and it affects lexical retrieval time.
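A small sketch of both ideas using a toy lexicon in ordinary spelling (the word list and the one-substitution definition of “neighbor” are illustrative simplifications):

```python
# Cohort sketch: as the input unfolds, the set of words consistent with the
# prefix heard so far shrinks; the recognition point is where one candidate
# remains. Neighborhood density is counted as one-substitution neighbors.
LEXICON = ["cat", "cap", "can", "captain", "capital", "cot", "cut", "bat", "dog"]

def cohort(prefix, lexicon=LEXICON):
    """Words still compatible with the input heard so far."""
    return [w for w in lexicon if w.startswith(prefix)]

def neighbors(word, lexicon=LEXICON):
    """Same-length words differing by exactly one segment (crude density)."""
    return [w for w in lexicon
            if len(w) == len(word) and w != word
            and sum(a != b for a, b in zip(w, word)) == 1]

print(cohort("ca"))      # ['cat', 'cap', 'can', 'captain', 'capital']
print(cohort("capt"))    # ['captain'] -- recognized before the word ends
print(neighbors("cat"))  # ['cap', 'can', 'cot', 'cut', 'bat'] -- dense neighborhood
```

Here “capt” is already enough to single out “captain”, mirroring a recognition point before the end of the word; “cat” sits in a dense neighborhood, which in the real case slows retrieval.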
Resource sharing: when you do multiple cognitive tasks at the same time, performance degrades. This lets researchers measure the cognitive load required to retrieve a word. For example, an experiment found that a lexically ambiguous word like “pipe” requires more cognitive resources than an unambiguous word like “cigar”.
Ch7: The Hearer: Structural Processing
There is evidence that syntactic structures actually exist in our mental representations (and are not just a tool for analysis). In fMRI scans, a brain region is active when listening to sentences that is not activated when listening to a list of unstructured words. Word category errors (where the syntactic structure is impossible) trigger different ERP responses from morphosyntactic violations (agreement errors).
In click displacement experiments, researchers place a click in the middle of a word and ask listeners where in the sentence the click occurred. Listeners tend to report the click at a clause boundary, even when it wasn’t there. Sentences where the clause boundaries are difficult to locate incur higher processing costs (“he knows the boys are rowdy” vs “he knows that the boys are rowdy”).
Global ambiguity is when the whole sentence is structurally ambiguous; listeners often don’t notice and just pick one interpretation. Local ambiguity is when the structure is ambiguous given the input so far but is resolved later, like “the student told the professor that…” (“that” can be a complementizer or a relativizer). If the ambiguity gets resolved incorrectly and must be corrected much later, it’s a garden path sentence.
The garden path model tries to explain how we build syntactic structure incrementally, using a small set of rules. Minimal attachment tries to incorporate each word into the existing structure in the simplest way; if it doesn’t fit, we have to go back and reanalyze. Eye trackers can detect garden path sentences: when the reader hits a parsing error, their eyes regress to earlier parts of the sentence.
The late closure rule tries to incorporate new words into the current constituent as much as possible. However, this rule doesn’t seem to be universal: in “someone shot the maid of the actress who was on the balcony”, English speakers prefer the actress being on the balcony whereas Spanish speakers prefer the maid. Sometimes people build syntactic structures that are merely “good enough” to save time, interpreting the sentence in a technically incorrect way.
When you encounter a filler phrase like “which car”, you begin actively searching for the gap it fills; if the earliest plausible gap position turns out to be already occupied, there is a processing cost (the filled gap effect). When the gap cannot be filled until much later, the sentence has a high cognitive cost. The first syntactically matching gap is accepted, even if the result is semantically odd.
Many signals help us build linguistic structure without the need for reanalysis. A verb’s argument bias is a bias towards its argument being a direct object (“understands calculus”) or a sentential complement (“understands he knows the truth”). Thematic properties like animacy influence whether we interpret the first noun-verb sequence as subject-verb or as a subordinate clause. Prosody also contributes to disambiguating between parses. Pragmatic and common-sense information is weaker than minimal attachment for avoiding garden paths.
Ch8: Discourse Processing
A discourse (also text, narrative) is a collection of sentences that are connected. The topic, participants, context, and function all influence the sentences in the discourse.
When sentences are being processed, they are held in working memory, which can hold at most 5-9 chunks, and the size of the working memory is called the working memory span. People with lower working memory spans have more trouble with long dependencies. When a sentence is stored in long term memory, only the meaning is preserved, and the exact structure is lost. We are more likely to remember the exact form when the sentence has special emotional significance.
We make many kinds of inferences when hearing sentences. For example, from “he was pounding a nail” you infer that he used a hammer. Inferences are stored in memory, so later you will often remember hearing the word “hammer” when it never appeared. Inferred information takes longer to access than surface information.
Anaphors are words that refer to something previously mentioned, and must be matched to their referents. A referent is more available if it is focused. By default the entity mentioned last is focused; other ways to be focused are being the main character of a story or having a proper name.
Bridging inferences (or backward inferences) are when a common-sense inference has to be made to connect two sentences that are otherwise incoherent, like “The tourist lit a match. The fire burned down a forest”. In contrast, elaborative inferences (or forward inferences) are made immediately, even though there is no need to do so for coherence. It’s hard to know how much forward inference we do. Scripts are common scenarios that everyone is assumed to know, like the sequence for eating at a restaurant. Communication between culturally similar people is easier because having more shared scripts allows you to make more inferences.
Examples of non-literal language use are indirect requests (the listener must infer that a request is being made) and sarcasm. Sarcastic statements are processed quickly, and even more easily when the sarcasm is alluded to in the preceding context. Speakers must take turns; the end of a clause and a drop in pitch signal the end of a turn, and turn transitions take only about 200ms. Speakers must also detect communication breakdowns and repair the message, which requires identifying why it broke down.
Grice’s cooperative principle comprises four maxims for discourse; they aren’t always followed, but you make inferences assuming the other person is obeying them. The Maxim of Relevance assumes the utterance is related to the previous turn. The Maxim of Quantity assumes you provide the amount of detail the listener needs. The Maxim of Manner assumes you structure the utterance in a coherent way (eg, chronologically). The Maxim of Quality assumes the other person is telling the truth.