Rating: 8.4/10.
Ch1: Generative Grammar
Generative syntax was first developed by Noam Chomsky to capture what we know intuitively about syntax. Use the scientific method: gather data, form hypotheses about the rules, and check whether they agree with native speaker judgements. Data can’t come solely from corpora, since these contain only grammatical sentences; you also need sentences that are ungrammatical.
There are different ways sentences can be ungrammatical. Syntactic ill-formed vs semantic ill-formed. Some sentences (eg: garden path, center embedding) are judged as ungrammatical (eg: “cheese mice cats catch love stinks”) but after some thought are actually grammatical: this is the difference between competence and performance.
Language learning is different from most types of learning: part of it is innate and given by the Universal Grammar (UG). We can argue it’s at least partially innate because we don’t have data to determine all the rules, yet we all end up with the same rules.
Ch2: Parts of Speech
Traditionally nouns are objects / place / things, verbs are actions / states, but this is often violated (eg: “destruction”). Better to define them by morphological and syntactic distribution criteria. The open (lexical) classes in English are nouns, verbs, adjectives, and adverbs; the closed (functional) classes include determiners, prepositions, conjunctions, etc. Can subcategorize each word class, and give features to them. Verbs can be categorized by their argument structure.
Ch3: Constituents, Trees, Rules
A constituent is a group of words that function together as a single unit. The principle of modification says that if XP is some phrase that modifies Y, then XP must be a sister of Y (and daughter of YP). Book gives a sketch of phrase structure rules (PSRs) in English, like NP -> (D) (AdjP+) N (PP+) (CP). You can draw trees bottom-up (easier for beginners) or top-down (faster for experts). Constituents can be found in other languages, although polysynthetic languages and free word order provide challenges.
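To make the NP rule concrete, here is a minimal sketch that checks whether a sequence of daughter categories fits NP -> (D) (AdjP+) N (PP+) (CP). Encoding the rule as a regular expression over category labels is my own illustrative choice, not something the book does, and the names `NP_RULE` and `is_np` are hypothetical.

```python
import re

# NP -> (D) (AdjP+) N (PP+) (CP), encoded over space-separated category labels.
# "(AdjP+)" means an optional run of one or more AdjPs, i.e. zero or more.
NP_RULE = re.compile(r"^(D )?(AdjP )*N( PP)*( CP)?$")

def is_np(categories):
    """Return True if the list of daughter categories matches the NP rule."""
    return bool(NP_RULE.match(" ".join(categories)))

print(is_np(["D", "AdjP", "AdjP", "N", "PP"]))  # True, e.g. "the big red book of poems"
print(is_np(["N"]))                             # True, a bare noun
print(is_np(["AdjP", "D", "N"]))                # False, determiner after the adjective
```

This only checks a flat sequence of categories; it says nothing about the internal structure of the AdjP, PP, or CP daughters.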
Ch4: Structural Relations
A dominates B if there is a path going down from A to B. A constituent is a set of terminal nodes exhaustively dominated by some node. Precedence has a formal definition, but it basically means “to the left”; note that branches can never cross in a constituency tree.
A node c-commands all of its sisters and everything its sisters dominate. C-command can be symmetric (if B also c-commands A) or asymmetric. Government is a local version of c-command: A governs B if A c-commands B and there is no intervening node that A c-commands and that asymmetrically c-commands B. Grammatical relations can be defined in terms of trees, for example, a subject is (roughly speaking) the NP or CP daughter of a TP.
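Dominance and c-command are easy to state as code. The sketch below uses a hypothetical `Node` class and the simplified definition from these notes (a node c-commands its sisters and whatever they dominate); the tree for “Heidi bopped herself” is a bare-bones TP that I assume for illustration.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

def dominates(a, b):
    """a dominates b if b can be reached by walking down from a."""
    return any(c is b or dominates(c, b) for c in a.children)

def c_commands(a, b):
    """a c-commands its sisters and everything its sisters dominate."""
    if a.parent is None:
        return False  # the root has no sisters
    sisters = [s for s in a.parent.children if s is not a]
    return any(s is b or dominates(s, b) for s in sisters)

# [TP [DP Heidi] [VP [V bopped] [DP herself]]]
subj = Node("DP:Heidi")
verb = Node("V:bopped")
obj = Node("DP:herself")
vp = Node("VP", [verb, obj])
tp = Node("TP", [subj, vp])

print(c_commands(subj, obj))  # True: the subject asymmetrically c-commands the object
print(c_commands(obj, subj))  # False
print(c_commands(verb, obj) and c_commands(obj, verb))  # True: sisters c-command symmetrically
```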
Ch5: Binding Theory
An R-expression refers to something in the real world. An anaphor must refer to some other NP (eg: “herself”). A pronoun may refer to something in the sentence, or something from outside context (eg: “she”). Binding theory describes what kinds of coreferences are possible, in terms of syntactic structure. In “Heidi bopped herself”, “Heidi” is the antecedent and “herself” is the anaphor. We say that “Heidi” and “herself” are coindexed.
Define A binds B as: A c-commands B, and A and B are coindexed. This is an asymmetric relationship. Three binding principles:
- Principle A: an anaphor must be bound in its binding domain. Roughly, the binding domain is the smallest clause containing the anaphor, so “Heidi_i said that a dog bit herself_i” is ruled out.
- Principle B: a pronoun must be free in its binding domain.
- Principle C: an R-expression must be free everywhere.
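The three principles can be checked mechanically on a toy encoding. This is a sketch under strong assumptions: each DP is described by hand with its tree position (a tuple of child indices from the root) and the id of the smallest clause containing it, and “same clause id” stands in for the binding domain. The `DP` class and all field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DP:
    word: str
    kind: str     # "anaphor", "pronoun", or "r-expression"
    index: str    # coindexation label such as "i"
    path: tuple   # child indices from the root down to this DP
    clause: int   # id of the smallest clause containing this DP

def c_commands(a, b):
    """True if b is a sister of a, or is dominated by a sister of a."""
    n = len(a.path) - 1
    if n < 0:
        return False  # the root has no sisters
    return len(b.path) > n and b.path[:n] == a.path[:n] and b.path[n] != a.path[n]

def binds(a, b):
    """A binds B: A c-commands B and they are coindexed."""
    return a is not b and a.index == b.index and c_commands(a, b)

def violations(dps):
    """Return (principle, word) pairs for every DP that breaks A, B, or C."""
    found = []
    for b in dps:
        binders = [a for a in dps if binds(a, b)]
        local = [a for a in binders if a.clause == b.clause]
        if b.kind == "anaphor" and not local:
            found.append(("A", b.word))
        elif b.kind == "pronoun" and local:
            found.append(("B", b.word))
        elif b.kind == "r-expression" and binders:
            found.append(("C", b.word))
    return found

# [TP [DP Heidi_i] [VP bopped [DP herself_i]]]
simple = [DP("Heidi", "r-expression", "i", (0,), 0),
          DP("herself", "anaphor", "i", (1, 1), 0)]

# [TP [DP Heidi_i] [VP said [CP that [TP [DP a dog_j] [VP bit [DP herself_i]]]]]]
embedded = [DP("Heidi", "r-expression", "i", (0,), 0),
            DP("a dog", "r-expression", "j", (1, 1, 1, 0), 1),
            DP("herself", "anaphor", "i", (1, 1, 1, 1, 1), 1)]

print(violations(simple))    # []
print(violations(embedded))  # [('A', 'herself')]
```

In the embedded example “Heidi” does bind “herself”, but from outside the anaphor's clause, so Principle A is still violated.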
Ch6: X-bar Theory
X-bar theory was first developed by Chomsky in 1970, and instead of flat trees like NP -> Det N (PP+), you have deeper trees with lots of intermediate N’ nodes.
This is so that you can handle one-replacement: substituting the word “one” for part of the NP, for example, “the big book of poems with the blue cover and the one with the red cover”. There has to be a constituent for “big book of poems” to be substitutable with “one”.
It turns out that VPs can be replaced similarly with “did so”, eg: “Mary ate beans with a fork but John did so with a spoon”. AdjP and PP have less obvious patterns but can be analyzed the same way. At this point, we’re tempted to generalize all of them (NP, VP, AdjP, PP) into a single XP, with the following patterns:
- Only obligatory element of phrase is a head (endocentricity).
- Anything in a rule that’s not a head must be an optional phrase. Eg: X’ -> (YP) X’ (adjunct rule).
- Rules that introduce the head: X’ -> X (YP) (complement rule).
- Rules that introduce the topmost layer: XP -> (YP) X’ (specifier rule).
(Note that for NP, determiner is a sort of exception because you have NP -> (Det) N’ and the Det is not a phrase.)
We can structurally define the difference between complement and adjunct. A complement is an XP that’s sister to a head and daughter to a bar; an adjunct is sister to a bar and daughter to a bar. Thus the complement rule is X’ -> X (YP) and adjunct rule is X’ -> X’ (YP). You can permute the order of adjuncts but not complements. Complement must be next to the head.
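The structural definitions can be turned into a small classifier. In this sketch a tree is a nested tuple `(label, child, ...)` with plain strings as leaves, an ASCII apostrophe marks a bar level (`"N'"`), and the example NP is my own simplified analysis of “the book of poems with the blue cover”. All function names are hypothetical.

```python
def is_phrase(label):
    return len(label) > 1 and label.endswith("P")  # NP, PP, DP, AdjP, ...

def is_bar(label):
    return label.endswith("'")                     # N', V', ...

def is_head(label):
    return not is_phrase(label) and not is_bar(label)

def words(tree):
    """Join the leaf strings under a (label, child, ...) tuple."""
    return " ".join(c if isinstance(c, str) else words(c) for c in tree[1:])

def classify(tree, found=None):
    """Label each phrasal daughter by its mother and sister, per the structural definitions."""
    found = [] if found is None else found
    mother = tree[0]
    kids = [c for c in tree[1:] if isinstance(c, tuple)]
    for i, kid in enumerate(kids):
        sisters = [s[0] for j, s in enumerate(kids) if j != i]
        if is_phrase(kid[0]):
            if is_bar(mother) and any(is_bar(s) for s in sisters):
                found.append((words(kid), "adjunct"))     # sister to a bar, daughter of a bar
            elif is_bar(mother) and any(is_head(s) for s in sisters):
                found.append((words(kid), "complement"))  # sister to a head, daughter of a bar
            elif is_phrase(mother) and any(is_bar(s) for s in sisters):
                found.append((words(kid), "specifier"))   # sister to a bar, daughter of XP
        classify(kid, found)
    return found

np = ("NP", ("D", "the"),
      ("N'", ("N'", ("N", "book"), ("PP", "of poems")),
             ("PP", "with the blue cover")))

print(classify(np))
# [('of poems', 'complement'), ('with the blue cover', 'adjunct')]
```

The determiner is skipped because it is a bare head rather than a phrase, which matches the exception noted for NP -> (Det) N’.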
The one-replacement rule can be phrased structurally as: replace a N’ with “one”. Can’t replace head, so “the book of poems” can’t become “the one of poems”. Similarly, the did-so-replacement targets V’. All of the N’ and NPs above N are called projections of the N head. The NP is called the maximal projection and the N’s are intermediate projections.
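As a rough illustration of “replace an N’ with one”, the sketch below enumerates every string you get by swapping exactly one N’ node for “one”. The tuple encoding of trees and the example analysis of “the book of poems with the blue cover” are assumptions of mine, and `one_replacements` is a hypothetical helper.

```python
def words(tree):
    """Flatten a (label, child, ...) tuple into its list of leaf strings."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]

def one_replacements(tree):
    """Yield each word list obtained by replacing exactly one N' node with 'one'."""
    if isinstance(tree, str):
        return
    label, *children = tree
    if label == "N'":
        yield ["one"]
    for i, child in enumerate(children):
        before = [w for c in children[:i] for w in words(c)]
        after = [w for c in children[i + 1:] for w in words(c)]
        for replaced in one_replacements(child):
            yield before + replaced + after

np = ("NP", ("D", "the"),
      ("N'", ("N'", ("N", "book"), ("PP", "of poems")),
             ("PP", "with the blue cover")))

results = [" ".join(r) for r in one_replacements(np)]
print(results)  # ['the one', 'the one with the blue cover']
```

Because only N’ nodes are targeted and the head N is not one of them, “the one of poems” is never generated.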
To explain how different languages have different word orders, each language has parameters for whether the adjunct rule is X’ -> X’ (ZP) or X’ -> (ZP) X’. The values of these parameters are acquired by children.
Ch7: Extending X-bar Theory
Determiner phrase is the idea that the head of an NP is not the N itself, but the determiner. Then the rule is DP -> D’ and D’ -> D NP, and NP cannot contain any more determiners. Evidence comes from the fact that the genitive ‘s and determiner are in complementary distribution. So the genitive ‘s is treated as a determiner.
A clause (TP) is either a root (also matrix or main) clause or an embedded (or subordinate) clause. Note the main clause contains the embedded clause, not just the part outside. Embedded clauses can be categorized as specifier, complement, or adjunct clauses.
Another way of dividing up clause types is by whether they are finite or non-finite / infinitival. Finite clauses have tense and subject agreement and may contain auxiliaries and modals, whereas non-finite clauses must be in infinitive form like “to eat”.
Complementizer phrases have the problem that the complementizer is optional. To fix this, assume null complementizer, and to explain subject-aux inversion, the T is moved to the position of the null C. Also, useful to assume that every clause has a CP on top of it.
The final rule that doesn’t fit is the TP rule, since the T is optional when there is no auxiliary, and there are two ways to fix it. The older affix lowering analysis has the tense ending like “-ed” as T, which is lowered to the end of the verb when realized. The newer analysis doesn’t have the affix generated in T, but contains a tense marker that controls the form of the V.
Ch8: Constraining X-bar: Theta Theory
Problem with X-bar theory so far is that it generates sentences that are not acceptable, since different verbs allow different arguments (selectional restrictions). Review of thematic roles: agent, experiencer, theme, goal, recipient, location, instrument, beneficiary.
A theta role is the bundle of thematic relations assigned to a syntactic argument, and a predicate’s theta roles are listed in its theta grid. The theta criterion says there must be a one-to-one mapping between arguments and theta roles. The subject’s role is underlined; don’t include adjuncts in the theta grid. Chomsky claims the theta grid for each word is stored in the lexicon.
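A theta grid can be thought of as a lexicon entry, and the one-to-one requirement as a simple check. The sketch below is illustrative only: the three grids in `THETA_GRIDS` are my own example entries, and `assign_roles` is a hypothetical helper that pairs arguments with roles in order.

```python
# Hypothetical lexicon: each verb lists its theta roles, external (subject) role first.
THETA_GRIDS = {
    "leave": ["agent"],
    "love": ["experiencer", "theme"],
    "put": ["agent", "theme", "goal"],
}

def assign_roles(verb, arguments):
    """Pair each argument with one theta role, or return None if the
    one-to-one mapping between arguments and roles is impossible."""
    grid = THETA_GRIDS[verb]
    if len(arguments) != len(grid):
        return None  # too few or too many arguments for the grid
    return dict(zip(arguments, grid))

print(assign_roles("put", ["John", "the book", "on the table"]))
# {'John': 'agent', 'the book': 'theme', 'on the table': 'goal'}
print(assign_roles("love", ["Megan"]))  # None: the theme role is left unassigned
```

Adjuncts are deliberately left out of the argument list, since they do not appear in the theta grid.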
Expletive pronouns are pronouns like “it” that don’t get a theta role, eg: “it rained” or “it is likely that she is here”. The word “likely” assigns a single theta role, proposition, to its clausal complement and has no role for a subject, so an expletive is inserted, because the Extended Projection Principle (EPP) says that every clause must have a subject.
Ch9: Auxiliaries and Functional Categories
Complementizers vary a lot in their selectional restrictions; represent this by subcategories of embedded clauses. Eg: “I think that” is valid but not “I think if” or “I think for”. The features record whether the clause is a question [+Q] and whether it’s finite [+FINITE] / [-FINITE].
Determiners select for different NPs. ‘A’ has restriction [-PLURAL, -PROPER, -PRONOUN], ‘The’ has restriction [-PROPER, -PRONOUN]. Assume pronouns have a null determiner, so they are still DPs and not NPs. ‘Many’ selects for [+COUNT] and ‘much’ for [-COUNT]. ‘All’ takes a DP rather than an NP as complement, so you can have “all the boys”.
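Selection by features amounts to checking that the noun agrees with every feature the determiner cares about. In the sketch below the feature values for the four nouns are my own assumptions, `True`/`False` stand for `+`/`-`, and `selects` is a hypothetical helper. “Many” is encoded as [+COUNT] and “much” as [-COUNT].

```python
# Features each determiner requires of its NP complement (True is "+", False is "-").
DETERMINERS = {
    "a": {"PLURAL": False, "PROPER": False, "PRONOUN": False},
    "the": {"PROPER": False, "PRONOUN": False},
    "many": {"COUNT": True},
    "much": {"COUNT": False},
}

# Hypothetical feature bundles for a few nouns.
NOUNS = {
    "dog": {"PLURAL": False, "PROPER": False, "PRONOUN": False, "COUNT": True},
    "dogs": {"PLURAL": True, "PROPER": False, "PRONOUN": False, "COUNT": True},
    "water": {"PLURAL": False, "PROPER": False, "PRONOUN": False, "COUNT": False},
    "Heidi": {"PLURAL": False, "PROPER": True, "PRONOUN": False, "COUNT": True},
}

def selects(det, noun):
    """True if the noun matches every feature the determiner restricts."""
    return all(NOUNS[noun][feat] == value for feat, value in DETERMINERS[det].items())

print(selects("a", "dog"), selects("a", "dogs"))          # True False
print(selects("the", "dogs"), selects("the", "Heidi"))    # True False
print(selects("many", "dogs"), selects("much", "water"))  # True True
```

A determiner that leaves a feature unmentioned (like “the” with PLURAL) accepts either value, which is why “the dog” and “the dogs” both pass.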
Tense is the time of an event relative to when the sentence is spoken; English has past, present, and future tense. Aspect is when it happens relative to a reference point. Event that happens before the reference time is the perfect aspect; event that’s ongoing at the reference time is progressive. Tense can be combined with aspect, so you can have past perfect / present perfect / etc.
Voice is changing the number of arguments of a verb, can be passive or active. Mood is the speaker’s perspective of the event, whether it’s possibility, necessity, obligation, etc; it must precede all other auxiliaries. It’s a bit challenging to decide which modals and auxiliaries should have category T.
Modals and “will” take category T, and have a VP complement with a [bare] restriction. Past and present tense have the same structure, but with a null T that doesn’t have any phonological form.
Perfect “have” is analyzed as category V, but with a VP complement. This is a “stacked VP” form which hasn’t been seen so far, but is allowed in X-bar theory. Progressive and passive auxiliaries “be” have a similar form, but with more selectional restrictions on the VP complement.
Do-support applies when negating a verb that has no auxiliary, or in emphatic positive sentences. For the negative case, “did” takes category T, and has a NegP complement. For the emphatic case, the category is still T, but has a VP complement. Note that this doesn’t handle subject-aux inversion for questions.
Ch10: Head-to-Head Movement
Movement is useful for explaining sentences like “Je mange souvent des pommes”, where “souvent” is an adjunct but the word order can’t be generated by X-bar theory without branches crossing. Chomsky proposed a flow chart where X-bar rules generate an underlying D-structure, which is modified by transformational rules to produce the surface S-structure.
The French example can be explained by verb movement / verb raising, where the head V is moved to the head T in the S-structure (V->T movement). Why French has verb raising and English doesn’t can be explained as a parameter.
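The verb-raising parameter can be sketched as a single boolean. The function below assumes a D-structure order of subject, T, VP-adjoined adverb, verb, object (my simplification), and `linearize` and `verb_raising` are hypothetical names.

```python
def linearize(subject, adverb, verb, obj, verb_raising):
    """Spell out S-structure word order from the assumed D-structure
    [TP subject T [VP adverb [V' verb object]]]."""
    t_slot, v_slot = [], [verb]
    if verb_raising:
        # V -> T movement: the verb is pronounced in T, to the left of the adverb.
        t_slot, v_slot = [verb], []
    return " ".join([subject] + t_slot + [adverb] + v_slot + [obj])

print(linearize("Je", "souvent", "mange", "des pommes", verb_raising=True))
# Je mange souvent des pommes
print(linearize("I", "often", "eat", "apples", verb_raising=False))
# I often eat apples
```

The same underlying order yields both languages' surface orders, with only the parameter value differing.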
VSO languages like Irish are a problem for X-bar theory because V and O are supposed to form a VP together. Flat structure explains the data, but is inelegant, and also there’s evidence of a VP constituent. The data can’t be explained by word order parameter or V->T movement. Instead, the VP-internal subject hypothesis proposes that the subject DP actually starts out in the specifier of VP, then V->T movement generates the correct VSO form. To explain non-VSO languages like English, DP-movement has the DP move from specifier of VP to specifier of TP.
Subject-aux inversion is explained using T->C movement. In questions, C carries a [+Q] feature; although it’s phonologically null, it triggers movement of the auxiliary to the C position. In French, main verbs can also undergo T->C movement, but not in English. Instead, English has do-support, which says that if there is no other option to fill the T position, insert a dummy “do”.
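A toy version of T->C movement with do-support as the fallback might look like this. It is a sketch under assumptions: the VP is given as a list of words with a bare verb, `DO_FORMS` assumes a third-person singular subject, and `yes_no_question` is a hypothetical helper.

```python
DO_FORMS = {"past": "did", "present": "does"}  # assumes a third-person singular subject

def yes_no_question(subject, vp, aux=None, tense="present"):
    """Build a yes/no question from a subject, a bare VP, and an optional overt T."""
    # Do-support: with no auxiliary or modal available, a dummy "do" fills T.
    t = aux if aux is not None else DO_FORMS[tense]
    # T -> C movement, triggered by the [+Q] feature on C.
    c = t
    return " ".join([c, subject] + vp) + "?"

print(yes_no_question("Matt", ["kiss", "Heidi"], aux="will"))    # will Matt kiss Heidi?
print(yes_no_question("Matt", ["kiss", "Heidi"], tense="past"))  # did Matt kiss Heidi?
```

The main verb never moves here, which mirrors the English setting; a French-style grammar would instead let the raised main verb itself end up in C.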
Ch11: DP Movement
Theory of DP movement is motivated by DPs appearing in syntactic positions that are unexpected given their theta roles. The Locality Constraint says that theta roles can only be assigned in the same clause, so you can’t have “John thinks that left”. However, “John is likely to leave” is acceptable, even though “John” is the agent of “leave” and not in the theta grid of “likely”.
In this case, the D-structure has “John leave” as the innermost VP, and DP movement moves the DP to a specifier position. This type of movement is called raising.
English doesn’t really have morphological case, but it has abstract Case (written with a capital C), a feature of nouns that affects syntax. Every DP must pass the Case filter via feature checking: it must be adjacent to a Case assigner. Nominative Case goes to the specifier of a finite T, accusative Case to the sister of a transitive V, etc. This explains why we can’t raise once and insert an expletive “it” producing “It is likely Patrick to leave”, since “Patrick” doesn’t get Case. You must raise twice to produce “Patrick is likely to leave”.
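The Case filter can be sketched as a lookup from structural position to Case. The position labels below (`spec_finite_T` and so on) are names I made up for illustration, and only the two assigners mentioned in these notes are listed.

```python
# Positions that assign Case, per the two examples in the notes.
CASE_BY_POSITION = {
    "spec_finite_T": "nominative",
    "sister_transitive_V": "accusative",
}

def caseless(dps):
    """dps is a list of (word, position) pairs; return the words that get no Case."""
    return [word for word, position in dps if position not in CASE_BY_POSITION]

# "It is likely Patrick to leave": Patrick sits in the specifier of a non-finite T.
print(caseless([("It", "spec_finite_T"), ("Patrick", "spec_nonfinite_T")]))  # ['Patrick']

# "Patrick is likely to leave": Patrick has raised to the finite T's specifier.
print(caseless([("Patrick", "spec_finite_T")]))  # []
```

A sentence passes the filter only when the returned list is empty.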
Passives undergo a similar DP movement. For passives, Burzio’s Generalization says that a predicate that has no external theta role cannot assign accusative case.
Ch12: Wh-movement and Locality Constraints
Another type of movement, called wh-movement, is found in questions involving “wh” words like “who / why / what / when / how / etc”. The wh-phrase is often at the beginning of the sentence, far from where its theta role would predict. Propose a [+WH] feature in C, which causes the wh-phrase to move to the specifier of CP to check the [+WH] feature.
For example, in “Whom did Matt kiss”, the D-structure has “whom” as the complement of “kiss”, but it is moved to the beginning. Wh-movement is usually accompanied by T->C movement, which explains the subj-aux inversion. The wh-phrase doesn’t always move to the beginning of the sentence, eg: in “I wonder who Jim kissed”, it moves to the specifier of the embedded CP, which is the complement of “wonder” and must have the [+WH] feature.
Relative clauses also have movement, in this case to link the head noun to the gap. There’s usually a free alternation whether “that” is used (but not always, that-trace effect explains when it’s restricted). In sentences like “I bought the book you recommended”, “book” is the theme of “bought” and “recommended”, which violates the theta criterion, so we explain it by having movement of a silent operator DP.
Islands are constituents that block movement out of them, and there are several different types. The complex DP island says you can’t move a wh-phrase out of a DP. The wh-island says you can’t skip over a second wh-phrase during movement. You also can’t move out of a subject CP, or out of coordinate structures.
Why do these island constraints exist? One explanation is the Minimal Link Condition (MLC), which says movement must be to the closest potential landing site. Certain movements actually involve two sub-movements. This explains the wh-island constraint: the second wh-movement is blocked because the first one left a trace during its movement. Movement doesn’t apply in wh-in-situ sentences like “Shelly loves who?”, but this is only possible for echo questions that seek confirmation for what was heard.
Ch13: A Unified Theory of Movement
We’ve seen several different types of movement, but they all involve moving something to appear next to something else. Generalize this to a Move rule, and the Principle of Full Interpretation says that features must be checked in a Local Configuration (can be specifier-head, or head-head).
How to explain languages that seem to not have wh-movement, like Chinese? Chomsky proposes that there is movement, but you don’t hear it. So instead of D-structure going to S-structure, we have D-structure undergoes overt movement to SPELLOUT, which is used to generate the phonetic form. Then, covert movement is applied to SPELLOUT to produce the logical form.
Further evidence of covert movement in English comes from quantifiers: usually the quantifier phrase must c-command a constituent to have scope over it. But the sentence “Everyone loves someone” can be interpreted with “someone” having wide scope, which is explained by covert movement of “someone” to the beginning, where it c-commands the rest.
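The two readings of “Everyone loves someone” come apart on a small model. In this sketch the people and the `loves` relation are invented for illustration; `surface_scope` corresponds to “everyone” taking wide scope and `inverse_scope` to “someone” taking wide scope.

```python
people = {"Al", "Bea", "Cy"}
loves = {("Al", "Bea"), ("Bea", "Cy"), ("Cy", "Bea")}  # hypothetical (lover, loved) pairs

def surface_scope(domain, rel):
    """everyone > someone: each x loves some y, possibly a different y for each x."""
    return all(any((x, y) in rel for y in domain) for x in domain)

def inverse_scope(domain, rel):
    """someone > everyone: there is a single y whom every x loves."""
    return any(all((x, y) in rel for x in domain) for y in domain)

print(surface_scope(people, loves))  # True: everybody loves at least one person
print(inverse_scope(people, loves))  # False: nobody is loved by all three
```

Any model that makes the inverse-scope reading true also makes the surface-scope reading true, but not the other way round, which is why a scenario like this one separates them.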
Ch14: Extended VPs
A problem is how to deal with ditransitive verbs, with two DP arguments, which don’t seem to fit into X-bar theory. The solution is to introduce a little-v element which means “CAUSE”, with movement rules to get everything in the right order. Also introduce a silent AgrO category with movement for the purposes of assigning case.
Ch15: Raising, Control, and Empty Categories
Consider the sentences “John is likely to leave” and “John is reluctant to leave”. The first is an example of raising and the second is control. In raising sentences, the complement of “likely” is the CP “that John leaves”, and there are multiple ways to get the S-structure from the D-structure. For control, “John” would be both the experiencer of “reluctant” and the agent of “leave”. Since this violates the theta criterion, introduce an empty category PRO that is coindexed with “John” and takes the agent role of “leave”.
Another way to distinguish control from raising is with an idiom like “the cat is out of the bag”. With a raising sentence “the cat is likely to be out of the bag”, the idiomatic meaning is retained, but not with a control sentence “the cat is reluctant to be out of the bag”. This happens because idiomatic meaning is constructed in the D-structure, as long as all the parts are together at that stage.
Several kinds of PRO: if it is coindexed with another DP then it’s controlled and the other DP is the controller. Arbitrary PRO is one that’s not controlled by anything. Obligatory control is when it must be controlled by something; optional control means it’s possible to be controlled or not. What kind of entity is the PRO? Its distribution is different from R-expressions, pronouns, and anaphors from binding theory, so instead, control theory tries to explain how PRO works.
It is tricky to describe how PRO gets controlled. One theory is that it’s in the theta grid of the main verb, but this isn’t always the case, eg, for the word “beg”, and may depend on pragmatic and non-syntactic factors. PRO must not be confused with null subjects in pro-drop languages, which are also called little-pro.
Ch16: Ellipsis
Ellipsis is when a string that has already been uttered doesn’t need to be repeated a second time. Most common type is VP ellipsis, like “Mary will eat a sandwich but Raymond won’t”. Pseudogapping is when everything in the VP is removed except the object, eg: “Robin will eat sandwiches but he won’t ice-cream”. There are many other types of ellipsis.
Linguists disagree about whether ellipsis should be analyzed as a deletion or copying process. The LF-copying hypothesis says the elided part is copied between SPELLOUT and the logical form; the PF-deletion hypothesis says it is deleted between SPELLOUT and the phonetic form. One piece of evidence for LF-copying is that in sentences like “Josh will hit himself, and Otto will too”, it is ambiguous who Otto will hit, whereas under PF-deletion it can only be Otto himself. There are other arguments supporting PF-deletion though.
Ch17: Advanced Topics in Binding Theory
Raising and wh-movement seem to interact differently with binding theory: raising seems to happen before binding principles are applied, but wh-movement after. Chomsky fixes this by defining movement as a type of copying, but the original copy is silent. This set of two DPs is called a chain.
Previously, we defined binding domain as within the current CP, but there are counterexamples. For example, can’t have “Heidi_i believes Mary’s description of herself_i”. Chomsky proposed to explain this by defining the binding domain in terms of potential antecedents: roughly, the smallest CP or DP that contains a potential antecedent, which here is the DP “Mary’s description of herself”. The binding domain is defined differently for anaphors and pronouns.
Ch18: Polysynthesis, Incorporation, Non-configurationality
Polysynthetic languages have morphology do most of the work of syntax, so how can the UG theory handle this? The syntax-free hypothesis says that these languages lack a syntax component, which would seriously challenge the UG hypothesis. The radical pro-drop hypothesis says that the same structures exist in these languages, but almost all of the arguments are allowed to be dropped. Incorporation is when a fully lexical noun argument is inside the verb, and Baker (1988) proposes to analyze this as a special kind of head-to-head movement from N->V. This generally obeys the movement rule that you can only move “upwards” in a tree (to a position that c-commands you).
Some languages have free word order: scrambling refers to languages like Japanese where word order is partly determined by information focus; non-configurational languages like Warlpiri have even freer word order. For scrambling, Rizzi (1997) proposed an additional TopicP structure on top of TP to which DPs can move. For non-configurational languages, Hale (1983) proposed the dual-structure hypothesis, where the phonological form (but not the logical form) is generated by very different phrase structure rules that allow free word order. Others propose it has similar (but more extreme) move operations as scrambling, since word order still tends to be affected by discourse functions.