16 June 1995. NOAM CHOMSKY: ‘Minimalist Explorations’ at University College London’s Dept. of Linguistics
Chomskyan linguistics has had a great deal of influence - or so people say. I happened to tape one of his technical lectures on linguistics, delivered in University College, London, and herewith present a transcription of it for the edification of my numerous fans. (Most of the questions, which weren’t audible, are omitted, and many names and technical terms are queried). The reader must imagine quite a large lecture theatre with steeply-tiered seats, a green blackboard, and an air of excitement.
The chief impression I received was of Noam Chomsky as a sort of generator or powerhouse, setting the agenda in the way he describes the ‘New York Times’ and other rags as setting political frameworks. Three reserved rows, at the front, for department members, seated those who hung onto every word and who would presumably regurgitate the material and draw salaries for doing so. Of course, if Chomsky actually found a true theory of language, they’d be out of a job.
Another striking impression, as far as I was concerned, was that free communication within a subject (about which, of course, there’s much mythology) is important where it exists: it lets people know what to talk about, and avoids the sort of thing that happens in the everyday world when newspapers don’t get distributed and people don’t know what they’re supposed to think about anything.
I spoke to several people after this talk, for example an American who was supposedly studying the best ways to teach languages. (‘Applied linguistics’ sounds better than ‘language teaching’.) I asked whether e.g. with Latin they couldn’t try to transpose or convert Latin constructions into English, to get the feel of Latin. This seems to be more or less heretical.
The talk was about seventy minutes in total; a short time was allotted afterwards for questions. The copyright in the talk is Noam Chomsky’s; the publication he talks of must have taken place, and in any case it seems he disagrees with much of the content, so I can’t imagine anyone would object to this Internet piece. I’m uncertain about the copyright status of this transcription, but if there are any rights I suppose I might as well claim them - Rae West
- NOAM CHOMSKY: “Well, Neil had suggested that I talk more or less about the errors that I found in the last couple of weeks the draft.. in the last chapter of an unpublished manuscript, and though I’m very impressed by the British educational system it’s hard to believe... [Laughter]
I’ll try to talk about some recent work.. I’ll try to give the flavour.. I’ll try to get through the errors in section ten. Let me begin with a couple of background assumptions and observations that help I hope put things in place. They will be familiar to those in the field.. by no means controversial.. and they become more and more controversial as they become more specific. In fact it wouldn’t be surprising if they turn out to be wrong.. as has happened plenty of times in the past.
The basic assumption is that there is a language faculty, some special aspect of the mind/ brain which is dedicated to the use of language. The language faculty consists of a cognitive system which stores information, and various performance systems which access the information. The - are we in the same ballpark here? I hear voices [laughter] - The cognitive system characterises an infinite class of expressions; expressions are sometimes called structural descriptions, each expression contains -
[Interruption by an official about people standing, fire regulations.. much shifting of people into seats...]
- each expression contains the information about a particular linguistic object, the relation between the language and the set of expressions is what is technically called strong generation. There’s a notion of weak generation.. it doesn’t seem to have anything to do with natural language, it’s caused an awful lot of confusion. The information in an expression has to be made available to the performance system somehow. And one standard assumption which I’ll keep to is that it’s made available in the form of what are technically called linguistic levels.. there are other ways in which it could happen; you can imagine dynamic systems of various kinds. But I’ll assume that information is available at linguistic levels, at interface levels relating, providing information from the cognitive system to one or other performance system.
I’ll further assume, as is standard but may well turn out wrong, that there are just two interface levels, one of them connected to the sensori-motor systems and one of them connected to all the other systems of language-use. The representations of the objects at these levels are called pf and lf, phonetic form and logical form, representations, so that means an expression is at least a pair of representations, one at the pf level, one at the lf level, maybe at most that, but it’s at least that. That’s a way of restating in complicated modern terminology that language is a sound with a meaning, in traditional terminology.
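A minimal sketch in Python of the picture so far (the names and values here are my own illustrative choices, not Chomsky’s notation): an expression is, at least, a pair of interface representations, one read by the sensori-motor systems and one by everything else.

```python
from collections import namedtuple

# An expression pairs a PF (phonetic form) representation, read by the
# sensori-motor systems, with an LF (logical form) representation, read
# by the other systems of language use.
Expression = namedtuple("Expression", ["pf", "lf"])

# Purely illustrative values: the point is only the pairing itself,
# i.e. that a linguistic object is "a sound with a meaning".
ex = Expression(pf="dogz", lf="PLURAL(dog)")
print(ex.pf, ex.lf)
```

Nothing hangs on the particular representations; the sketch only records that an expression is at least, and maybe at most, such a pair.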
Now the language, the system that characterises the infinite set of expressions - I will assume that it’s a recursive procedure, not some other way of characterising infinite sets, and there are such other ways. And the system that forms, that generates, the lf representation, that system I’ll call syntax, in one of the many senses of the word.
The er, well, skip some of the early part of the subject; this part of the subject more or less revives, or gets its present form, about 45 years ago, and the early research for several decades was driven by the tensions, the still unresolved tensions, between two different goals, which lead you in different and opposite directions. One goal is to provide the information about languages with factual accuracy, that is, to construct recursive systems, grammars, in one of the senses of that much-abused term. So to provide grammars that give the facts accurately about Swahili, and about English, and Hungarian, and so on.
To the extent that a system does that accurately, it’s called descriptively adequate. The other problem is a more interesting one, and that’s the problem of finding out how anybody knows the descriptively adequate grammar - how you and I know it, how every child knows it - and that’s the problem that’s called explanatory adequacy. A theory of language is said to meet the condition of explanatory adequacy to the extent that it is able to show how, from the data that might be available to a child, you can get a descriptively adequate grammar.
So that turns out, when you think about it, to be a theory of the initial state of the language faculty. A theory of the genetically programmed initial state of the language faculty, and a theory of that faculty is said to meet the condition of explanatory adequacy to the extent that it does this. Well, explanatory adequacy is a much more interesting topic and a much harder topic. But the search for descriptive and explanatory adequacy sends you in opposite directions. As you try to get more and more descriptively adequate grammars, they get more and more complicated and intricate and specific to particular languages and particular structures and so on and so forth, but you KNOW that’s the wrong answer, because the right answer must be that there’s only one language, they’re all identical, otherwise it would be impossible for anybody to know any of them, cos you just know too much given the data that’s around, so you must have known it to start with; so it must be that all of this proliferating complexity is just misleading epiphenomena, and if you could only see the truth - you know - you could see that it’s just all the same system with minor modifications.
So the search for explanatory adequacy is leading you towards saying all this stuff doesn’t exist, and the search for descriptive adequacy is leading you to say, look, it’s way more complicated than you thought. And for a long time the main research programmes were directed to trying to resolve this obvious conflict or tension - not contradiction, just tension. And the way it was done - you can imagine many ways - but the way that turned out to be fruitful was to try to find properties of rule systems which you could abstract away from particular languages and just attribute to the initial state of the language faculty, and show that when you abstracted these principles and properties away, you got systems that were less complicated than they looked, so all the complicated varying details turned out to be special cases of the interaction of some principles, and so on. Well, that went on from about 1960 up until say 1980; that was the main course of research. Around 1980 a lot of this stuff fell together all of a sudden, and it sort of crystallised into another way of looking at things which had been building up all through these years, into a system that since then people have been calling the principles and parameters framework. It’s not a theory, it’s a framework. It becomes a theory when you fill out the details. But the principles and parameters framework is a very sharp departure from traditional grammar - this is a field that goes back 25 hundred years; in fact a lot of the work that has been done is not all that dramatically different from what Panini was doing 25 hundred years ago. But the 25-hundred-year tradition had some common threads through it. One common thread is this - what you know if you’ve studied Spanish or something. When you study a language you have a chapter in the book which is on how to form relative clauses in Spanish, and how to form verb phrases in German, and so forth.
And they’re very specific - complicated detailed rules, although they don’t begin to cover the data.
I mean, as soon as people started this they immediately found that traditional grammars didn’t even begin to be descriptively adequate; they just ignored everything. Which makes sense - because people know it anyway. [Laughter] You just confuse people if you try to spell it out, even if you knew it. But the properties of a grammar, the rules that you find, are specific to particular languages, and even to particular constructions in particular languages. And modern generative grammar took that over.
So if you look at early generative grammars you’ll have rules for forming the passive in English, and other rules for forming the relative clause in Italian, and so on and so forth. The principles and parameters approach says there aren’t any rules and there aren’t any constructions, so it’s a very radical break. All that there are, are universal principles, which are part of the initial state of the language faculty, and then it has possibilities of variation, small possibilities of variation, called parameters. So there is something language-universal, namely the principles and the possible parameters. There’s something language-specific, namely the choices of values for the parameters, and that’s all. Things like the passive in English or the relative clause in Italian are, from this point of view, taxonomic artefacts. Kind of on a par with, you know, ‘large mammal’ or ‘household pet’ or something; they’re real, but they have no scientific status, they don’t exist in the universe, they’re just kind of the interaction of a lot of things. So that’s the approach, and it’s a big change, and it changed everything.
One change is that we now have a way of saying what should have been obvious all along, namely that a state of the language faculty is inevitably going to be completely different from anything that you might reasonably call a language. A language is going to be defined as a particular choice of values of parameters. So there’s n parameters, maybe they have two values; pick the choice for each one, that’s a language. The state of the language faculty is never going to be like that; it’s always going to be the result of crazy and uninteresting experience, and in fact even the uninteresting history of the languages, and so on and so forth. So we can now distinguish a language - now let’s call it an i-language just to make it clear that’s a very technical notion, i for internal, individual, intensional in the sense of an intensional characterisation of the generative function. So an i-language is a set of choices of parameters, and it’s distinct from a state of the language faculty. That’s something else, and something not especially interesting. Furthermore, a goal is to try to show that there’s a unique i-language, that is, that there’s just one and only one i-language, at least within syntax, within the part of language that’s forming logical form representations, the interface with the systems of language use. This approach puts the question of explanatory adequacy on the research agenda; it doesn’t solve it, but it makes it a formulable question for the first time. Up until this time, you really couldn’t formulate it, so the most that anybody could dream of was what was called an evaluation procedure that would choose between alternative proposals as to what might be the theory of a language. But there was no way to talk about how you might gain one or the other from data.
But if this approach turns out to work, there is an answer, namely the parameters have to be designed so that the values for them can be determined on the basis of extremely little data. And that, if you can do it, would solve the problem of explanatory adequacy; it would say that the child gets a little bit of data and says OK, I’m this type of language, and I’m that type of language, and once you’ve answered all those questions everything works, because the principles are already in there, so you have a language. So it’s possible to pose the problem of explanatory adequacy; it gets on to the research agenda. This immediately led to completely new ways of looking at questions of acquisition, and typology, and sentence processing, and all sorts of other things, and it also raised new internal questions.
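As a rough illustration (a toy model of my own, not part of the lecture’s formalism, with invented parameter names), one can think of an i-language as nothing more than an assignment of values to a fixed universal list of binary parameters; acquisition then reduces to setting those few values from limited data:

```python
# Hypothetical parameter names, chosen only for illustration.
PARAMETERS = ("head-initial", "null-subject", "verb-raising")

def i_language(settings):
    """An i-language is just a choice of value for each universal parameter."""
    assert set(settings) == set(PARAMETERS), "every parameter needs a value"
    return dict(settings)

# Two 'languages' in this toy sense:
english = i_language({"head-initial": True, "null-subject": False, "verb-raising": False})
french = i_language({"head-initial": True, "null-subject": False, "verb-raising": True})

def differences(l1, l2):
    """From the Martian point of view, this is all that distinguishes them."""
    return [p for p in PARAMETERS if l1[p] != l2[p]]

print(differences(english, french))  # the two toy languages differ in one setting
```

With n binary parameters there are at most 2**n i-languages, which is why a small, circumscribed parameter set makes the acquisition problem look tractable.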
The main internal question of course is to find the principles and define the parameters, and the search for that led to a real explosion of empirical work in the last ten or fifteen years. I’m sure much more has been learned about language in the last ten or fifteen years than in the whole preceding 25 hundred, with new questions that nobody ever thought of, and lots of new answers, and new theoretical ideas, and also typologically quite diverse work; by now, there’s work of this kind going on in a very wide range of typologically different languages which are being looked at in new ways and so on, and also - and here’s the topic I want to talk about, finally! - and also, since there is a conception at least of what an authentic theory might look like for the first time ever - namely one in which the question of explanatory adequacy can at least be raised in a serious way, not answered, but raised - given that, you can start asking some harder, more interesting, and more principled questions about the nature of the system. Now there are several of those. Generally speaking, the question is, when you look closely, how much of what you’re attributing to the language faculty is really driven by empirical data.
You can now ask, instead of trying to get a patchwork system where you get things to work: look closely and ask how much of what I am postulating is really necessary, given the empirical data, and how much is there to sort of solve engineering problems? Looking at the same question from another point of view, you’re asking, the language faculty and the i-languages that instantiate it, how good a solution are they to a set of general boundary conditions that are imposed by the external systems? The language faculty’s embedded in other systems, and they put some constraints on what the language must be. Like, it’s gotta have a linear temporal order of speech; that’s an external condition. And the interpretive systems have to find phrases and what are called theta relations, semantic relations among the phrases; these are external conditions imposed by the systems on the outside, the interpretive systems, and there are such conditions, and given those general boundary conditions, how perfect a solution is language to satisfying those conditions?
It’s kind of picturesque, but how ‘perfect’ is language, given the output conditions - and I’ll call them bare output conditions, to distinguish them from other kinds of output conditions that are often used, that are really parts of the computational system, like filters, and ranking of output constraints, and so on. That’s part of the internal computation at the output, so I want to talk about bare output conditions, those that come from the external systems. And in principle you could learn about those independently: you could study the articulatory system, or if you knew enough, you could study systems of language use, and you could say, well, what are they requiring the language faculty to give you at the interface, and you can ask how perfectly language satisfies the conditions that those things impose. Well, that brings us to what I’ve bin calling the minimalist programme... which is an effort to really explore the intuition that language is surprisingly perfect, in a sense that naturally we wanna make precise. And in exploring this intuition, we want to make sure that we are accepting any structure at all only if we can really show that it’s motivated by empirical data. And that’s turned out to be, it’s an interesting programme, I don’t know if it’s right or not, but it’s leading in interesting directions. Optimally, you can see what you oughta try to find if language is really perfect. The students, like ?Reeder, will remember that ever since the principles and parameters approach got formulated, every class of mine in the Fall I always started by saying let’s see how perfect language is, and we tried to make it perfect and it always turned out to be hopelessly imperfect as the thing got on. But somehow in the last couple of years it’s again started to fall together and maybe it really is perfect, which would, if true, be extremely interesting, because it makes it totally unlike anything in the biological world, as far as we know.
I’ll come back to that. Well, optimally, it should be the case that there aren’t any other levels, just the interface levels. Nothing else. Which means no d-structure, no deep structure, surface structure, s-structure, none of that stuff. There should be no structural relations, other than those that are forced by the interface conditions. That means no government, no binding theory, internal to the language faculty. That means all the traditional notions, including the ones taken over by generative grammar, have to go. They wouldn’t fit. And other properties of that kind as we go along. Well, OK. Let’s go along a bit. The i-language - which now we identify just as a set of parameter choices - has two components. One is a lexicon, which I will take in the traditional sense to be the repository of exceptions, the things that aren’t principled, like the fact that the sound ‘tree’ goes with the concept ‘tree’ rather than with some other thing. That’s in the lexicon. And then there’s a computational procedure, and I wanna try to show that that’s unique and invariant, at least in the syntax; there’s only one of them. Martians looking at humans would say there’s one language with a bunch of lexical exceptions. The computational system takes some kind of collection - it’s called an array - of lexical items, picks em out somehow, and it carries out a computation in a uniform fashion, and it ends up forming interface representations. So, now, assuming a pair of interface representations pf and lf, what’s the array? Well, it has to have at least some structure, and without going into it, I’ll assume it at least has the structure of what I’ve called an enumeration elsewhere, and probably much more. It won’t matter for this, I won’t go into it. But the array has some kind of structure; how much we really don’t know, we’re guessing. The er, we’ll say that the derivation converges at an interface level if it’s interpretable at that level.
And that’s a property that’s determined by the external system.
Otherwise it crashes at that level. A derivation converges if it converges at both levels, separately. That assumes that there’s no interaction between the pf and the lf level, which again is a very strong empirical claim, and there’s a lot of evidence against it. So if it’s true, it’s interesting. But if the language is really perfect then you’d expect the two things to happen independently, so I’ll assume that - er, making perfect assumptions.
So we assume, and then we say, that the derivation crashes if it crashes at either level. Er, well, looking at the lexical items, each lexical item is some complex of properties - call them features - so lexical items have a complex of features, and given this much structure you can distinguish three types of features. There are those that are interpreted at the phonetic level, the pf level so-called - call them phonetic; they’re accessed at the phonetic interface, you know, like aspirated p and that sort of thing. Er, there are those that are accessed at the lf level, called semantic, though it’s misleading; so the semantic features are accessed at the lf level. And then there are others, which are just accessed by the computation itself. Call them formal features. So we have three kinds of features - phonetic, semantic, formal - defined this way. Now these sets can overlap in all kinds of ways. A further assumption is that the phonetic ones are disjoint from the union of the other two sets. So the phonetic features are not found in the other two sets, semantic and formal; they’re a separate set. What about the semantic-formal relation? Well, that actually is a traditional question. That’s sort of the question, you know, whether verbs refer to actions, and nouns are names, and so on and so forth. Notions like noun and verb are formal, they’re accessed by the computation, but notions like action, and thing, and so on are not; they’re semantic, or whatever the right ones are ?, and the question of how the formal and semantic interact is a version, in this system, of the old traditional questions of the semantics of grammatical categories and so on. What about the parameters? Where are they? Well, a nice system would say that they’re only among the formal features, that is, the phonetic features and semantic features aren’t parameterised. So let’s try that.
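To keep the three-way classification straight, here is a small Python sketch (my own illustration; the feature names are invented for the example): features are typed by which system accesses them, and the phonetic set is assumed disjoint from the union of the other two, which may overlap with each other.

```python
# Feature sets, typed by where they are accessed (names are illustrative).
phonetic = {"aspirated", "voiced"}              # accessed at the PF interface
semantic = {"action", "thing", "plural"}        # accessed at the LF interface
formal = {"noun", "verb", "case", "plural"}     # accessed by the computation itself

def well_typed(phonetic, semantic, formal):
    """The stated assumption: phonetic features are disjoint from the
    union of semantic and formal features; semantic and formal MAY overlap."""
    return phonetic.isdisjoint(semantic | formal)

print(well_typed(phonetic, semantic, formal))  # the assumption holds for these sets
print(semantic & formal)  # overlap between semantic and formal is allowed
```

The semantic-formal overlap in the sketch (here ‘plural’) is exactly where the traditional questions about the semantics of grammatical categories live.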
So there are just - only the formal features are parameterised. The narrower and more restrictive you can show the parameterised features to be, the easier it is to deal with the problem of explanatory adequacy.
That’s a tough problem - how everyone knows this stuff, with no evidence. So you wanna make sure the right theory oughta have the answer: it’s a very small number of things to learn, in a very circumscribed place. Well, one kind of circumscribed place is just the formal features. An even more circumscribed proposal is that the only thing that’s parameterised is the formal features of what are called functional categories. Functional categories are the ones that lack any non-trivial semantics. So not verbs, adjectives and nouns - they’re not functional categories; they have non-trivial semantics. But the others - and you have to explain what you mean by this - but the others, with trivial semantics or no semantics, are the functional categories. Er, and they have formal features like others, so maybe the parameters are only in the formal features of functional categories. And a still narrower theory, and one which begins to look more reasonable as we go along, is that the only parameters, in the syntax at least, have to do with one particular property of formal features of functional categories, and that is whether they come out the mouth. Er, so are they pronounced, or do you just compute them in your head, you know? And languages seem to differ - the sort of intuitive picture is, look, there’s one computation going on. Er, no matter what language you’re speaking you’re carrying out the same mental computation. But languages differ in how the sensori-motor system accesses it. So one accesses it at one part, and another accesses it at a different part, and that makes the languages LOOK very different; but again, from the Martian point of view they’re essentially identical, with a trivial difference, and also from a child’s point of view, and that’s the important part.
The languages MUST look identical from the child’s point of view. Otherwise it’s impossible to learn any - that’s the driving empirical fact that’s hanging over your head like a cloud all the time, and it makes the subject interesting. So a possibility to be explored would be that the only property that’s parameterised is what’s technically called strength. Are you a formal functional feature which is pronounced, or are you one that’s just computed and unpronounced? OK. So, like, the idea would be that say in Latin the cases actually get pronounced, but in English you got the same cases but they just aren’t pronounced. So you’re just grinding away in your head. You see the EFFECTS of them, you get the consequences, but you don’t hear them. And similarly with other things in other languages. So, if you could show this, that would mean that the typological variety of languages reduces pretty much, maybe entirely, to just the question of the various combinations in the way the unique invariant computational system is accessed.
Well, this is a little too strong, and let’s look at how too strong it is. Remember the driving empirical question is acquisition - how can anyone acquire a language? And the fact is that in parts of language that are close to the data, that are close to the phenomena, you’d expect variation to be possible. So, you know, different kinds of phonetic variation - you can hear them, so you can get language variation in them. On the semantic side, you might imagine, and maybe it’s true, that things like semantic fields in the traditional sense are just variable within some range, because a little bit of data might tell you something about how a set of concepts is broken up one way or another in what are traditionally called semantic fields. So you’d expect some variety round the periphery. Phonetics, peripheral semantics, and so on. I’m gonna abstract away from that and just talk about the rest, and when I say that all typological variety is, I propose, in the strength of formal features of functional categories, I’m abstracting from that stuff. Well, can we make it even narrower?
It looks possible, so let’s ask what formal features can have this property of strength. Well, er, it’s only functional categories, I’m assuming, and the only features that are parameterised are features that say, in effect, I need a certain category. Like, I need a noun phrase, or I need a verb. But not other features, like I need case, or I need number, or something like that. So the strength, possibly, is reducible to the need-category property of formal features of functional categories. That would mean to say specifically that t, tense, which is a functional category, may or may not have the feature I need a dp, a noun phrase, basically. And if it has, that’s the feature that’s connected with the extended projection principle. If you have it, you necessarily have a subject; if you don’t have it, you don’t. That’s the EPP feature of tense. But you can’t have a case feature. And t might or might not have the property, I need a verb. If it does, you have what are called V-raising languages; if it doesn’t, you don’t have V-raising languages. Maybe that’s the only - but no properties of, say, verbs or nouns; they can’t have strength features, and no other access properties, other than category access. Well, that’s then a very narrow class of possible variation of language, and if the system works out, that’s the way it’ll be: languages are all the same, except in one small corner of what comes out the mouth.
From this point of view, the relation to the sensori-motor system is sort of extraneous to language. It’s like a nuisance on the outside, imposed by external systems; so, like, if we could think and communicate by telepathy, let’s say, then you’d just dump all this stuff, and you’d just carry out the one unique computational process - that’s sort of the idea. And it would be nice to show that the imperfections of language, of the kinds you might work out if you were sitting somewhere and you were god or something like that - not that we want to get too exalted a self-image round here! [Laughter] - but to try to show that the imperfections, as much as you can, really result from the sort of extraneous fact that, because of the, you know, ridiculous lack of telepathy, we’re forced to turn all of this stuff into a sensori-motor output which has certain properties, cos that’s the way the mouth works, and so on and so forth. That’s a further goal. Well, among the formal features - we’re now concentrating down on those - some of them are what we might call purely formal; that means they’re formal but not semantic. Remember these two categories overlap. So take those that are purely formal. They’re not semantic at all.
That would be things like - that means they get no interpretation at the interface level. Well, an example would be, say, case for nouns. The case of a noun doesn’t affect its interpretation; it’s interpreted the same way whether it’s nominative or accusative, let’s say. Verbs also have a kind of a case property, a case-assigning property - like, some verbs assign case and some don’t. Well, whatever that property is, it’s not interpreted. The semantic correlate to it might be - like, transitivity might be - but not the case-assigning property itself. Er, on the other hand, what are called the phi-features - the features like number, gender, and person; number and person at least, and sometimes gender, depending on the language - those features get interpretations at the interface: you interpret a plural noun differently from a singular noun. So the phi-features of nouns get interpreted. On the other hand, the same phi-features in verbs and adjectives don’t get interpreted. So a verb is interpreted the same way whether it’s singular or plural. All right.
So the phi-features of nouns are interpretable, the phi-features of verbs and adjectives aren’t interpretable, and nobody’s case features are interpretable. That turns out to be quite a crucial distinction; it’s a principled distinction, determined by output conditions, and it has effects if you think about it. This is something that wasn’t really thought about till quite recently. That’s part of the problem that’s bin making things look imperfect for the past ten years or so - that we hadn’t noticed that distinction. When we notice it a lot of things fall out. But it’s a clear distinction and a highly principled one.
Interpretable features cannot be erased in the course of a computation, because they’ve gotta be interpreted at the output. On the other hand, uninterpretable features MUST be erased in the course of a computation, because they have no interpretation at the output, so if they survive to the output it crashes. Well, that tells you right away a lot about the structure of a computational system. It says whatever it’s doing, you can’t get rid of any interpretable features, like say plural in nouns, and it MUST get rid of case features, like nominative in nouns. OK. And it must get rid of plural in verbs, cos that’s not interpretable. And in fact you can now sort of glimpse what a perfect system would be. It would say that the only operations there are, are the ones that get rid of purely formal features that are uninterpretable. There aren’t any other operations. So the computational system will be restricted to operations which get rid of uninterpretable formal features, and the only well-formed derivation, you know, the only computation that gives you a linguistic object, is one that adhered to the principle that it didn’t do anything except some operation that got rid of uninterpretable formal features. Notice that there’s a difference between what’s called structural and inherent case in this respect: inherent case is semantically-related case, case that’s assigned by virtue of a semantic relation, like the genitive case, which comes out in English with an of phrase, assigned by an adjective - you know, you say ‘proud of John’; the relation between ‘proud’ and ‘John’ is a semantic relation. And that’s inherent case. And that’s distinct from structural case, which is purely configurational - like nominative case assigned to whatever’s in the subject position; it may have no semantic relation to anything. Accusative and nominative cases are typically structural; other oblique cases are typically inherent, and they have all sorts of different properties.
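The logic of that last point can be sketched as a toy derivation check in Python (entirely my own illustration, with invented feature labels): interpretable features must survive to the interface, uninterpretable ones must be erased, and a derivation converges only if nothing uninterpretable remains.

```python
# A feature is a (label, interpretable?) pair; the inventories are invented.
noun = [("plural", True), ("case:nom", False)]  # phi on a noun is interpretable
verb = [("plural", False)]                      # phi on a verb is not

def erase_uninterpretable(features):
    """The only licit operations get rid of uninterpretable features;
    interpretable ones cannot be erased, since LF must interpret them."""
    return [f for f in features if f[1]]

def converges(features):
    """Converges at LF iff every surviving feature is interpretable."""
    return all(interpretable for _, interpretable in features)

print(converges(noun))                         # False: case:nom survives, so it crashes
print(converges(erase_uninterpretable(noun)))  # True: after checking/erasure it converges
print(converges(erase_uninterpretable(verb)))  # True: uninterpretable phi erased
```

On this sketch, inherent case would simply be marked interpretable, so a phrase bearing it never triggers any operation: exactly the sense in which it is invisible to the computation.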
The inherent cases are interpretable, cos they reflect the semantic relation; the structural ones are not, so it oughta turn out that things with inherent case are invisible to the computational system, cos there’s nothing uninterpretable that they have: their phi-features are interpretable and their case is interpretable. We really shouldn’t call it ‘case’; it’s just called case cos it’s kind of similar morphology. But it’s functioning in a completely different fashion, functioning as a reflection of the semantic relation; the other isn’t.
And in fact it should follow then that the computational system is only looking at things like, it only can see, things like structural case, phi-features of verbs and adjectives, strength of features which is not interpretable, and things like that. Those of you in the field will recognise that this is the core idea of Jean-Roger Vergnaud’s case theory which set off some of this work years ago. Well it should follow then that all movement operations, you know transformations, all movement operations should be related to, they should apply just in the case that they are contributing to the erasure of pure formal features. Checking, erasure. It’s more complicated than this, but something like that. And it also should turn out that parametric variation should have to do only with the strength of formal features, cos the others you’ve got to get rid of anyway, and the only variation should be which ones you get rid of in such a way that it affects the phonetic alphabet. OK. And which ones you just get rid of in your head and it doesn’t affect the phonetic alphabet. Well, er let’s suppose that that’s true. If that’s true, it’s a nice elegant system and it’s kind of perfect. What does it mean to say that a formal feature is ‘strong’? Incidentally here’s where I’m starting to correct stuff in the in-press version of the final paper I’ve been talking about, ‘cos it’s got it wrong. But if you think about it, the notion of a feature being strong or weak has sort of bin mysterious, how can a feature have a further property, how can it have another feature, like I’m strong or weak? Well, if you think about it, it doesn’t have another property. To say that a feature is strong is just to say that it’s there. If it’s there it’s strong. If it’s not there, it’s not strong.
So the d-feature of tense, the feature that underlies the extended projection principle, you know that says some languages have subject verb object, and others verb subject object and so on; that property is just the strong feature of tense which is now that feature I need to ?dp. That feature is either there or not there. If it is there we’ll call it strong. If it isn’t there, so, we don’t call it anything, it isn’t there. So there’s no feature of strength over and above the other features, it’s just a way of referring to the features that are there in this sub-category of parametric variation which reflects what happens to come out the mouth. If tense has a d-feature, you have a SVO language or an SOV language, if tense lacks a d-feature you have a VSO language, if tense has a v-feature you have a verb raising language, French type, if it lacks a v-feature you have a non-verb-raising language, you know English, Scandinavian type, but that just means the verb remains in situ, it doesn’t have to go anywhere, unless it does for some other reason, like if the verb has to get all the way up to complementizer ? to raise, but that’ll be for other reasons. It also follows that in languages like English the tense-verb relationship should be actually of a kind that was proposed in the 1950s, of a kind that came to be called Affix Hopping, with just some boring phonetic property, that irrelevant sensori-motor component, which is relating the feature to the verb because features can’t just hang around freely and still be pronounced by the mouth. They gotta be attached to something. So it’s kinda lowering; Lasnik and others have been pursuing that framework. So the term strength is just in the mind, not to be taken seriously as in this coming-out chapter, and it just means the need category feature of a functional category is there. Period.
Now the, this distinction between interpretable and not, happens to have a big range of empirical consequences; there’s a lot about that in the stuff that’s coming out, so I won’t talk about it. And in a perfect theory, in an ideal theory, movement should be restricted then to very narrow convergence conditions related to uninterpretability of pure formal features, it should be down to that, and that’s what a kid looks at when it’s learning language, the kids you think are so dumb, they’re looking for the uninterpretable strength features of functional categories; that’s what they’re looking at, according to this story. [Laughter] Now the general picture then is this initial array, with whatever structure it has, is going to, is generating, you know deriving, this lf representation and I’m only looking at that side, by the operations that are driven, forced in fact, by the bare output conditions. Well, one of these operations is the one that’s called spell-out, the pf and the lf representations are distinct, in fact probably disjoint in their properties, not just distinct but actually disjoint, so somewhere along the derivation it’s got to split into two paths, one gives you the sensori-motor side, and the other just goes merrily on its way with the syntax. Now, general assumption is, the simplest assumption - we’ll keep to it unless we’re forced otherwise - is that any operation can apply anywhere. OK. So the operation spell-out can apply anywhere, and what it does is remove the p features, it takes them away and everything else keeps going, and the p features and whatever else it takes away, they just go off into what’s called the phonological component and meanwhile the array to lf derivation just keeps going, now deprived of its p features, but otherwise going without change. If something in the numeration, in the initial array doesn’t get used, by the end, well it just isn’t a derivation. It’s like a proof that’s missing a step, or something.
It isn’t anything, so we throw it out. If it happens to end up with something which includes an uninterpretable feature, it crashes, so it still isn’t a real derivation. Well, the - let’s proceed. The further principle you might wanna have and might wanna see in a perfect theory, I’m now talking about the principal part, the array to lf syntax - a perfect theory oughta have the property which I might call uniformity which is that no operation is restricted to one or other part of the computation. Now there are basically two parts - there’s the part before spell-out, and the part after spell-out. Let’s call them overt and covert for the obvious reason. So there shouldn’t be any principle saying that some operation can only apply say in the overt part, or only in the covert part. And if you meet that condition, let’s call that uniformity, another condition you might wanna meet is what you might call inclusiveness, and that would say that nothing enters into the computation beyond the initial lexical features from which it began. So that would mean that the whole computation down to lf is just a rearrangement of lexical features. In that case we’ll say it meets the condition of inclusiveness; the empirical meaning of that is you can’t have any bar levels, or indices, or any of that kind of stuff, all that has to go, because none of that is in the initial lexical, it’s not in the lexical entry. That throws out an awful lot of technology so it means everything based on that technology’s gotta be wrong and the problem is to show it. So a really perfect theory would meet these two conditions. Let’s assume it does.
Furthermore the derivations have to meet a kind of economy condition, an optimality condition, which says that a derivation gets interpreted, a computation gets interpreted, only if it is the most economical convergent derivation, and to define that properly turns out to be quite intricate and important [TAPE PROBLEM here... bit of a gap..] at once introduces rather serious questions of computational complexity. The reason is, you’re comparing derivations, to find out, to decide, whether you’re hearing something, you want to decide whether it’s interpretable, you have to be able to compare derivations. Any of you who know anything about automata theory and that kinda stuff will know this can lead to WILD computational complexity problems. And it would be nice, in fact necessary, to cut - to show that they don’t arise. Or rather, more precisely, to show they just arise in the class of cases which are unintelligible. Now we know that an awful lot of well-formed language is totally unintelligible. So you only use scattered parts of the language, because the rest is just not intelligible. .. short, and simple, and well-formed and so on, but now there’s a problem, and a very interesting problem, which is just kind of lurking around the horizon, you can formulate it, you can’t really solve it, and that would be to try to show that those scattered parts of language which are usable are in fact those parts in which problems of computational complexity don’t arise, OK. That’s a really hard problem, and it’s interesting, problem of the nature of the hard physical sciences, tough problem, therefore an interesting one; I don’t know how you could answer it but you could think how to approach it.
So that problem’s kind of on the horizon somewhere. So how do you approach it? First by looking closely at the economy conditions and trying to see to what extent you can show that they DON’T introduce computational complexity problems. And there’s a lot of natural ways to go. So for example to cut down, cutting down computational complexity, means shaving away the number of things you have to look at when you decide whether some derivation is correct. So make sure you don’t kind of get exponential blow-up at each point. Well, one step towards it is to suppose that at every step of the derivation you don’t ask about all the derivations that are possible, you just ask about the most economical next step. So what’s the most economical step that can be taken NOW that’ll lead to a convergent derivation? So that entails that as you’re moving through the computation the class of things you have to look at is narrowing all the way along, you know, and gradually gets quite small. Still a big class, but it narrows. Another - let’s assume that that’s the way economy conditions work - actually all of this stuff has empirical consequences, every such proposal has a lot of empirical consequences, so you have to check ‘em out.
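[Editorial aside: the ‘most economical next step’ idea can be sketched as a greedy search. The function names and the toy cost-per-feature state below are my inventions for illustration only; the point is just that at each step only the cheapest available continuation is considered, so the space of derivations being compared narrows instead of blowing up.]

```python
def derive_greedily(state, step_options, is_converged, max_steps=100):
    """At each point, take only the cheapest next step that is available,
    rather than comparing all complete derivations at the end."""
    for _ in range(max_steps):
        if is_converged(state):
            return state
        options = step_options(state)       # list of (cost, next_state)
        if not options:
            return None                     # dead end: not a derivation at all
        _, state = min(options, key=lambda opt: opt[0])
    return None

# Toy state: a tuple of (cost, feature-name) pairs still needing erasure;
# each available step erases one of them at its stated cost.
def erase_options(pending):
    return [(cost, tuple(p for p in pending if p != (cost, name)))
            for (cost, name) in pending]

final = derive_greedily(
    ((2, "Case"), (1, "phi-on-V")),
    erase_options,
    lambda s: len(s) == 0,
)
assert final == ()   # both uninterpretable features erased, cheapest first
```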
[Tape Turned Over About Here]
Another question has to do with what everybody assumes to exist somehow, locality conditions of various kinds, in fact ? a book called Locality Conditions. But the er so the big problem is find the locality conditions. Well, one kind of locality condition is to say that movement should take place, be as short as possible, minimal chain link condition it’s sometimes called, minimum link condition. Now, what kind of a condition is this? Well, in the best possible world, this would be an inviolable - this is a very hard thing to figure out. I mean computationally it’s extremely hard to know what’s the shortest movement. For one thing, you have to compare all sorts of operations. For another, it introduces conceptual problems which are kind of unformulable, like how do you compare shortening a derivation in one part with lengthening it in another, you know how do you, which is the shortest? There’s no meaning to that. Well, the best way through this whole mess would be to say the question can’t arise, that there only are shortest possible movements, so it’s an inherent property of rules that they MUST be the shortest possible, and if you violate it you’re just not doing anything, it’s like you know playing chess and making an inappropriate move or something, or trying to prove a theorem and doing something that isn’t a rule of inference. There’s no question as to whether it’s good or bad or short or long, it just doesn’t exist. So the only derivations are those that satisfy the link condition. That would be a nice property, and the empirical consequences turn out to be pretty reasonable I think; again, there’s a lot about this in this stuff that’s coming out. Well, let’s assume that’s right. Another proposal would be to show that operations take place only if they’re forced by unchecked, so far unerased, pure formal features. Only in that case can the operation take place.
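[Editorial aside: the reformulation of the minimal link condition as attraction - a head attracts the closest matching element, and longer moves simply don’t exist as options to compare - can be sketched like this. The `attract` function and its inputs are invented for illustration; they are not Chomsky’s notation.]

```python
def attract(needed_feature, candidates):
    """candidates: (distance, feature-set) pairs, in no particular order.
    Only the closest bearer of the needed feature is visible; there is no
    separate 'shortest move' filter, because no longer move ever exists
    as an alternative to be compared."""
    matching = [c for c in candidates if needed_feature in c[1]]
    return min(matching, key=lambda c: c[0]) if matching else None

# A head needing a D-feature sees only the nearer of two DPs:
chosen = attract("D", [(5, {"D"}), (2, {"D", "phi"})])
assert chosen == (2, {"D", "phi"})

# With no matching candidate, no operation applies at all:
assert attract("D", [(1, {"V"})]) is None
```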
That cuts down - things of that kind cut down the class of computations that have to be inspected, quite radically, still leaves it you know, too big, but at least these are the kinds of steps that can be taken first, towards eliminating computational complexity. The problem at each point is to show that the empirical consequences of such a proposal, which are usually very extensive, that they’re right. If they’re wrong, too bad. But if they’re right, you have an idea, you think you’re on the right track, you’re on the way to cutting down computational complexity. Well, this stuff is discussed in this mystical fourth chapter!
So, what are operations? Well, the bare output conditions suffice to tell you that there are at least two, three in fact. One of these operations has to be spell-out, which I’ve already mentioned and that’s because there are at least two interface conditions which are separate. And the second operation that’s forced is, let’s call it merge; take two things you’ve formed already and make a third thing. That amounts to saying er a sentence isn’t just a set of lexical items, it’s some kind of structure formed from them, which it obviously is. So, bare output conditions force you to say that when you have constructed linguistic objects, you’ve constructed a bigger one from two of them and we call that merge, and you try to make them as simple as possible, and notice that merge doesn’t carry any cost, it’s free. And the reason is if you have an array of items and if you don’t apply the operation merge often enough you’re gonna end with some of the items unused. And therefore it crashes. And so merge is free, it doesn’t have any cost. When you’re counting the economy, you don’t count the number of times you’ve done merge; it comes for nothing. Er the last operation that seems to be forced, and this just looks like a property of natural language, quite different from invented symbolic systems at this point, is the operation, call it move, and that expresses an irreducible fact about natural language, which is captured in one or another way in every theory, people don’t like to say it, and that is that things are interpreted in positions that are displaced from, er they APPEAR in positions that are displaced from where they are interpreted. And that’s just a fact, you know. Look around language, you take the pieces of an expression and you see they are interpreted somewhere else.
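[Editorial aside: merge, and the requirement that nothing in the numeration go unused, can be sketched as follows. Pairs stand in for whatever the real structured objects are; all the names here are illustrative inventions of mine, not the formalism itself.]

```python
def merge(a, b):
    # the simplest object formed from two already-constructed objects:
    # just the pair containing exactly them
    return (a, b)

def uses_whole_numeration(derived, numeration):
    """Flatten the derived object and check that no lexical item was left
    unused; if one was, the computation is not a derivation at all."""
    def leaves(x):
        if isinstance(x, tuple):
            for part in x:
                yield from leaves(part)
        else:
            yield x
    return set(leaves(derived)) == set(numeration)

tree = merge("read", merge("the", "book"))
assert uses_whole_numeration(tree, ["read", "the", "book"])

# Leave an item out of the structure and there is no derivation:
assert not uses_whole_numeration(tree, ["read", "the", "book", "never"])
```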
That’s an irreducible fact about natural language, the simplest expression of that fact is to say there are objects, called chains, which simply express the relationship between the position and the point of interpretation, and transformational grammar’s one way of working that out. Other notations sometimes claim to be different, but if you tease them out they’re the same, because there’s no getting around this irreducible fact. So we need an operation that relates those positions - call the operation ‘move’ for reasons to do with its nature.
Well, at this point we can return to the question of strength. And I’m going to make some comments. I’m going beyond the unpublished paper. I’m, you’re always gonna try to merge at the root. When you’re building things up, to embed things by merger you can show is a much more complex operation than just to tack it on to what you’ve already formed. Good technical reasons for this which you’ll know if you look into the system. So therefore you will always merge at the root if possible, and in a perfect system you’ll only merge at the root, because you’re trying to make everything perfect, you know. So let’s assume that merger is always at the root. It follows that strong features can only be introduced at the root; they can never be embedded, OK. So strong features will only appear right at the top you know, if you think about it graphically, at the top of the tree. Or the bottom of the tree, depending on how you look at it. But that’s all metaphor, because there aren’t any trees, we’ve given up bar levels, all that stuff’s gone, these are just graphic notations.
The merger will always be at the root, strength will always be introduced at the root. Furthermore there’s another economy principle which will be pretty interesting in its consequences and that is in the initial array - remember, there’s only a certain class of things you have choice about, namely the parameterisable strength features, OK, so now we’re just restricting ourselves to that - and the principle says that one of those things can be in the initial array only if it has an effect. Only if it has an effect, either at pf or lf. If it has no effect either at pf or lf, it can’t be there. OK. That economy principle which turns out to have quite interesting consequences when you pursue it, let’s assume it’s true, er that means one of these optional things can only be there if it’s gonna show up somewhere at the output. It follows from that that er the strength features - well; you can now formulate the following proposal. It doesn’t yet follow. The formal proposal would be, feels like a theorem hanging around somewhere, that er strength features can only be introduced overtly. Reason? If they’re introduced covertly they’re obviously not gonna have a pf effect, and they don’t have an lf effect. OK. Now, if you can show that, they plainly don’t have a pf effect because they’ve been introduced after the split, so they will only be able to be there at the beginning if they have an lf effect. Well, it seems to turn out that the only one that has an lf effect is what drives qr.
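[Editorial aside: the licensing condition just stated - an optional strength feature must have an effect at pf or lf, and covert introduction can never yield a pf effect since it comes after the split - can be boiled down to a few lines. The function and the equation of pf-effect with overt introduction are my illustrative simplification, not a quotation.]

```python
# Licensing of an optional (parameterised) strength feature, per the
# economy principle plus the overt/covert split: a covertly introduced
# feature can never reach pf, so it is licensed only by an lf effect.
def licensed(introduced_overtly, has_lf_effect):
    has_pf_effect = introduced_overtly   # simplification: covert never reaches pf
    return has_pf_effect or has_lf_effect

assert licensed(introduced_overtly=True, has_lf_effect=False)       # e.g. overt raising
assert licensed(introduced_overtly=False, has_lf_effect=True)       # e.g. quantifier raising
assert not licensed(introduced_overtly=False, has_lf_effect=False)  # no effect anywhere: barred
```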
There’s an interesting paper by Danny Fox on this, and other work by Tanya Reinhart and others, which has been put in a different framework, but what it comes down to saying is, you can carry out quantifier raising, you know, putting a quantifier somewhere where you wouldn’t expect its scope to be, only if that operation gives you an interpretation you otherwise wouldn’t have. Well - and this is always a covert operation. Well, from this point of view it means that er you could only have the strength feature that says ‘move the quantifier to me’ if it has an lf effect. Incidentally, the consequence is that in languages that have overt counterparts to quantifier raising, this shouldn’t happen, you should be able to do things freely; that’s Hungarian apparently according to ? at least, who says the qr effects and so on you get in ellipsis you don’t get in Hungarian (she claims). Tell me if it’s right! And if it’s right it would be kind of nice, because that would mean since that is having a pf effect, since you’re overtly moving it, it doesn’t matter, you can do it even when you don’t have an lf effect, because it’s showing up somewhere. On the other hand in English, when you look at ellipsis constructions and so on, you can see you don’t get these effects.
Well, you know, if it turns out it’s nice. Well, there’s a potential theorem hanging around, which says, strength must be erased overtly, because otherwise it couldn’t be there at all, except in the cases where you have things like quantifier raising, where it does have an lf effect. But verb raising doesn’t have an lf effect. So that’s gotta be overt. Well. What about xp raising? You know, maximal projection raising..? Like in getting the subject up there. Well that has to be done not only overtly, but it has to be done fast. Because it has to be done before you build up a structure that’s going past the checking domain of the head with the strength feature. I’m sorry this is going to get pretty technical here, there’s no other way to do it. So when you get a checking domain of a strength feature, if you get beyond the checking domain it’s too late, you’re not allowed to move inside, you know, nowhere to check, unless you get rid of this thing before you get that high, so that’s going to cause er when you have xp movement, xp raising, you’re gonna have to get rid of it not only overtly but also fast, and a consequence of that is basically minimality. It sort of falls out of that.
.. theta theory.. movement theory.. adjective and its complement.. phi features of adjectives.. semantic relation.. structural relation.. checking features.. you’d never move phrases.. madly pursuing the intuition.. we see phrases moving; but that could be a mirage.. to satisfy convergence conditions of pf.. looks as if the things are moving but they really aren’t.. covert movement where it doesn’t have to come out the mouth.. another theorem.. if true would say that overt movement takes the minimum phrase that’s required in order to satisfy pf convergence. .. turns out to be much more natural.. to drop the notion of movement.. go back to an older notion that was hanging around, say it’s really attraction; it’s not that something’s moving to target something else, it’s that something’s attracting somebody to get rid of one of its problems.. category has a strength feature.. look at the closest things, because of the minimal link condition.. and that should be all of movement theory.. look for the feature that’s gonna do the job.. ideally that oughta be all there is.. very elegant picture if it’s true.. vast empirical problems that arise.. let’s go on to specific matters.. functional categories.. playing a very central role.. what are they? .. we have evidence for the existence of some of them.. tense has semantics and it has phonetics.. similarly there’s evidence for complementizers.. similar evidence for d, the determiner feature of noun phrases
.. evidence for a sort of light verb.. just thin semantics.. theta theory.. lexical shells.. decomposition theory.. transitivity is from this theta theoretic point of view a light verb followed by a ? .. that would make transitives kind of like causatives.
[writes on blackboard] .. so that’ll be what a clause looks like. .. no space for the agreement complex.. motivated basically by the fact that it provides structural positions.. interesting proposal.. seven or eight years ago.. separation of various properties of inflection.. the main thing.. bare phrase structure approach.. drop all x bar theory.. could be any number of specifiers.. it’s beginning to look like they’re right.. seems to fill a gap.. structural position of ?agr.. lemme stop with this.. we’re gonna have parametric variation.. as to how many specifiers.. extremely interesting.. let’s take tense.. if tense allows no specifiers, you get a VSO language; if it allows one specifier, you have a SVO language; suppose it allows multiple specifiers, well then you get what’s called transitive expletives, multiple specifier languages, except these break up in interesting ways too, like Icelandic is the case that’s been studied most in depth, mainly because Höskuldur ?Thráinsson’s at Harvard so we can all ask him questions but by now there’s been a lot of study of this.. it has double subjects, and that’s essentially two specifiers.. tense.. the two have very narrow conditions on them.. if the first is an expletive and the second’s an argument, you can’t have two arguments.. the same thing sort of happens in German.. what about the possibility of infinitely many of these things.. arbitrarily many.. with all arguments outside.. the only thing that’s left on the inside are what is sometimes called agreement elements.. you can do everything, you can do the minimum amount, and you can do nothing. .. That seems to give the right sort of typology. ... here is the agent.. one specifier of small v is given by theta theory.. parameter that says I’m allowed to have a specifier.. the analogue to Icelandic having a transitive expletive.. PhD thesis.. the subject is higher than the object. The whole literature’s based on that. It turns out to be false.
It’s because people were looking at the wrong examples. If you look at the right examples, it turns out it’s the other way round and nobody’s noticed it before. So in actual fact the object is always first and the subject is always second. So you get sentences which would be like in English er ‘there read these books never any student’. OK, that’s the way it comes out. Nothing remains in the verb phrase, everything is moved, but it’s there and you know the object’s in the verb phrase.. and then there’s an expletive in front of the verb. .. tells you the object must be able to cross the subject.. distance.. is measured by some property of minimal domains..
.. we know the subject is moving.. how do we know that? .. this agrees with the verb.. these two must be equidistant.. forced to get these results.. these are the only things which will converge.. strong evidence that expletives converge.. .. the reason for that is that merge is free and move costs.. you always do merges if it will converge.. the facts ought to be the opposite of the way they always assumed to be.. the facts were misunderstood.. which is the kind of thing that makes you think that maybe you’re on the right track. .. this is all stuff that should have been in there.. but that doesn’t mean anything.. in a couple of months it’ll all be changed again.. If it works out, it will turn out.. pursue these intuitions.. then you’ll have strong reasons to believe that language is a kind of a biologically impossible object.. something like inorganic chemistry.. organic world where everything is messy and so on.. may be the most interesting thing about human language.. it just seems very different from biological objects.. [perhaps] all biological objects are like this, we just don’t know how to look at them.. maybe that’s also true of the biological world.. if so maybe all of biology might look like this. .. OK.”
-QUESTIONS [Chomsky’s reply mostly on ‘indexing’ & .. previous work.. it’s all wrong. Japanese has scrambling.. quite interesting work on this.. new paper.. usual unsolved problems.. look contradictory.. and that’s what makes it look interesting..]
[.. successive cyclic wh-movement.. reflex.. all languages really like.. Irish.. XP conjunction.. look at the history of transformational grammar.. last 40 years.. dramatic evidence.. extraposition, heavy NP shift, ?dp fronting.. were always called stylistic rules.. some intuition they were something to do with style, not grammar.. interleave all over the place.. displacement.. aren’t part of the same system of language.. sign is taken care of.. the question of truth and falsehood.. entailment has the same kind of status as rhyme.. entailment relations.. even if the semantics of lexical items is not complete.. two extremes.. each concept is an atom.. the other is they all kind of decompose into each other,.. kill dissolves into die and so on.. they both look wrong.. if things turn out to be paradoxes..]
-CHAIRMAN stops questions as he’d promised to deliver Noam 1/4 hour before.