Unsatisfying Encounters with Academics of Renown, Episode 2: Noam Chomsky
I first heard of Noam Chomsky in my Linguistics 101 class in college. I remember the professor telling us that the history of linguistics before 1955 was known as “B.C.” — Before Chomsky. That’s how much he revolutionized the field.
If you happen to already be familiar with Chomsky’s ideas, you can skip ahead to the next paragraph. If not, read on, because the rest of this story won’t make any sense otherwise. Chomsky’s Big Idea is something called “Universal Grammar.” We humans have circuitry in our brains that has evolved specifically to support language. Children learn to speak whatever language they’re exposed to because the basic structure of human language — the Universal Grammar — is innate. Learning a specific language is just a matter of plugging a vocabulary of words and a few arbitrary rules of syntax into our built-in language processors.
So what are the rules of this Universal Grammar? Well, Chomsky doesn’t really explain what the Universal Grammar is. He just says there is one.
I hope your eyes haven’t glazed over yet. See, I love this stuff. I’m just completely fascinated with how minds work. How we perceive. How we think. How we communicate. What it means to be conscious. What makes us human, and how we got this way. That’s why I majored in psychology, and why I took linguistics. It’s why I read books like The Ape That Spoke and The Language Instinct. It’s why I sought out John Lilly’s dolphin communication project in 1982. And it’s why, when Noam Chomsky gave a lecture about language during a visit to the University of Minnesota, it was a Must See event for me.
Chomsky came to the U of M sometime in the early 1990s. I wish I could remember when, exactly. I think the main reason for his visit was to speak about politics. His talk on language was just an “as long as I’m here” kind of thing for him, but it was the main attraction for me.
He gave his talk in a medium-sized lecture room. There were maybe 50 people in attendance. As I recall, it was a somewhat rambling, off-the-cuff talk.
Somewhere in the middle of it he said, “The biggest mystery of language is its purpose. What’s it for? It’s certainly not for communication. You don’t need it to communicate the most important things. You don’t need language to say ‘I love you’.”
I’m still perplexed by what he meant by that. Yeah, yeah, you don’t need language to say “I love you.” But you do need language to say “You five guys climb these trees and wait for the buffalo, and the rest of us will sneak around to the other side and jump out and scare it so it runs toward you, and then you guys hit it with your spears.” It seems pretty obvious to me that language becomes fabulously useful as cooperation among individuals gets more intricate and social structures grow more complex. So was Chomsky asking some really deep question that I utterly failed to fathom, or was he just poking us with a stick to see if we were paying attention? I don’t know.
Eventually it was time for questions. I put up my hand and asked the question I’d had in mind for many years: “Do you have any thoughts about how a nonhuman language might be structured? How might it be different from a human language?”
Chomsky replied, “There’s no evidence that dolphins or whales have anything that could be called a language,” and went on in that vein for a bit.
I was flummoxed. I hadn’t said anything about dolphins or whales. But I realized that he probably got asked about dolphin language pretty often by starry-eyed students. He must have assumed I was one of those, and gave his stock answer. Whatever. I made another attempt. “I wasn’t thinking about dolphins…” I started. But before I could get any further, he interrupted and was off and running with a reply about how we’ve got no evidence of intelligent alien life elsewhere in the universe, and so on and so forth.
Well, dammit, that’s not what I meant, either. But my allotment of 15 seconds of Noam Chomsky’s attention was up and he was on to other questioners.
What I was trying to get at was this:
Let’s assume Chomsky is correct about humans having an innate universal grammar that underlies all human languages. This universal grammar evolved in the specific environment that humans evolved in. It was adapted to our peculiar needs. So let’s imagine some other intelligent species evolving an equally complex means of communicating with each other. Whether we’re imagining dolphins on earth or zingblorts on the planet Alfalfa, I don’t care. The point is:
Is the human “universal grammar” the only possible universal grammar? Would any species that evolves language develop the same universal grammar that humans have? Is it even possible for us to imagine what a nonhuman language might be like? Or is it sort of like asking what radio waves “look” like, given that our senses aren’t equipped to see them? Are we so stuck inside our own hardwiring that we could never hope to even recognize that a nonhuman language is a language, let alone comprehend it?
And what could we learn about ourselves by trying to imagine something so very different from ourselves?
But that would have been a very long-winded question.
I still wonder about these things.
Postscript: Many years after my encounter with Chomsky, I read another book by Steven Pinker, The Stuff of Thought. It comes closer to answering my question than anything else I’ve seen. But Pinker has a different angle: what does the structure of our languages tell us about how our brains work? He postulates that language reflects our brain structure. The way our minds organize our perceptions of the world into things and events is revealed by the nouns, verbs, and grammatical constructs we use to describe their relationships. It’s a very interesting exploration, even though it doesn’t quite get to the heart of my question.
Chomsky has done more to retard the field of linguistics than any other educator. It’s really sad how he continues to actively ignore evidence that his theory is wrong. Thanks for this remembrance, David. We should get together someday to chat about brain science. I’ve been studying it for about a decade now.
David,
What a great idea: To collect remembrances of exceptionally bad (or good) meetings with Academics (or even celebrities) of renown. I have had a few such experiences. I don’t, at the moment, have sufficient time to expand them into the kind of detailed story you have written here, but Richard Hamming and Ted Hoff stand out as deviating the farthest below my expectations, with Steve Wozniak, Robert Jones, Joseph Weizenbaum, Bob Hope, and Colonel Tom Parker following not far behind.
On the plus side, P.A.M. Dirac, Seymour Cray, Paul Baran, Buzz Aldrin, Steve Jobs, Heinz von Foerster, Stewart Brand, Keith Henson, Heinrich Bohr, Theo Gray, Wolfgang Haken, Ted Nelson, Richard Dawkins, Ted Turner, Ricky Nelson, The Ramones, Todd Rundgren, Joe Walsh, Andy Johns (of the Glimmer Twins), Joe Esposito (“Elvis’s best friend”), and Kyle Gass all exceeded my expectations, usually both as gentlemen and scholars (for those to whom it applies, and, at least, as constructive workers who weren’t afraid to get their hands dirty, for those who wouldn’t appreciate being labeled “scholars”).
Sherwin
DRW,
Here’s something which I think does relate to your question:
When I was working at RCA’s Sarnoff Labs in Princeton, one of the researchers from the original DECtalker neural research project presented a lecture. The “DECtalker” was an early voder peripheral made by DEC, and they had used it in a groundbreaking study of neural net programming. If memory serves, the pertinent neural net organisation (“architecture”) used a back-propagation algorithm to entrain 5 hidden layers of neurons. The training sequence was created by recording a few minutes of conversation between 2 second-grade girls talking on the playground at school. The audio recording was transcribed to text, and the transcription was subsequently translated into a sequence of ASCII DECtalker commands (“phonemes”). The neural net was entrained by presenting the characters of the text sequentially, simultaneously with the target DECtalker ASCII representation of the corresponding phoneme desired in the translation. Again, if memory serves, the original transcript amounted to around 20 pages of double-spaced text. (It didn’t make any more sense than you would expect from 7-year-olds.)
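The training setup described above can be sketched in miniature. This is a hypothetical reconstruction, not the actual study’s code: the text, the phoneme codes, and the one-hot encoding are all invented for illustration; the point is just the pairing of each character with a target phoneme code.

```python
# Hypothetical sketch of the training data described above: each character
# of the transcript is paired with the target phoneme code for its position.
# Text, phoneme codes, and encoding are made up for illustration.

TEXT     = "cat sat"
PHONEMES = "K AE T _ S AE T".split()   # one invented code per character

def one_hot(symbol, vocabulary):
    """Encode a symbol as a one-hot vector over the vocabulary."""
    return [1.0 if v == symbol else 0.0 for v in vocabulary]

chars = sorted(set(TEXT))
codes = sorted(set(PHONEMES))

# One (input, target) training pair per character position.
training_pairs = [(one_hot(c, chars), one_hot(p, codes))
                  for c, p in zip(TEXT, PHONEMES)]

print(len(training_pairs))   # 7 pairs, one per character
```

The actual study presumably presented more context per example than a single character; this only shows the character-to-phoneme pairing.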
As evidenced by recordings produced from arbitrary text presented to the trained net, even with only about 5 to 7 passes through the training sequence, the neural net seemed to do a pretty good job of translation. At the time, I found this extremely impressive, because only a handful of years prior, I had watched from a distance as Bruce Sherwood, one of the most brilliant people I have ever met, spent many months writing a procedural text-to-speech translator for the Votrax. The DECtalker seemed to perform about as well as Bruce’s program; being familiar with some of the pitfalls, when given the chance to test the DECtalker neural net text-to-speech translator in real time near the end of the lecture, I couldn’t even trip it up with “night” and “ought” and so on. (It had taken Bruce months after the initial completion of the original program to populate a table containing these kinds of “exceptions.”)
Finally, we have the context required to present the question and answer that I think are pertinent to yours: In the Q/A after the lecture, one of my fellow RCA researchers asked if they had any idea how the neural net’s trained configuration might compare to human neurons doing the same task. Would all neural nets faced with the same task converge to essentially the same final configuration?
The researcher (and, like the date of your encounter, I wish I could recall his name) responded with something I thought was of fundamental importance, and that I will never forget. Although it had never made it into any of their publications, he said that he and some of his fellow grad students had had a similar curiosity. The initial “unprogrammed” neural nets were always configured with heterogeneous random weights, because if any in-to-out paths of the weights were homogeneous, the back-prop algorithm would adjust them equally. The resulting configurations (paths) would be redundant, and therefore waste resources, or not work at all. For example, if all weights began the same as each other, then after entraining they might all have changed, but they would still all be the same as each other. Beginning from a unique matrix of random neuron connection weights always produced a uniquely entrained resulting network, even though, for all intents and purposes, after sufficient passes through the training sequence, all resulting networks behaved in essentially the same manner from a black-box perspective.
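That symmetry argument can be demonstrated directly. Here is a minimal sketch (toy dimensions and data, not from the study): one back-propagation step on a single-hidden-layer net. When every weight starts identical, all hidden units receive identical updates and remain clones of each other; random initialization breaks the symmetry.

```python
import random
from math import exp

def sigmoid(x):
    return 1.0 / (1.0 + exp(-x))

def backprop_step(w_in, w_out, x, target, lr=0.5):
    """One gradient step on a one-hidden-layer net with squared-error loss."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_in]
    y = sigmoid(sum(w * hi for w, hi in zip(w_out, h)))
    dy = (y - target) * y * (1 - y)                              # output delta
    dh = [dy * w_out[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    new_out = [w_out[j] - lr * dy * h[j] for j in range(len(h))]
    new_in = [[w_in[j][i] - lr * dh[j] * x[i] for i in range(len(x))]
              for j in range(len(w_in))]
    return new_in, new_out

x, target = [1.0, 0.0, 1.0], 1.0

# Homogeneous start: every in-to-out path identical.
w_in, w_out = [[0.3] * 3 for _ in range(4)], [0.3] * 4
w_in, w_out = backprop_step(w_in, w_out, x, target)
homogeneous_rows = len({tuple(row) for row in w_in})
print(homogeneous_rows)   # 1 distinct row: the hidden units stayed clones

# Heterogeneous random start: the symmetry is broken.
random.seed(0)
w_in = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
w_out = [random.uniform(-1, 1) for _ in range(4)]
w_in, w_out = backprop_step(w_in, w_out, x, target)
random_rows = len({tuple(row) for row in w_in})
print(random_rows)        # 4 distinct rows: each unit can specialize
```

This is why, as the researcher said, the redundant paths of a homogeneous start either waste resources or never work at all.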
One of the most fundamental strategies employed by humans faced with learning to read out loud (translate text-to-speech) — perhaps the most fundamental concept taught — is to differentiate between vowels and consonants. The grad students’ question was, did the DECtalker derive a neuron (or 2) which embodied the concept of “consonantness” vs. “vowelness”? The answer was “about 80% of the time.”
In playing with this configuration, and entraining it over and over again, most of the time they could identify a particular neuron which correlated with the letter being presented to the net being either a consonant or a vowel. But the other 20% of the time, there was no grouping, differentiation, recognition, or partitioning of the net which corresponded to the character it was working on being either a vowel or a consonant. In those cases, the neural net seemed to perform its task just as well, but whatever it ended up doing internally to produce the “correct” result was derived from classifications (“concepts”) completely foreign to the human approach.
Although the domain of this study was narrowed to approximately the scope of individual Roman characters, and your question was about the much larger space of language in general, I believe this provides an existence theorem: if even the same restricted-scope language task can be performed in incomprehensibly different ways, then, when contemplating the larger encompassing domain described by the phase space of all inter-language translations (inter-comprehension) of languages in general, the possibility certainly exists that it will contain incomprehensible cases. The relatively larger size of the phase space describing all languages implies that the probability of language-to-language translations containing such incomprehensible cases is at least as large as that for smaller encompassed domains (such as character-to-phoneme).
So if you tried to learn to talk to a zingblort from Tralfamadore, let alone a being composed of light-in-propagation from the Pleiades, they might not have a concept which differentiates between verbs and nouns — i.e. between that which changes, and that which remains unchanged.
Closer to home, a fellow researcher once pointed out to me that the language of Physics was the language of differential equations. It is the current practice of physicists that, at the fundamental level, virtually all physical laws are described by differential equations. That, he asserted, is because physics is the study of things that change, the way that they change, and connections to anything else that changes at the same time. From this POV, at the fundamental level, the language of physics is a language consisting only of verbs. How do you study something in physics which NEVER changes? (How can you even describe such a thing?)
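As a toy instance of that “only verbs” view (my own illustration, not my colleague’s): a harmonic oscillator is specified entirely by rules of change, dx/dt = v and dv/dt = -x, and can be stepped forward numerically with Euler’s method.

```python
# The "language of verbs": a harmonic oscillator defined only by how its
# state changes (dx/dt = v, dv/dt = -x), integrated with Euler's method.

def euler_step(x, v, dt):
    """Advance the state one small time step using the rules of change."""
    return x + v * dt, v - x * dt

x, v, dt = 1.0, 0.0, 0.001
for _ in range(int(3.14159 / dt)):   # integrate for about half a period
    x, v = euler_step(x, v, dt)

print(round(x, 2))   # close to -1.0: the oscillator has swung across
```

Nothing in the program states what the oscillator *is*; only how it changes.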
So my final question is, can you understand physicists, and what does that tell you about yourself?
Sorry to get so carried away, but it is an interesting question.
Sherwin
Fascinating story, Sherwin.
To answer YOUR final question: I can understand physicists only up to a point. And that’s probably because, having majored in psychology, I didn’t have to learn differential equations.
David – I enjoy pondering the questions you pose and the way you have posed them. The closing question in particular is something I have approached through my own lens and language. Considering your post led me to discover a book review which you might enjoy. It ties “deep structure” concepts of Chomsky, Levi-Strauss, and Jung together in a way I am favorably inclined to consider. (http://egajdbooks.blogspot.com/2012/11/20121124-jung-by-anthony-stevens-read.html)
Perhaps I will share my piece when it is ready. p