Posts Tagged ‘speech processing’

Things Speaking in Tongues

January 25, 2017


Speaking in tongues (aka Glossolalia) is the fluid vocalizing of speech-like syllables without any recognizable association with a known language. Such experience is best (not ?) understood as the actual speaking of a gutted language with grammatical ghosts inhabited by meaningless signals.

The man behind the tongue (Herbert List)

Do You Hear What I Say ? (Herbert List)

Usually set in religious context or circumstances, speaking in tongue looks like souls having their own private conversations. Yet, contrary to extraterrestrial languages, the phenomenon is not fictional and could therefore point to offbeat clues for natural language technology.

Computers & Language Technology

From its inception computers technology has been a matter of language, from machine code to domain specific. As a corollary, the need to be in speaking terms with machines (dumb or smart) has put a new light on interpreters (parsers in computer parlance) and open new perspectives for linguistic studies. In due return, computers have greatly improve the means to experiment and implement new approaches.

During the recent years advances in artificial intelligence (AI) have brought language technologies to a critical juncture between speech recognition and meaningful conversation, the former leaping ahead with deep learning and signal processing, the latter limping along with the semantics of domain specific languages.

Interestingly, that juncture neatly coincides with the one between the two intrinsic functions of natural languages: communication and representation.

Rules Engines & Neural Network

As exemplified by language technologies, one of the main development of deep learning has been to bring rules engines and neural networks under a common functional roof, turning the former unfathomable schemes into smart conceptual tutors for the latter.

In contrast to their long and successful track record in computer languages, rule-based approaches have fallen short in human conversations. And while these failings have hindered progress in the semantic dimension of natural language technologies, speech recognition have strode ahead on the back of neural networks fueled by increasing computing power. But the rift between processing and understanding natural languages is now being fastened through deep learning technologies. And with the leverage of rule engines harnessing neural networks, processing and understanding can be carried out within a single feedback loop.

From Communication to Cognition

From a functional point of view, natural languages can be likened to money, first as medium of exchange, then as unit of account, finally as store of value. Along that understanding natural languages would be used respectively for communication, information processing, and knowledge representation. And like the economics of money, these capabilities are to be associated to phased cognitive developments:

  • Communication: languages are used to trade transient signals; their processing depends on the temporal persistence of the perceived context and phenomena; associated behaviors are immediate (here-and-now).
  • Information: languages are also used to map context and phenomena to some mental representations; they can therefore be applied to scripted behaviors and even policies.
  • Knowledge: languages are used to map contexts, phenomena, and policies to categories and concepts to be stored as symbolic representations fully detached of original circumstances; these surrogates can the be used, assessed, and improved on their own.

As it happens, advances in technologies seem to follow these cognitive distinctions, with the internet of things (IoT) for data communications, neural networks for data mining and information processing, and the addition of rules engines for knowledge representation. Yet paces differ significantly: with regard to language processing (communication and information), deep learning is bringing the achievements of natural language technologies beyond 90% accuracy; but when language understanding has to take knowledge into account, performances still lag a third below: for computers knowledge to be properly scaled, it has to be confined within the semantics of specific domains.

Sound vs Speech

Humans listening to the Universe are confronted to a question that can be unfolded in two ways:

  • Is there someone speaking, and if it’s the case, what’s the language ?.
  • Is that a speech, and if it’s the case, who’s speaking ?.

In both case intentionality is at the nexus, but whereas the first approach has to tackle some existential questioning upfront, the second can put philosophy on the back-burner and focus on technological issues. Nonetheless, even the language first approach has been challenging, as illustrated by the difference in achievements between processing and understanding language technologies.

Recognizing a language has long been the job of parsers looking for the corresponding syntax structures, the hitch being that a parser has to know beforehand what it’s looking for. Parser’s parsers using meta-languages have been effective with programming languages but are quite useless with natural ones without some universal grammar rules to sort out babel’s conversations. But the “burden of proof” can now be reversed: compared to rules engines, neural networks with deep learning capabilities don’t have to start with any knowledge. As illustrated by Google’s Multilingual Neural Machine Translation System, such systems can now build multilingual proficiency from sufficiently large samples of conversations without prior specific grammatical knowledge.

To conclude, “Translation System” may even be self-effacing as it implies language-to-language mappings when in principle such systems can be fed with raw sounds and be able to parse the wheat of meanings from the chaff of noise. And, who knows, eventually be able to decrypt languages of tongues.

Further Reading

External Links