Titanium Media note: This article is from the WeChat public account Quantum Bit (ID: QbitAI), by Xiao Zha and Anne, and is reprinted by Titanium Media with authorization.
Without moving your mouth or your hands, how do you communicate with people?
Eye contact leaves people a little baffled, and the sixth sense is not very reliable. Is there a solution to this problem?
Yes, with the help of all-purpose AI. Recently, Scientific Reports, a journal from the publisher of Nature, reported a new technological development: by monitoring brain activity, AI can reconstruct the sound you hear.
The study shows that listeners could understand and repeat these AI-reconstructed sounds with 75% accuracy, far better than in previous studies.
The potential of this research is enormous: directly transforming brain activity into speech.
At present, the system can only reconstruct simple words from the auditory signals a person receives, but the researchers ultimately want to find the connection between brain activity and speech, so that people who have lost the ability to speak can speak again.
Imagine that patients with aphasia or stroke could convert what they are thinking directly into text or speech by wearing a brain-computer interface device.
Of course, all your secrets would be exposed too. After all, this is the most powerful modern version of "mind reading".
AI "mind reading"
The study was led by the team of Nima Mesgarani, an associate professor of electrical engineering at Columbia University. They completed the study on the basis of a series of experiments.
Before the experiment, the researchers selected five patients undergoing treatment for epilepsy as subjects. The recordings were made with invasive electrocorticography (ECoG), so the first step in the experiment was to make sure electrodes were implanted in each subject's brain.
The five subjects then went into listening-test mode. Just like in a college English exam, two female readers began reading out single digits.
This "listening material" is not difficult: it covers only the ten digits from 0 to 9. The digits were read out in random order, 40 in total.
The subjects only need to sit there. A model reconstructs the speech information from the signals recorded through the brain-computer interface, and the computer finally reads it out.
So the question is, what does the entire reconstruction process look like?
During this process, the subject hears the sound, and the acoustic signal is converted into a neural electrical signal in the cochlea and transmitted to the brain through the vestibulocochlear (auditory) nerve.
The neurons of the auditory cortex then become active, and their electrical signals are picked up by the electrodes.
The researchers record these changing signals and extract the useful information from them, namely the high-gamma envelope (HG) and low-frequency (LF) signals. The next step is to reconstruct the sound from these signals.
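To make the two features mentioned above more concrete, here is a minimal sketch of how a high-gamma envelope and a low-frequency signal could be pulled out of a single electrode trace. It is not the team's code. The band edges (70 to 150 Hz for high gamma, below 50 Hz for low frequency), the sampling rate, the function names and the synthetic test signal are all assumptions chosen for illustration, and the crude FFT filter stands in for whatever filtering the researchers actually used.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall band-pass: zero every FFT bin outside [lo, hi] Hz."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def envelope(x):
    """Amplitude envelope from the analytic signal (FFT-based Hilbert transform)."""
    n = len(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * gain))

def extract_features(x, fs, hg_band=(70.0, 150.0), lf_cutoff=50.0):
    """Return (high-gamma envelope, low-frequency signal); band edges are assumed."""
    hg = envelope(bandpass_fft(x, fs, hg_band[0], hg_band[1]))
    lf = bandpass_fft(x, fs, 0.0, lf_cutoff)
    return hg, lf

# Synthetic one-electrode trace (made up): a slow 5 Hz wave plus a
# 100 Hz burst centred at t = 1.5 s with a smooth Gaussian envelope.
fs = 1000
t = np.arange(0, 2.0, 1.0 / fs)
slow = 0.5 * np.sin(2 * np.pi * 5 * t)
burst = np.exp(-((t - 1.5) / 0.15) ** 2) * np.sin(2 * np.pi * 100 * t)
hg, lf = extract_features(slow + burst, fs)
```

In this toy signal, `hg` rises only around the burst at 1.5 s, while `lf` follows the slow 5 Hz wave.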
The researchers combined two regression methods with two speech representations to find out which pairing reconstructed speech best. As a result, the reconstruction was carried out in four different ways:
(light blue) linear regression + auditory spectrogram (Aud Spec), referred to as LAS
(purple) linear regression + vocoder, referred to as LV
(pink) nonlinear deep neural network (DNN) + auditory spectrogram (Aud Spec), referred to as DAS
(red) nonlinear deep neural network (DNN) + vocoder, referred to as DV
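The linear-regression variants can be pictured with a small sketch like the one below, which maps a short window of neural activity to one frame of a spectrogram using closed-form ridge regression. This is only an illustration under assumptions: the article does not say how the regression was regularized or how many time lags were used, and the data here is randomly generated (a fake "spectrogram" that reaches fake "electrodes" after a two-sample delay). All function and variable names are hypothetical.

```python
import numpy as np

def add_future_lags(X, n_lags):
    """Pair each time step with the n_lags-1 samples that follow it.

    The neural response trails the sound, so the frame at time t is
    predicted from neural data at t, t+1, ... (zero-padded at the end).
    Shape: (T, C) -> (T, C * n_lags).
    """
    T, C = X.shape
    padded = np.vstack([X, np.zeros((n_lags - 1, C))])
    return np.hstack([padded[k:k + T] for k in range(n_lags)])

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha I)^-1 X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

# Synthetic data (made up): 8 spectrogram bins drive 16 electrodes
# after a 2-sample delay, with additive noise.
rng = np.random.default_rng(0)
T, n_bins, n_elec, delay, n_lags = 500, 8, 16, 2, 5
spec = rng.random((T, n_bins))
mixing = rng.normal(size=(n_bins, n_elec))
neural = np.zeros((T, n_elec))
neural[delay:] = spec[:-delay] @ mixing
neural += 0.05 * rng.normal(size=neural.shape)

X = add_future_lags(neural, n_lags)
train, test = slice(0, 400), slice(400, T - n_lags)
W = fit_ridge(X[train], spec[train])
pred = X[test] @ W
corr = np.corrcoef(pred.ravel(), spec[test].ravel())[0, 1]
print(f"held-out correlation: {corr:.2f}")
```

A vocoder-based variant would keep the same regression step and only change the target from spectrogram bins to vocoder parameters.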
The DNN architecture consists of two modules: a feature extraction network and a feature summation network. The former consists of a fully connected network (FCN) and a locally connected network (LCN), while the latter is a two-layer fully connected network (FCN).
DNN architecture diagram
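As a rough illustration of the two-module structure described above, the following sketch runs one forward pass through an untrained network built from the same kinds of pieces. Everything specific here is an assumption: the layer sizes, the ReLU activation, the random weights, and in particular the choice to run the FCN and LCN branches in parallel and concatenate their outputs before the two-layer summation network. It shows the shape of the idea, not the published model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def dense(x, W, b):
    """Fully connected layer: every output sees every input."""
    return x @ W + b

def locally_connected(x, W, b):
    """Locally connected 1-D layer: output i sees only x[i:i+k] and has
    its own weights W[i] (unlike a convolution, nothing is shared)."""
    n_out, k = W.shape
    windows = np.stack([x[i:i + k] for i in range(n_out)])
    return np.sum(windows * W, axis=1) + b

def init(rng, *shape):
    return rng.normal(scale=0.1, size=shape)

def forward(x, p):
    # Feature extraction: FCN branch and LCN branch (assumed parallel).
    fcn = relu(dense(x, p["W_fcn"], p["b_fcn"]))
    lcn = relu(locally_connected(x, p["W_lcn"], p["b_lcn"]))
    features = np.concatenate([fcn, lcn])
    # Feature summation: two fully connected layers.
    hidden = relu(dense(features, p["W1"], p["b1"]))
    return dense(hidden, p["W2"], p["b2"])

# Hypothetical sizes: 64 neural input features, 32 output speech features.
rng = np.random.default_rng(0)
n_in, n_fcn, k, n_hidden, n_out = 64, 32, 5, 48, 32
n_lcn = n_in - k + 1
params = {
    "W_fcn": init(rng, n_in, n_fcn), "b_fcn": np.zeros(n_fcn),
    "W_lcn": init(rng, n_lcn, k), "b_lcn": np.zeros(n_lcn),
    "W1": init(rng, n_fcn + n_lcn, n_hidden), "b1": np.zeros(n_hidden),
    "W2": init(rng, n_hidden, n_out), "b2": np.zeros(n_out),
}
x = rng.normal(size=n_in)
out = forward(x, params)
print(out.shape)  # (32,)
```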
To evaluate the results, the researchers recruited 11 volunteers with normal hearing, who listened in random order to the audio reconstructed by the four model combinations. Just like the College English Test Band 4 listening section, each item was played only once.
Finally, the volunteers reported what they had understood, and the researchers computed the average accuracy and the mean opinion score (MOS).
The results show that the nonlinear deep neural network (DNN) + vocoder combination (red, DV) reconstructed speech most faithfully: volunteers identified the content correctly 75% of the time, and its MOS was the highest, at 3.4 points.
In addition, DV had the best recognition accuracy of the four combinations, and listeners identified the speaker's gender correctly 80% of the time.
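The two headline metrics are simple averages, and a tiny sketch shows how they are computed. The responses and ratings below are toy values invented for illustration, not data from the study, and the function names are hypothetical. The sketch assumes that accuracy is the share of digits a listener repeats correctly and that the MOS is the mean of quality ratings on a 1-to-5 scale.

```python
def intelligibility(responses, truths):
    """Fraction of items the listener reported correctly."""
    correct = sum(r == t for r, t in zip(responses, truths))
    return correct / len(truths)

def mean_opinion_score(ratings):
    """MOS: arithmetic mean of listener quality ratings (assumed 1-5 scale)."""
    return sum(ratings) / len(ratings)

# Toy example (made-up values): four digits played, one misheard.
played = [3, 7, 0, 9]
reported = [3, 7, 1, 9]
ratings = [4, 3, 3, 4, 3]

print(intelligibility(reported, played))  # 0.75
print(mean_opinion_score(ratings))        # 3.4
```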
If you are interested in the details of the study, you can read the paper the team has published:
Towards reconstructing intelligible speech from the human auditory cortex
Difficulties and challenges
None of the above is easy.
"We try to figure out the patterns of neurons turning on and off at different points in time and infer speech," Mesgarani said. "This mapping is not so straightforward."
The pattern by which brain signals translate into speech varies from person to person, so the model has to be trained individually for each person. And only the most accurate signals give the best results. How do you get the most accurate brain signals?
There is only one answer: craniotomy.
However, there are very few opportunities to open the skull for research.
One is during the removal of a brain tumor, when the surgeon needs to read brain signals to help locate and avoid injuring the speech and motor areas. The other is in the days before surgery for epilepsy patients, when the skull is opened and electrodes are implanted to locate the site that causes the seizures.
"We have only 20 to 30 minutes," said Stephanie Martin of the University of Geneva in Switzerland. The time available to collect data is very, very limited.
But there are things more difficult than craniotomy.
So far, the work only reconstructs what a person has heard. What about going a step further? For example, if a person with aphasia wants to say something, would this algorithm still work?
California neuroscientist Stephanie Riès said that when a person silently "speaks", the brain signals are bound to differ from those produced while listening. Without external sound to match against the brain activity, it is hard for the computer to tell where speech inside the brain begins and ends.
With the current level of technology, we may not yet know how to do this.
Christian Herff of the University of Maastricht in the Netherlands offers an idea:
When you hear a voice, immediately repeat it silently in your head. With enough training of both the person and the neural network, perhaps AI will eventually achieve complete "mind reading".
From typing with the brain to reading the brain
Since the invention of the computer, humans have been trying to achieve brain-computer interaction, popularly known as "plugging a cable into the back of the head."
At the F8 Developers Conference two years ago, Facebook showed how a person with amyotrophic lateral sclerosis (ALS) could type with the brain at a speed of 8 words per minute. Although that is slower than typing by hand, it is great news for people with disabilities. Facebook's future goal is to reach 100 words per minute.
A team of neuroscientists in China is also working in this area. Last year, Quantum Bit tried out "mind typing" in a Tsinghua University laboratory, controlling the 26 letters on a soft keyboard to type out any sentence.
Last year, scientists at Kyoto University went beyond controlling a keyboard and reconstructed images from the human brain. These were not just simple symbols, but photos with multiple colors and structures.
With this technology, it would be easy to find out what a person has experienced and where they have been, and even daydreamed scenes could be read out.
But speech is the most important way for human beings to communicate with the outside world. If Columbia's research becomes truly practical, its prospects are limitless.
There is also a bonus beyond the study itself~
Along with the release of the study, the researchers also open-sourced a neural acoustic processing library, NAPLib, which can be used to characterize the properties of the neural representation of speech.
NAPLib is suitable for both invasive and non-invasive recordings. It is a general tool for research using electroencephalography (EEG), electrocorticography (ECoG) and magnetoencephalography (MEG).
Introduction to NAPLib: http://naplab.ee.columbia.edu/naplib.html
GitHub address: https://github.com/Naplib/Naplib