Electronic Aids for Speech Restoration: Reconstructing Speech

Speech is perhaps the most fundamental means by which humans communicate, and the loss of speech can have a severe effect on a person’s quality of life. The inability to communicate through speech can lead to social isolation and depression and, indeed, speech is a vital part of our perception of self. Sudden loss of normal speech can arise from removal of parts of the vocal apparatus, for instance through a laryngectomy as part of the treatment for laryngeal cancer. There are a number of aids which attempt to restore speech through essentially mechanical means, such as diverting airflow or externally vibrating soft tissues, but none of these is entirely satisfactory.

This paper considers the potential for speech restoration using electronic means. In particular, it will review the current status of ‘silent speech interfaces’, with a focus on their applicability in assistive speech technology.

Secondly, we introduce the ‘Recognition and Reconstruction of Speech following Laryngectomy’ (REdRESS) project, which aims to provide an alternative speech aid for laryngectomised patients. A novel device has been developed that captures the movements of important speech articulators by measuring the magnetic field of magnets placed at strategic locations on these articulators, i.e. it is a ‘magnetic field articulograph’. The information obtained can be used to recreate an acoustic speech signal in two ways.

The first method (‘recognition and synthesis’) uses state-of-the-art speech recognition techniques as they are commonly applied to acoustic speech signals. These techniques are based on establishing a statistical relation between speech acoustics and the underlying speech content. In REdRESS, the acoustic signal is replaced by magnetic field data. The output of the recognition step is a word sequence that can then be passed on to a text-to-speech synthesiser. The synthesiser could be adapted to ‘speak’ in something resembling the patient’s pre-laryngectomy voice using recordings made prior to the operation.
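As a minimal sketch of this ‘recognition and synthesis’ route, the toy example below classifies magnetic-field feature vectors into words and hands the recognised word sequence to a placeholder synthesiser. All names, feature values, and the nearest-template matcher are invented for illustration; the actual system uses statistical speech recognition models, not template matching.

```python
import math

# Hypothetical per-word templates: mean magnetic-field feature vectors
# that would be estimated from a speaker's training data (values invented).
WORD_TEMPLATES = {
    "one":   [0.9, 0.1, 0.4],
    "two":   [0.2, 0.8, 0.5],
    "three": [0.5, 0.5, 0.9],
}

def recognise(frame):
    """Stand-in for the recognition stage: return the word whose
    template is nearest (Euclidean distance) to the sensor frame."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(WORD_TEMPLATES, key=lambda w: dist(WORD_TEMPLATES[w], frame))

def synthesise(words):
    """Placeholder for a text-to-speech synthesiser that could be
    adapted to the patient's pre-laryngectomy voice."""
    return " ".join(words)

# A short utterance: one noisy magnetic-field frame per word.
frames = [[0.85, 0.15, 0.38], [0.22, 0.75, 0.52]]
utterance = synthesise([recognise(f) for f in frames])
```

The point of the sketch is the two-stage structure: sensor frames are first mapped to a discrete word sequence, and only then rendered as audio.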

The second method (‘direct synthesis’) consists of the reconstruction of speech sounds straight from the magnetic field data, without an explicit recognition stage. Instead, machine learning techniques would be used to learn the mapping between the magnetic sensor data and the parameters required to control a synthesiser.
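The direct-synthesis idea can be illustrated with the simplest possible learned mapping: ordinary least squares from sensor channels to synthesiser control parameters. The channel counts, parameter names, and linear model below are assumptions for illustration only; the project would use richer machine learning models on real paired recordings.

```python
import numpy as np

# Hypothetical paired training data: magnetic sensor readings (inputs)
# and synthesiser control parameters (e.g. formant-like values) that
# would have been recorded in parallel. The true mapping here is linear,
# so least squares can recover it exactly.
rng = np.random.default_rng(0)
sensors = rng.uniform(-1.0, 1.0, size=(200, 4))   # 4 magnetic channels
true_map = np.array([[500.0, 1500.0],
                     [120.0,  -40.0],
                     [-80.0,  300.0],
                     [ 30.0,   10.0]])
params = sensors @ true_map                       # 2 control parameters

# "Learning the mapping": fit a linear model from sensors to parameters.
W, *_ = np.linalg.lstsq(sensors, params, rcond=None)

# Drive the synthesiser directly from a new sensor frame, with no
# word-level recognition step in between.
new_frame = np.array([0.3, -0.5, 0.1, 0.7])
predicted_params = new_frame @ W
```

Unlike the first method, nothing here depends on a vocabulary: any articulator configuration the sensors can capture yields some set of synthesiser parameters.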

Experiments have demonstrated the potential of both of these approaches. On the small-vocabulary task of digit recognition, the current system has delivered speaker-dependent recognition rates of up to 93% on continuous speech, i.e. connected utterances of multiple digits with no pauses between digits. On an isolated word recognition experiment with a larger vocabulary of 57 words, recognition rates of over 98% were achieved.