Speech is perhaps the most fundamental means by which humans communicate,
and its loss can have a severe effect on a person’s quality of life. The
inability to communicate through speech can lead to social isolation and
depression and, indeed, speech is a vital part of our perception of self. Sudden
loss of normal speech can arise from removal of parts of the vocal apparatus, for
instance through a laryngectomy as part of the treatment for laryngeal cancer.
There are a number of aids which attempt to restore speech through essentially
mechanical means, such as diverting airflow or externally vibrating soft tissues,
but none of these is entirely satisfactory.
This paper considers the potential for speech restoration by electronic means.
First, it reviews the current status of ‘silent speech interfaces’, with a
focus on their applicability in assistive speech technology.
Second, we introduce the “Recognition and Reconstruction of Speech following
Laryngectomy” (REdRESS) project, which aims to provide an alternative speech
aid for laryngectomised patients. A novel device has been developed that
captures the movements of important speech articulators by measuring the
magnetic field of magnets that are placed at strategic locations on these
articulators, i.e. it is a ‘magnetic field articulograph’. The information obtained
can be used to recreate an acoustic speech signal in two ways.
The first method (“recognition and synthesis”) uses state-of-the-art speech
recognition techniques as they are commonly applied to acoustic speech
signals. These techniques are based on establishing a statistical relation
between speech acoustics and underlying speech content. In REdRESS, the
acoustic signal is replaced by magnetic field data. The output of the recognition
step is a word sequence that can then be passed on to a text-to-speech
synthesiser. The synthesiser could be adapted to ‘speak’ in something
resembling the patient’s pre-laryngectomy voice, using recordings made prior to
the operation.
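The recognition step described above can be sketched in miniature. The snippet below is an illustrative toy, not the REdRESS system: it classifies a segmented sensor frame by nearest-neighbour comparison against stored word templates (a real system would use HMM-based statistical recognition over frame sequences), then hands the recognised word sequence to a stubbed text-to-speech stage. All feature values and the `templates` table are invented for illustration.

```python
import math

# Hypothetical training data: one averaged magnetic-sensor feature vector
# per word. A real recogniser models whole frame sequences statistically.
templates = {
    "one":   [0.9, 0.1, 0.3],
    "two":   [0.2, 0.8, 0.5],
    "three": [0.4, 0.4, 0.9],
}

def recognise(frame):
    """Return the word whose template is nearest to the sensor frame."""
    return min(templates, key=lambda w: math.dist(templates[w], frame))

def recognise_utterance(frames):
    # One word per pre-segmented frame here; a real recogniser would
    # decode connected speech without explicit segmentation.
    return [recognise(f) for f in frames]

def synthesise(words):
    # Placeholder for the text-to-speech synthesiser.
    return " ".join(words)

words = recognise_utterance([[0.88, 0.12, 0.31], [0.19, 0.79, 0.52]])
print(synthesise(words))  # prints "one two"
```

The key property illustrated is the clean separation of the pipeline into a recognition stage producing text and a synthesis stage producing audio, which is what allows the synthesiser to be swapped for one adapted to the patient’s own voice.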
The second method (“direct synthesis”) reconstructs speech sounds directly
from the magnetic field data, without an explicit recognition
stage. Instead, machine learning techniques are used to learn the mapping
between the magnetic sensor data and the parameters required to control a
synthesiser.
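A minimal sketch of such a learned mapping, under strong simplifying assumptions: a single sensor channel is mapped to a single synthesiser control parameter (here labelled a formant frequency) by ordinary least squares. The training values are invented for illustration; a practical system would map many sensor channels to many parameters with a richer model such as a neural network.

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y = a*x + b (one input, one output)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

# Paired training data: sensor readings and the synthesiser parameter
# values for the corresponding speech sounds (illustrative numbers).
sensor = [0.1, 0.2, 0.3, 0.4]
formant_hz = [300.0, 400.0, 500.0, 600.0]

a, b = fit_linear(sensor, formant_hz)

# At run time, each incoming sensor reading is mapped straight to a
# control parameter, with no intermediate word-level recognition.
print(round(a * 0.25 + b))  # prints 450
```

Because no discrete word sequence is ever produced, this route avoids recognition errors entirely, at the cost of having to learn a continuous sensor-to-parameter mapping.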
Experiments have demonstrated the potential of both of these approaches. On
the small-vocabulary task of digit recognition, the current system has delivered
speaker-dependent recognition rates of up to 93% on continuous speech, i.e.
connected utterances of multiple digits with no pauses between digits. On an
isolated word recognition experiment with a larger vocabulary of 57 words,
recognition rates of over 98% were achieved.