A discipline that deals generally with the extraction
of information from acoustic signals in the presence of noise and uncertainty.
Acoustic signal processing began with the improvement of music and speech
sounds and with the search for oil and submarines. It has since expanded
to include medical instrumentation; techniques for efficient transmission,
storage, and presentation of music and speech; and machine speech recognition.
Undersea processing has expanded to studying underwater weather and long-term
global ocean temperature changes, mammal tracking at long ranges, and
monitoring of hot vents. These techniques stem from the rapid advances
in computer science, especially the development of large, inexpensive
memories and ever-increasing processing speeds.
Sound can be said to be low and slow compared to computers. The audio
frequencies are low, spanning roughly the range from 15 to 25,000 hertz,
with seismic frequencies below this range and frequencies relevant to
medical ultrasonics above. All of these frequencies are well within the
limits of analog-to-digital input samplers and digital-to-analog output
converters that form the bridges between continuous-time physical waveforms
and the streams of digits handled by a computer. The speed of sound is
slow, about 335 m/s (1100 ft/s) in air and 1500 m/s (5000 ft/s) in water,
compared to light and radio waves at 3 x 10^8 m/s (10^9 ft/s), the upper
limit for electrical signals in computer circuits. A computer can carry
out thousands of elementary computations between input samples.
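The arithmetic behind that last claim can be sketched as follows. Both figures used here are assumptions chosen for illustration and do not come from the text: a sampling rate of twice the 25,000-hertz upper audio limit, and a hypothetical processor performing 10^9 elementary operations per second.

```python
def operations_per_sample(ops_per_second, sample_rate_hz):
    """Elementary operations available between consecutive input samples."""
    return ops_per_second / sample_rate_hz


audio_upper_hz = 25_000              # upper audio limit quoted in the text
sample_rate_hz = 2 * audio_upper_hz  # assumed: sample at twice the highest frequency
ops_per_second = 1e9                 # assumed: hypothetical processor speed

budget = operations_per_sample(ops_per_second, sample_rate_hz)
print(budget)  # 20000.0
```

Under these assumptions the budget is in the tens of thousands of operations per sample; slower processors or higher sampling rates shrink it proportionally.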
Signal processors "think" in two domains. One domain is the
picture of the waveform, the sound pressure as it changes in time. This
time-domain picture is s(t); t is time, and s might be the sound pressure
or a microphone voltage. The other picture is the complex magnitude at
every important frequency during an interval of time. This frequency-domain
or spectrum picture is S(f | Tn); f is the frequency in hertz, and Tn
is the nth time interval. J. Fourier (1768–1830) formalized his series
version of this picture and showed that s(t) could be calculated from
S(f), and vice versa. Mathematicians have developed other versions of
spectral transforms, each with its special area of application.
Each domain picture provides the same information, but sometimes it's easier to think, or compute, using one domain rather than the other.
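The claim that both pictures carry the same information can be checked numerically. The sketch below uses the sampled (discrete) form of the Fourier transform, computed directly from its definition. The function names and the 16-sample test waveform are illustrative assumptions, and the direct sums are written for clarity rather than speed.

```python
import cmath
import math


def dft(s):
    """Time domain -> frequency domain, directly from the definition."""
    N = len(s)
    return [sum(s[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(S):
    """Frequency domain -> time domain (inverse transform)."""
    N = len(S)
    return [sum(S[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


N = 16
# Assumed test waveform: a cosine completing exactly two cycles in N samples.
s = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]

S = dft(s)           # spectrum picture
recovered = idft(S)  # back to the waveform picture

# Largest difference between the original and the recovered waveform.
max_error = max(abs(recovered[n] - s[n]) for n in range(N))
```

For this waveform the spectrum is concentrated in bin 2 (and its mirror image, bin 14), and `max_error` is at the level of floating-point rounding, so nothing is lost in either direction.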
The computer version used is the DFT (discrete Fourier transform). Transforms
were formerly time-consuming computations; for N points in one domain
the time needed was proportional to N squared. In the 1960s a layered
algorithm was developed that reduces this to a time proportional to
N log N. It is most efficient for sample sizes that are integer powers
of 2. These fast Fourier transforms (FFTs) have had a major impact on
spectral processing.
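A minimal sketch of the layered idea, assuming the common radix-2 decimation-in-time form: a transform of N points is split into two transforms of N/2 points (even and odd samples), and the split is repeated until single points remain. The function names are illustrative. A direct N-squared transform is included only as a cross-check.

```python
import cmath
import math


def direct_dft(x):
    """Reference transform from the definition; work grows as N squared."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft_radix2(x):
    """Recursive radix-2 FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 0 or N & (N - 1):
        raise ValueError("length must be a power of 2")
    if N == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # half-size transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # half-size transform of odd-indexed samples
    out = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + t            # lower half of the spectrum
        out[k + N // 2] = even[k] - t   # upper half reuses the same product
    return out


N = 64
x = [float(n % 5) for n in range(N)]  # arbitrary test data

fast = fft_radix2(x)
slow = direct_dft(x)
max_diff = max(abs(a - b) for a, b in zip(fast, slow))

direct_terms = N * N                  # 4096 products for the direct sum
fft_terms = N * int(math.log2(N))     # 384 for the layered version
```

The two results agree to rounding error, while the rough operation counts differ by about a factor of ten at N = 64, and the gap widens as N grows.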
One example of acoustic signal processing for biomedical purposes
is its use in restoring hearing. The cochlea in the inner ear is nature’s
time-domain-to-frequency-domain transformer, exciting a line of nerve
endings in response to something like the short-time spectrum of the
input. For people in whom this natural mechanism is dysfunctional but
the cochlear nerve is intact, a technology is developing based on modest
DFT analysis of sound near an ear, feeding perhaps 32 channels to an
implanted micro-connector in the cochlear nerve bundle.
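The kind of analysis described here can be illustrated with a toy calculation that turns one short frame of sound into 32 channel energies. This is only a sketch and not a description of how any actual implant processor works. The 16,000-hertz sampling rate, the 128-sample frame, and the equal-width frequency bands are all simplifying assumptions, and `band_energies` is a hypothetical helper name.

```python
import cmath
import math


def band_energies(frame, n_channels=32):
    """Group the short-time spectrum of one frame into n_channels energies."""
    N = len(frame)
    half = N // 2  # keep only the positive-frequency bins 0 .. N/2 - 1
    if half < n_channels or half % n_channels:
        raise ValueError("frame length must give a whole number of bins per channel")
    spectrum = [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                    for n in range(N))
                for k in range(half)]
    per_channel = half // n_channels
    return [sum(abs(spectrum[c * per_channel + b]) ** 2 for b in range(per_channel))
            for c in range(n_channels)]


sample_rate_hz = 16_000  # assumed
frame_len = 128          # assumed; bin spacing is 16000 / 128 = 125 Hz
tone_hz = 1_000          # assumed test tone

frame = [math.sin(2 * math.pi * tone_hz * n / sample_rate_hz)
         for n in range(frame_len)]

energies = band_energies(frame)
loudest = max(range(len(energies)), key=lambda c: energies[c])
```

With these numbers each channel spans 250 hertz, so the 1000-hertz tone falls in channel 4 (counting from 0) and the other channels stay near zero.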
One of the most difficult signal processing areas is machine recognition
of speech. Vocabulary size and number of speakers are key parameters.
The fundamental problems are: (1) words are not spoken separately, but
in streams of connected sound, (2) a phrase is seldom said the same way
every time, (3) the vocabulary is enormous, and (4) there is a huge
variety of speaker accents and rhythms.
The signal processing must segment an utterance into phonemes, words,
or phrases; pick out key features; or compare the whole segment with
a library for likely matches.
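The last option, comparing a whole segment with a library, can be sketched with dynamic time warping. That technique is named here as an illustrative choice and is not specified in the text; it tolerates the same word being spoken at different speeds. The feature sequences and the two-word library are invented toy data, with each number standing in for one extracted feature per time step.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two feature sequences."""
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[len(a)][len(b)]


def best_match(utterance, library):
    """Return the library entry whose template warps most cheaply onto the input."""
    return min(library, key=lambda word: dtw_distance(utterance, library[word]))


# Hypothetical templates: one feature value per time step.
library = {
    "yes": [1, 3, 4, 3, 1],
    "no": [4, 4, 2, 1, 1],
}
utterance = [1, 1, 3, 3, 4, 4, 3, 1]  # "yes" spoken more slowly

word = best_match(utterance, library)
```

Here the slow utterance warps onto the "yes" template at zero cost, so `word` is "yes". A practical recognizer would use multidimensional spectral features and a far larger library.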