Forensic Audio and Auditory Illusions
Most of us are familiar with the optical phenomenon known as a mirage. A similar phenomenon occurs in our auditory perception, for which I use the phrase "auditory illusion". Such illusions are triggered by certain combinations of auditory stimuli, namely frequencies and rhythmic patterns, sometimes with varying pitch. Auditory illusions are not uncommon. Psychoacoustics, a branch of psychology, studies auditory perception, including auditory illusions.

Human speech is a combination of frequencies and rhythms. A speaker's attempt at verbal communication occurs when these elements are controlled by articulation cues, such as the vocal production of the sounds represented by the letters /t/, /k/, /f/, /sh/, /p/ and /s/, which are actually clicks, pops, starts, stops, swishes and swooshes, hisses, squeaks and whistles. All of these sonic events can, and do, exist outside of human speech. It is not unlikely that each of us will at some time mistake a sound such as the creak of a door, or the scrape of a chair across a floor, for a human utterance. This is an auditory illusion.

It is, of course, possible to misinterpret auditory stimuli unrelated to speech through precisely the same mechanisms; the examples involving speech are just a special case. Consider the combat veteran who dives for cover when a book drops, or someone who mistakes a car backfire for a gunshot. However, the most frequent case of auditory illusion encountered in the forensic audio lab involves speech, or more precisely "alleged speech".

To understand auditory illusions and how they relate to forensic audio, it is important to consider some elements of electronic audio, recording environments and human auditory perception:

    1. Every electronic device has system noise, whether it is a million-dollar device or a ten-dollar device, and all recording devices introduce system noise into the audio being recorded. System noise can be a combination of clicks, pops, squeaks, creaks, crackle, sizzle, hum and white noise, each of which has varying rhythms and frequencies. The level of system noise relative to the desired signal is expressed as the signal-to-noise ratio (S/N), in decibels. A device whose noise floor lies 60 dB below the signal (often written as an S/N of -60 dB) has more perceived system noise mixed with the desired signal than a device whose noise floor lies 90 dB below it (-90 dB). A common field recording device used by law enforcement and others is the microcassette recorder, which unfortunately has a typical S/N of only about 35 dB (-35 dB in the same notation, depending on manufacturer and model). To hear the system noise of such a device, simply switch it to playback mode without a tape inserted and turn up the volume to hear the extreme hiss, or turn the volume completely down to hear the hum and mechanical noise.

    2. All recording tape has inherent noise in the form of hiss, and perhaps clicks and pops. How extreme this is will depend on the quality of the tape.

    3. All playback devices introduce extraneous noise into the sound produced during playback. This noise is in the form of electronic system noise and mechanical noise and is in addition to the system noise introduced into the audio during recording.

    4. Recording environments are never silent, even when no one is present. There is always some sort of environmental sound, whether a refrigerator, wind, wall creaks, air conditioner or heating unit.

    5. Temporal masking by environmental sounds may also cause auditory illusions. Temporal masking occurs when one sound follows another so closely in time that the two seem as one, or when the first masks the initial articulation cue of the second, making it sound like something other than what it is.

    6. Human auditory perception is binaural (two ears, stereo, omnidirectional) for those with normal hearing ability. Monaural recordings present an additional exacerbating factor for hearing and correctly identifying an ambiguous audio signal. Mono recordings create a small focal point, which is contrary to our normal manner of hearing. Normally, we obtain auditory clues from all directions. In a natural environment sounds have spatial placement: an environmental click or thump can be identified by its location to our left or right, behind us or above us, and is not perceived as originating from a person as part of their speech. A mono recording groups all sounds together and gives the impression that they originate from one location. In the forensic audio lab it is sometimes helpful to use stereo emulation techniques (sometimes referred to as mono-to-stereo remastering or binaural simulation).

    7. Psychoacoustic research has shown that our auditory perceptual coding is very good at extrapolating (filling in what isn't there) and at hearing what it wants or expects to hear, as the brain imposes an order on sonic events. This perceptual ordering is biased by the listener's previous experience, such as having expectations or unconsciously fitting a sonic event into a previous sonic experience. An excellent example of this is the telephone. Human speech has a frequency range of approximately 100 Hz to 8,000 Hz, with the fundamental frequency anywhere between approximately 80 Hz and 200 Hz, depending on the sex and age of the speaker. Fricatives and sibilants can reach 4,500 Hz to 8,000 Hz, again depending on the sex and age of the speaker. Normal telephone line transmission conveys frequencies between 250 Hz and 3,500 Hz. What we hear when speaking to another person on the telephone are the overtone frequencies of their voice in the 250-3,500 Hz range, with the fundamental largely missing, yet we can still identify the familiar voice of a longtime friend because our brain extrapolates the missing data. (Note that most microcassette recorders have a frequency range of 300 Hz to 4,000 Hz when recording at high speed, and 300 Hz to 3,000 Hz at slow speed.)
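Because the S/N figures in item 1 are logarithmic, the practical difference between devices is larger than the numbers suggest. The sketch below converts an S/N in dB to plain amplitude and power ratios using the standard definitions (20 * log10 for amplitude, 10 * log10 for power). The function names and the device labels are illustrative assumptions, not part of any particular tool.

```python
def snr_db_to_amplitude_ratio(snr_db: float) -> float:
    """Signal-to-noise amplitude (voltage) ratio for an S/N given in dB."""
    return 10 ** (snr_db / 20)


def snr_db_to_power_ratio(snr_db: float) -> float:
    """Signal-to-noise power ratio for an S/N given in dB."""
    return 10 ** (snr_db / 10)


# S/N values from the text; a noise floor of -35, -60 or -90 dB relative to
# the signal corresponds to an S/N of 35, 60 or 90 dB. Labels are illustrative.
for label, snr in [("microcassette (typical)", 35), ("device A", 60), ("device B", 90)]:
    amp = snr_db_to_amplitude_ratio(snr)
    print(f"{label:24s} S/N {snr} dB -> signal/noise amplitude about {amp:,.0f}:1")
```

In amplitude terms, a 35 dB S/N leaves the noise at roughly 1/56 of the signal, while a 60 dB S/N leaves it at 1/1000, which is why a quiet voice on a microcassette can sit so close to the hiss.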

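Item 6 mentions stereo emulation of mono recordings. One classic and very simple form is complementary comb filtering (often attributed to Lauridsen): a delayed copy of the signal is added to one channel and subtracted from the other, so alternating frequency bands lean left or right. This is only a minimal sketch of that general technique, assuming a 10 ms delay; it is not a description of the author's process or of any specific lab software, and `pseudo_stereo` is a hypothetical name.

```python
import numpy as np


def pseudo_stereo(mono, fs: int, delay_ms: float = 10.0) -> np.ndarray:
    """Complementary comb-filter pseudo-stereo.

    left  = 0.5 * (x[n] + x[n - d])
    right = 0.5 * (x[n] - x[n - d])
    The two channels have complementary frequency responses and sum back
    to the original mono signal.
    """
    x = np.asarray(mono, dtype=float)
    d = max(1, int(round(fs * delay_ms / 1000)))  # delay in samples (at least 1)
    delayed = np.zeros_like(x)
    delayed[d:] = x[:-d]                          # x[n - d], zero before the start
    left = 0.5 * (x + delayed)
    right = 0.5 * (x - delayed)
    return np.stack([left, right], axis=1)        # shape (n_samples, 2)


rng = np.random.default_rng(0)
mono = rng.standard_normal(8000)                 # one second of noise at 8 kHz
stereo = pseudo_stereo(mono, fs=8000)
```

With a 10 ms delay the left channel peaks at multiples of 100 Hz while the right channel has notches there, and vice versa between them; summing the channels returns the original recording, so no information is lost.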

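The telephone example in item 7 is the "missing fundamental" effect: the overtones of a voice are spaced at its fundamental frequency, so the repetition rate survives even when the fundamental itself is filtered out. The sketch below builds a tone from only those harmonics of a 120 Hz fundamental that fall in the 250-3,500 Hz telephone band, then recovers 120 Hz with a simple autocorrelation pitch estimate. The 120 Hz voice, the 12 kHz sampling rate and the 50-400 Hz search range are assumptions, and autocorrelation is a common signal-processing estimator, not a model of how the brain actually does it.

```python
import numpy as np


def telephone_band_harmonics(f0, fs, duration, low=250.0, high=3500.0):
    """Sum of the harmonics of f0 lying inside [low, high]; f0 itself is omitted."""
    t = np.arange(int(fs * duration)) / fs
    ks = [k for k in range(1, int(high // f0) + 1) if low <= k * f0 <= high]
    x = np.zeros_like(t)
    for k in ks:
        x += np.cos(2 * np.pi * k * f0 * t)
    return x, ks


def autocorr_pitch(x, fs, fmin=50.0, fmax=400.0):
    """Estimate the repetition rate from the autocorrelation peak in [fmin, fmax]."""
    lags = np.arange(int(fs // fmax), int(fs // fmin) + 1)
    r = np.array([np.dot(x[:-lag], x[lag:]) for lag in lags])
    return float(fs / lags[np.argmax(r)])


x, ks = telephone_band_harmonics(120.0, fs=12000, duration=0.2)
print(min(ks) * 120.0, max(ks) * 120.0)  # 360.0 3480.0 (no energy at 120 Hz)
print(autocorr_pitch(x, fs=12000))       # 120.0
```

Even though the lowest component present is 360 Hz, the waveform still repeats every 1/120 of a second, which is the periodicity the estimator (and, by analogy, the listener) picks up.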
As with the mirage, an optical distortion created by alternating layers of hot and cool air and a misinterpretation of data by the visual system, auditory illusions are misinterpretations of data by the auditory system. They arise from a combination of frequencies and rhythms from device system noise, environmental sonic events, device playback noise, recording tape anomalies, and our brain imposing an order on the sonic events presented for interpretation. When listening to difficult-to-hear audio, the brain will eventually impose an order on the frequencies and rhythm patterns, and may decide that something specific was said or assign a particular subjective interpretation.
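When several independent noise sources combine, their levels do not add as plain decibel numbers; for uncorrelated sources the powers add. The sketch below shows that standard power summation. The four levels in the example (relative to a recorder's maximum level) are hypothetical values chosen only to illustrate the arithmetic, and real sonic events such as clicks and squeaks are not steady noise, so this captures only the background floor.

```python
import math


def combine_uncorrelated_levels_db(levels_db):
    """Total level of uncorrelated noise sources, each given in dB: powers add."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in levels_db))


# Hypothetical levels in dB relative to the recorder's maximum level.
sources = {
    "device system noise": -35.0,
    "tape hiss": -40.0,
    "playback noise": -38.0,
    "environment (air conditioner)": -30.0,
}
total = combine_uncorrelated_levels_db(sources.values())
print(f"combined noise floor: {total:.1f} dB")  # -28.0 dB
```

The combined floor ends up louder than any single source, so a faint voice recorded at, say, -32 dB on the same scale would sit below the total background rather than above it.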

A recording in which the desired signal (such as speech) has a very low amplitude is particularly susceptible to auditory misinterpretation, because such a signal may easily combine with other sonic elements and the brain may extrapolate and create false interpretations. For instance, a microcassette recording made in an environment where an air conditioning unit is running, but which contains no human speech, may combine the frequencies and rhythms of the air conditioner mechanism and fan hiss with device system noise and other environmental clicks and squeaks to produce what is thought to be a distant, low-amplitude human utterance. The human-produced sound /sh/ (as in "show") closely resembles the shhhh of microcassette system hiss and recording tape hiss. Combined with a barely audible environmental squeak similar to the vowel /e/, such as the movement of an old rusty gate hinge, it may be falsely interpreted as the human utterance /she/ if all elements combine in what the brain perceives as the appropriate sequence, pitch, rhythm and duration.

False interpretations often result from listening through a poor-quality playback device such as the sonically challenged microcassette recorder, most of which have a small one-inch speaker, high intrusive system noise and intrusive mechanical playback noise. When such a device is held close to the ear with the volume lowered, the rhythmic mechanical noise can become even more intrusive. Auditory illusions may also present themselves on very fine audio systems when intrusive noise (system or environmental) is dominant. Another factor that may create an auditory illusion is resonance within the playback environment, including a resonating plastic case of a playback device such as a small standard cassette recorder, speaker enclosure resonance and wall reflections.

Interpretive conclusions drawn from difficult-to-hear audio, especially audio with low-amplitude sonic events, may be false and nothing more than auditory illusions. To avoid false interpretations, recordings of criminal, civil or personal importance should be submitted for examination by a qualified forensic audio engineer, so that intrusive system noise and environmental noise can be reduced or eliminated and perceived sonic events verified as to their origin and content. In some extreme instances sonic events cannot be conclusively identified, but frequently a reasonable conclusion can be reached which may negate previous erroneous interpretations or confirm previous correct ones.
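As one concrete example of reducing a specific system noise, power-line hum (listed among the system noises in item 1) is often attenuated with a narrow notch filter. The sketch below implements a second-order IIR notch using the widely published Audio EQ Cookbook coefficients. It assumes 60 Hz mains hum (50 Hz in many countries) and an arbitrary Q of 30; real hum also has harmonics (120 Hz, 180 Hz, and so on) that would need further notches, and this is not a description of the author's own process or tools.

```python
import math

import numpy as np


def notch_filter(x, fs, f0=60.0, q=30.0):
    """Second-order IIR notch at f0 Hz (Audio EQ Cookbook form), direct form I."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    cos_w0 = math.cos(w0)
    a0 = 1 + alpha
    # Normalized coefficients: zeros on the unit circle at +/- w0.
    b0, b1, b2 = 1.0 / a0, -2 * cos_w0 / a0, 1.0 / a0
    a1, a2 = -2 * cos_w0 / a0, (1 - alpha) / a0
    y = np.zeros(len(x))
    x1 = x2 = y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        y[n] = yn
        x2, x1 = x1, xn
        y2, y1 = y1, yn
    return y


fs = 8000
t = np.arange(4 * fs) / fs                     # four seconds of audio
hum = np.sin(2 * np.pi * 60 * t)               # assumed 60 Hz mains hum
tone = np.sin(2 * np.pi * 1000 * t)            # a component in the speech band
hum_out = notch_filter(hum, fs)
tone_out = notch_filter(tone, fs)


def rms(v):
    return float(np.sqrt(np.mean(v ** 2)))
```

Once the filter has settled (the last second of the signal), the 60 Hz hum is essentially gone while the 1 kHz component passes almost unchanged; a narrower notch (higher Q) removes less of the nearby speech band but takes longer to settle.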

John J. Mitchell
owner/engineer
Computer Audio Engineering
September 8, 1999

Thanks to William S. Hays for his valuable suggestions and editorial expertise.

Copyright © 1999-2003 by CAE
All Rights Reserved