Overview
When an event occurs in the world around us, say a vocalist sings the opening bars of Amazing Grace, information about the event is transmitted to us by light waves and sound waves emanating from the vocalist. Our eyes create a sequence of visual images of the event, our ears form a sequence of auditory images of the event, and the brain combines these image sequences with any other sensory input to produce our initial experience of the event. In parallel, the brain interprets the event with respect to the scene in which it occurs, and in terms of stored knowledge concerning related events in similar scenes. The Auditory Image Model (AIM) is a conceptual description of how the auditory system might construct the initial representation of an auditory event from the sound waves entering the ear from the environment. This project provides two applications that convert sound waves into auditory images: AIM-MAT and AIM-C.
AIM-MAT is a version of the auditory image model written in MATLAB with a GUI that allows you to investigate auditory processing stage by stage. There is a tutorial that explains the details of auditory processing with figures illustrating the internal representation of sound at successive stages in the auditory pathway.
AIM-C is a real-time version of the auditory image model written in C that is suitable for batch processing of sound databases. The AIM-MAT tutorial provides a reasonable introduction to the processing provided by AIM-C.
The focus of AIM is communication sounds such as those that occur when a performer sings a melody or recites a line of poetry. These sounds are similar in form to the territorial calls of animals. Communication sounds are dominated by complex tones about a quarter of a second in duration. These communication tones are heard as the vowels of speech and the notes of melodies, as well as the hoots and coos in animal calls. It is argued that these complex tones are the basic building blocks of auditory perception. They produce stable structures with distinctive shapes in the auditory image, and these auditory figures are often sufficient to identify the source of the sound. Sequences of these auditory figures give auditory events meaning and segregate the auditory scene into foreground and background events.
There is a wiki that describes the production of communication sounds, the form of the information in communication sounds, and how the auditory system converts communication sounds into the auditory images we hear when these sounds occur in everyday life. It explains what is meant by the terms auditory image, auditory figure, and auditory event, as well as the auditory scene and auditory objects. It also explains how the auditory system might create the internal, neural representations of sound that support auditory perception and bio-acoustic communication. The focus is on signal processing transforms that can be applied to all sounds to produce an internal space of auditory events where the important features of communication sounds are automatically segregated in preparation for subsequent perceptual processing.