================================================================================
Article: 827 of alt.binaries.sounds.misc
Newsgroups: alt.binaries.sounds.misc
From: thomas@obelix.serum.kodak.com (bulli@kodak.com)
Subject: Re: Sound formats.
Organization: Clinical Products Division, Eastman Kodak Company
References: <1992Mar19.234132.6185@comp.vuw.ac.nz>
Distribution: alt
Date: Fri, 20 Mar 1992 12:17:45 GMT
Lines: 964

Archive-name: audio-fmts/part1
Submitted-by: guido@cwi.nl
Version: 1.4
Last-modified: 5-Feb-1992

The audio formats guide (version 1.4)
=====================================

Introduction
------------

In November 1991 I posted a first version of this guide, with many
missing details and questions. It prompted a deluge of e-mail giving
me the missing information in nauseating detail, and within a few
weeks I published a completely rewritten, vastly updated version. I
also added several appendices that treat certain file formats in more
detail. It hasn't completely stabilized yet, but is obviously
reaching maturity, and I am still receiving valuable fan-mail.

I plan to post this guide about once a month, either unchanged (just
to inform new readers), or updated (if I learn more or when new
hardware or software becomes popular). I post to
alt.binaries.sounds.d and to comp.dsp, for maximal coverage of people
interested in audio, and to news.answers, for easy reference.
(Actually, so many questions in alt.binaries.sounds.d are answered
here that I post a little more often.)

Send updates, comments and questions to ; flames to /dev/null.

NOTE TO SENDERS OF UPDATES TO VERSION 1.3: I have received some
voluminous material (especially on the Atari and Amiga) that I want to
incorporate somehow; however, I currently don't have the time for a
major edit, nor does it seem fit for an appendix. Next month looks
more promising time-wise. I haven't forgotten you!

I'd like to thank everyone who sent me mail with updates for the
previous versions.
The list of names is really too long to list you all...

--Guido van Rossum, CWI, Amsterdam

Device characteristics
----------------------

In this text, I will only use the term "sample" to refer to a single
output value from an A/D converter, i.e., a small integer number
(usually 8 or 16 bits).

Audio data is characterized by the following parameters, which
correspond to settings of the A/D converter when the data was
recorded. Naturally, the same settings must be used to play the data.

- sampling rate (in Hz, or samples per second), e.g., 8000 or 44100

- number of bits per sample, e.g., 8 or 16

- number of channels (1 for mono, 2 for stereo, etc.)

Approximate sampling rates are often quoted in kHz (kilo-Hertz), e.g.,
8 kHz, 44.1 kHz. Sampling rates are always measured per channel, so
for stereo data recorded at 8 kHz, there are actually 16000 samples in
a second.

Multi-channel samples are generally interleaved on a frame-by-frame
basis: if there are N channels, the data is a sequence of frames,
where each frame contains N samples, one from each channel. (Thus,
the sampling rate is really the number of *frames* per second.) For
stereo, the left channel usually comes first.

The specification of the number of bits for U-LAW (pronounced mu-law
-- the u really stands for the Greek letter mu) samples is somewhat
problematic. These samples are logarithmically encoded in 8 bits,
like a tiny floating point number; however, their dynamic range is
that of 13 or 14 bit linear data. Source for converting to/from U-LAW
(written by Jef Poskanzer) is distributed as part of the SoundKit
package mentioned below; it can easily be ripped apart to serve in
other applications.

(There exists another encoding similar to U-LAW, called A-LAW, which
is used as a European telephony standard. I don't know how it differs
from U-LAW. There is less support for it in UNIX workstations.)
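To make the "tiny floating point number" description concrete, here is
a minimal sketch of U-LAW decoding. It is NOT the Poskanzer source
shipped with SoundKit; it assumes the standard telephony (G.711)
layout, in which the stored byte is bit-inverted and holds a sign bit,
a 3-bit exponent and a 4-bit mantissa. The function name
ulaw_to_linear is made up for this illustration.

```c
#include <stdint.h>

/*
 * Decode one U-LAW byte to a signed linear sample.
 * Assumption: standard G.711 mu-law layout (inverted bits;
 * 1 sign bit, 3 exponent bits, 4 mantissa bits).
 * The result is the 14-bit value scaled up to fit a 16-bit range.
 */
int16_t ulaw_to_linear(uint8_t u)
{
    uint8_t v = (uint8_t)~u;          /* stored bytes are bit-inverted */
    int sign = v & 0x80;              /* top bit: sign */
    int exponent = (v >> 4) & 0x07;   /* segment number, 0..7 */
    int mantissa = v & 0x0F;          /* position within the segment */

    /* 0x84 (132) is the bias added before encoding; remove it again */
    int magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84;

    return (int16_t)(sign ? -magnitude : magnitude);
}
```

Calling ulaw_to_linear() on every byte of a raw U-LAW stream yields
linear samples, which is why lossless expansion needs more than 8 bits
per sample.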
Popular sampling rates
----------------------

Some sampling rates are more popular than others, for various reasons.
Some recording hardware is restricted to (approximations of) some of
these rates, some playback hardware has direct support for some. The
popularity of divisors of common rates can be explained by the
simplicity of clock frequency dividing circuits :-)

5.5 kHz         One fourth of the Mac sampling rate.

7.333 kHz       One third of the Mac sampling rate.

8 kHz           Exactly 8000 Hz is a telephony standard that goes
                together with U-LAW (and also A-LAW) encoding. Some
                systems use an approximation of 8 kHz; in particular,
                the NeXT workstation uses 8012.8210513 Hz, or so the
                documentation claims. (Can anyone at NeXT explain
                why?)

11 kHz          Either 11025 Hz, a quarter of the CD sampling rate,
                or half the Mac sampling rate (perhaps the most
                popular rate on the Mac).

22 kHz          Either 22050 Hz, half the CD sampling rate, or the
                Mac rate of precisely 22254.545454545454 Hz.

32 kHz          Used in digital radio, NICAM (what's that?) and other
                TV work; also some DAT machines can do it.

44.1 kHz        The CD sampling rate, precisely 44100 Hz.

48 kHz          The DAT (Digital Audio Tape) sampling rate, precisely
                48000 Hz.

While professional musicians disagree, most people don't have a
problem if recorded sound is played at a slightly different rate, say,
1-2%. On the other hand, if recorded data is being fed into a
playback device in real time (say, over a network), even the smallest
difference in sampling rate can frustrate the buffering scheme used...

Current hardware
----------------

I am aware of the following computer systems that can record and play
back audio data, with their characteristics. (Note that for most
systems you can also buy "professional" sampling hardware, which
supports much better quality.)

machine           bits        sampling rate    #chans  extension/format

Mac               8           up to 22 kHz     1       .snd
Apple IIgs        ?           ?                15      IFF/AIFF
PC/Soundblaster   8           up to 13/22 kHz  1       .voc
Atari ST          8           up to 22 kHz     1       .snd
Atari STe,TT      8           6-50 kHz         2       .snd
Amiga             8           up to ~29 kHz    4(st)   IFF/8SVX
Sun Sparc         U-LAW       8 kHz            1       .au
NeXT              U-LAW,8,16  8,22,44 kHz      1(st)   .snd, .vox
SGI Indigo        8,16        8-48 kHz         4(st)   IFF/AIFF
Acorn Archimedes  ~U-LAW      any              <= 8    various
Sony RISC-NEWS    8, 16       8-37.8 kHz       ?(st)   .au, others?

Where 4(st) means "four voices, stereo".

All these machines can play back sound without additional hardware,
although the needed software is not always standard; only the Sun,
NeXT and SGI come with standard sampling hardware (the NeXT only
samples U-LAW at 8 kHz from the built-in microphone port; you need a
separate board for other rates).

Software exists for the PC that can play sound on its 1-bit speaker
using pulse width modulation (see appendix); the Soundblaster board
records at rates up to 13 kHz and plays back up to 22 kHz (weird
combination, but that's the way it is).

For most machines third party hardware exists that does much more
than this; I'd rather not go into this because I would be comparing
products that I've never seen (or heard!).

On the NeXT, the Motorola 56001 DSP chip is programmable and you can
(in principle) do what you want. The SGI uses the same DSP chip but
it can't be programmed by users -- SGI claims you can crash the system
that way.

The Amiga also has a 6-bit volume, which can be used to produce
something like a 14-bit output for each voice. The hardware can also
use one of each voice-pair to modulate the other in FM (period) or AM
(volume, 6-bits).

The Acorn Archimedes uses a variation on U-LAW with the sign bit in
bit 0. Being a 'minority' architecture, Arc owners are quite adept at
converting sound/image formats from other machines.
File formats
------------

Historically, almost every type of machine used its own file format
for audio data, but some file formats are more generally applicable,
and in general it is possible to define conversions between almost any
pair of file formats -- sometimes losing information, however.

File formats are a separate issue from device characteristics. There
are two types of file formats: self-describing formats, where the
device parameters and encoding are made explicit in some form of
header, and "raw" formats, where the device parameters and encoding
are fixed.

Self-describing file formats generally define a family of data
encodings, where a header field indicates the particular encoding
variant used. Headerless formats define a single encoding and usually
allow no variation in device parameters (except sometimes sampling
rate, which can be a pain to figure out other than by listening to the
sample).

The header of self-describing formats contains the parameters of the
sampling device and sometimes other information (e.g., a
human-readable description of the sound, or a copyright notice). Most
headers begin with a simple "magic word". (Some formats do not simply
define a header format, but may contain chunks of data intermingled
with chunks of encoding info.) The data encoding defines how the
actual samples are stored in the file, e.g., signed or unsigned, as
bytes or short integers, in little-endian or big-endian byte order,
etc. Strictly speaking, channel interleaving is also part of the
encoding, although so far I have seen little variation in this area.

Some file formats apply some kind of compression to the data, e.g.,
Huffman encoding, or simple silence deletion.

Here's an overview of popular file formats.
        Self-describing file formats
        ----------------------------

extension         origin        variable parameters (fixed; comments)
or name

.au, .snd, .vox   NeXT, Sun     rate, #channels, encoding, info string
.aif(f), AIFF     Mac           rate, #channels, sample width, lots of info
.iff, IFF/8SVX    Amiga         rate, #channels, instrument info (8 bits)
.voc              Soundblaster  rate (8 bits/1 ch; uses silence deletion)
.sf               IRCAM         rate, #channels, encoding, info
HCOM              Mac           rate (8 bits/1 ch; uses Huffman compression)

Note that the filename extension ".snd" is ambiguous: it can be either
the self-describing NeXT format or the headerless Mac/PC format.

I know nothing for sure about the origin of HCOM files, only that
there are a lot of them floating around on our system and probably at
FTP sites over the world. The filenames usually don't have a ".hcom"
extension, but this is what SoundKit (see below) uses. The file
format recognized by SoundKit includes a MacBinary header, where the
file type field is "FSSD". The data fork begins with the magic word
"HCOM" and contains Huffman compressed data; after decompression it
corresponds to the headerless ".snd" format below (8 bits unsigned).

IFF/8SVX allows for amplitude contours for sounds (attack/decay/etc).
Compression is optional (and extensible); volume is variable; author,
notes and copyright properties; etc. See an appendix for a
description. There are other sound formats in use on Amiga by
digitizers and music programs, such as IFF/SMUS.

Pointers to more info about AIFF are in an appendix; another appendix
describes the NeXT format.

DEC systems (e.g., DECstation 5000) use a variant of the NeXT format
that uses little-endian encoding and has a different magic number
(0x0064732E in little-endian encoding).

        Headerless file formats
        -----------------------

extension       origin          parameters
or name

.snd            Mac, PC         variable rate, 1 channel, 8 bits unsigned
.ulaw           US telephony    8 kHz, 1 channel, 8 bit "U-LAW" encoding
?               Amiga           variable rate, 1 channel, 8 bits signed

It is usually easy to distinguish 8-bit signed formats from unsigned
by looking at the beginning of the data with 'od -b /dev/audio".

A whole package for dealing with ".au" files is provided by Sun on an
experimental basis, in /usr/demo/SOUND. You may have to compile the
programs first. (If you can't find this directory, either you are not
running SunOS 4.1 yet, or your system administrator hasn't installed
it -- go ask him for it, not me!) The program "play" in this directory
recognizes all files in Sun/NeXT format, but can play only those using
U-LAW encoding at 8 kHz.

You can also cat a ".au" file to /dev/audio, if it uses U-LAW; the
header will sound like a short burst of noise but the rest of the data
will sound OK (really, the only difference in this case between raw
U-LAW and ".au" files is the header; the U-LAW data is exactly the
same).

Finally, OpenWindows 3.0 has a full-fledged audio tool. You can drop
audio file icons into it, edit them, etc.

        NeXT
        ----

On NeXT machines, the standard "sndplay" program can play all NeXT
format files (this includes Sun ".au" files). It supports at least
U-LAW at 8 kHz and 16-bit samples at 22 or 44.1 kHz. It attempts
on-the-fly conversions for other formats.

Sound files are also played if you double-click on them in the file
browser.

        SGI Indigo and 4D/35
        --------------------

On SGI Indigo and 4D/35 workstations, the program "playaiff" (in
/usr/sbin) plays AIFF files, if the sampling rate is one of 8000,
11025, 16000, 22050, 32000, 44100, or 48000 (the library interface to
the hardware doesn't support other rates -- I don't know what the
hardware is actually capable of). On the 4D/35, you need to have the
audio board installed (check the output from hinv) and you must run
IRIX 3.3.2 or 4.0.

There is no simple /dev/audio interface on these SGI machines.
(There was one on 4D/25 machines, reading and writing signed linear
8-bit samples at rates of 8, 16 and 32 kHz; unfortunately the board
design caused a lot of noise from the CPU board to clutter the audio
signals.)

A program "playulaw" was posted as part of the "radio 1.0" release
that I posted to alt.sources recently; it plays raw U-LAW files on the
Indigo or 4D/35 audio hardware.

        Sony NEWS
        ---------

The Sony RISC-NEWS line (NWS-3250 laptop, NWS-37xx desktop, NWS-38xx
desktop w/ IOP) also has builtin sound capabilities. You can also buy
external boards for the older NEWS machines or to add extra channels
to the new machines.

In the default mode (8kHz/8-bit), Sun .au files are directly supported
(you can 'cat' .au files to /dev/sb and have them play).

        Others
        ------

Most other UNIX boxes don't have audio hardware and thus can't play
audio data.

Playing audio files on micros
-----------------------------

Most micros have at least a speaker built in, so theoretically all you
need is the right software. Unfortunately most systems don't come
bundled with sound-playing software, so there are many public domain
or shareware software packages, each with their own bugs and features.
Most separate sound recording hardware also comes with playing
software, most of which can play sound (in the file format used by
that hardware) even on machines that don't have that hardware
installed.

One of the appendices below contains a list of programs to play sound
on the PC.

For sounds on Atari STs - programs are in the atari/sound/players
directory on atari.archive.umich.edu (141.211.164.8).

Malcolm Slaney from Apple writes:

"We do have tools to play sound back on most of our Unix hosts. We
wrote a program called TcpPlay that lets us read a sound file on a
Unix host, open a TCP/IP connection to the Mac on my desk, and plays
the file. We think of it as X windows for sound (at least a step in
that direction.)

This software is available for anonymous FTP from apple.com.
Look for ~ftp/pub/TcpPlay/TcpPlay.sit.hqx."

The Sound Site Newsletter
-------------------------

An electronic publication with lots of info about digitised sound and
sound formats, albeit mostly on micros, is "The Sound Site
Newsletter". So far, 7 issues have appeared, the last in October
1991. Issues can be ftp'ed from saffron.inset.com, directory
pub/rogue/newsletters, or from ccb.ucsf.edu,
Pub/Sound_list/Sound.Newsletters.

Posting Sounds
--------------

The newsgroup alt.binaries.sounds.misc is dedicated to postings
containing sound. (Discussions related to such postings belong in
alt.binaries.sounds.d.)

There is no set standard for posting sounds; uuencoded files in most
popular formats are welcome, if split in parts under 50 kBytes. There
is a proposed standard for posting uuencoded files in a way that makes
it easier to automatically combine the parts and uudecode the file,
but I don't know anything about this.

It is recommended to post sounds in the format that was used for the
original recording; conversions to other formats often lose
information and do no favor to people who have the same hardware as
the poster. For instance, converting 8-bit linear sound to U-LAW
loses the lower few bits of the data, and rate changing conversions
almost always add noise. Converting from U-LAW to linear requires
expansion to 16 bit samples if no information loss is allowed!

U-LAW data is best posted with a NeXT/Sun header.

If you have to post a file in a headerless format (usually 8-bit
linear, like ".snd"), please add a description giving at least the
sampling rate and whether the bytes are signed (zero at 0) or unsigned
(zero at 0200).

Compression of sound files usually isn't worth it; the standard
"compress" algorithm doesn't save much when applied to sound data
(typically at most 10-20 percent), and compression algorithms
specifically designed for sound (e.g., NeXT's) are usually
proprietary.
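For posters who have only raw U-LAW data, here is a hedged sketch of
building the minimal 28-byte NeXT/Sun header to put in front of it.
The field order, the magic number, the format code 1 for 8-bit mu-law
and the minimum data location of 28 are taken from the NeXT appendix
below; the helper names (put_be32, snd_fill_ulaw_header) and the
choice of exactly 8000 Hz mono are assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define SND_MAGIC       0x2e736e64u  /* the bytes ".snd" */
#define SND_HEADER_MIN  28u          /* six 32-bit fields + 4 info bytes */
#define SND_MULAW_8     1u           /* format code for 8-bit mu-law */

/* Store a 32-bit value high-order byte first (big-endian). */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/*
 * Fill hdr with a header describing data_size bytes of 8 kHz,
 * one-channel U-LAW data. Returns the number of header bytes.
 * (Hypothetical helper, not part of any vendor library.)
 */
size_t snd_fill_ulaw_header(unsigned char hdr[28], uint32_t data_size)
{
    put_be32(hdr + 0,  SND_MAGIC);       /* magic number */
    put_be32(hdr + 4,  SND_HEADER_MIN);  /* data location (offset) */
    put_be32(hdr + 8,  data_size);       /* data size in bytes */
    put_be32(hdr + 12, SND_MULAW_8);     /* data format */
    put_be32(hdr + 16, 8000u);           /* sampling rate */
    put_be32(hdr + 20, 1u);              /* channel count */
    put_be32(hdr + 24, 0u);              /* empty info string */
    return SND_HEADER_MIN;
}
```

Writing these 28 bytes followed by the unchanged U-LAW bytes should
give a file in the self-describing format; the sample data itself is
not touched.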
Appendices
==========

Here are some more detailed pieces of info that I received by e-mail.
They are reproduced here without editing.

------------------------------------------------------------------------

FTP access for non-internet sites
---------------------------------

From the sci.space FAQ:

Sites not connected to the Internet cannot use FTP directly, but
there are a few automated FTP servers which operate via email. Send
mail containing only the word HELP to ftpmail@decwrl.dec.com or
bitftp@pucc.princeton.edu, and the servers will send you instructions
on how to make requests.

------------------------------------------------------------------------

AIFF Format (Audio IFF)
-----------------------

This format was developed by Apple for storing high-quality sampled
sound and musical instrument info; it is also used by SGI and several
professional audio packages (sorry, I know no names). An extension,
called AIFF-C, supports compression (see the last item below).

I've made a BinHex'ed MacWrite version of the AIFF spec (no idea if
it's the same text as mentioned below) available by anonymous ftp from
ftp.cwi.nl [192.16.184.180]; the file is /pub/AudioIFF1.2.hqx. But
you may be better off with the AIFF-C specs, see below.

Mike Brindley (brindley@ece.orst.edu) writes:

"The complete AIFF spec by Steve Milne, Matt Deatherage (Apple) is
available in 'AMIGA ROM Kernal Reference Manual: Devices (3rd Edition)'
1991 by Commodore-Amiga, Inc.; Addison-Wesley Publishing Co.;
ISBN 0-201-56775-X, starting on page 435 (this edition has a charcoal
grey cover). It is available in most bookstores, and soon in many
good librairies."

Finally, Mark Callow writes (in comp.sys.sgi):

"I have placed a PostScript version of the AIFF-C specification on
sgi.sgi.com for public ftp. It is in the file sgi/aiff-c.9.26.91.ps.

sgi.sgi.com's internet host number is (I think) 192.48.153.1."
------------------------------------------------------------------------

The NeXT/Sun audio file format
------------------------------

[I received this from Doug Keislar, NeXT Computer. This is also the
Sun format, except that Sun doesn't recognize as many format codes. I
added the numeric codes to the table of formats and sorted it.
--Guido]

Here's the complete story on the file format, from the NeXT
documentation. (Note that "Sound Kit" refers to NeXT's product, not
the public domain SoundKit utility. Also note that the "magic" number
is ((int)0x2e736e64), which equals ".snd".) Also, at the end, I've
added a little document that someone posted to the net a couple of
years ago, that describes the format in a bit-by-bit fashion rather
than from C.

SNDSoundStruct: How a NeXT Computer Represents Sound

The NeXT sound software defines the SNDSoundStruct structure to
represent sound. This structure defines the soundfile and Mach-O
sound segment formats and the sound pasteboard type. It's also used
to describe sounds in Interface Builder. In addition, each instance
of the Sound Kit's Sound class encapsulates a SNDSoundStruct and
provides methods to access and modify its attributes.

Basic sound operations, such as playing, recording, and cut-and-paste
editing, are most easily performed by a Sound object. In many cases,
the Sound Kit obviates the need for in-depth understanding of the
SNDSoundStruct architecture. For example, if you simply want to
incorporate sound effects into an application, or to provide a simple
graphic sound editor (such as the one in the Mail application), you
needn't be aware of the details of the SNDSoundStruct. However, if
you want to closely examine or manipulate sound data you should be
familiar with this structure.

The SNDSoundStruct contains a header, information that describes the
attributes of a sound, followed by the data (usually samples) that
represents the sound.
The structure is defined (in sound/soundstruct.h) as:

    typedef struct {
        int magic;          /* magic number SND_MAGIC */
        int dataLocation;   /* offset or pointer to the data */
        int dataSize;       /* number of bytes of data */
        int dataFormat;     /* the data format code */
        int samplingRate;   /* the sampling rate */
        int channelCount;   /* the number of channels */
        char info[4];       /* optional text information */
    } SNDSoundStruct;

SNDSoundStruct Fields

magic

magic is a magic number that's used to identify the structure as a
SNDSoundStruct. Keep in mind that the structure also defines the
soundfile and Mach-O sound segment formats, so the magic number is
also used to identify these entities as containing a sound.

dataLocation

It was mentioned above that the SNDSoundStruct contains a header
followed by sound data. In reality, the structure only contains the
header; the data itself is external to, although usually contiguous
with, the structure. (Nonetheless, it's often useful to speak of the
SNDSoundStruct as the header and the data.) dataLocation is used to
point to the data. Usually, this value is an offset (in bytes) from
the beginning of the SNDSoundStruct to the first byte of sound data.
The data, in this case, immediately follows the structure, so
dataLocation can also be thought of as the size of the structure's
header. The other use of dataLocation, as an address that locates
data that isn't contiguous with the structure, is described in
"Format Codes," below.

dataSize, dataFormat, samplingRate, and channelCount

These fields describe the sound data.

dataSize is its size in bytes (not including the size of the
SNDSoundStruct).

dataFormat is a code that identifies the type of sound. For sampled
sounds, this is the quantization format. However, the data can also
be instructions for synthesizing a sound on the DSP. The codes are
listed and explained in "Format Codes," below.

samplingRate is the sampling rate (if the data is samples).
Three sampling rates, represented as integer constants, are supported
by the hardware:

    Constant          Sampling Rate (Hz)

    SND_RATE_CODEC    8012.821   (CODEC input)
    SND_RATE_LOW      22050.0    (low sampling rate output)
    SND_RATE_HIGH     44100.0    (high sampling rate output)

channelCount is the number of channels of sampled sound.

info

info is a NULL-terminated string that you can supply to provide a
textual description of the sound. The size of the info field is set
when the structure is created and thereafter can't be enlarged. It's
at least four bytes long (even if it's unused).

Format Codes

A sound's format is represented as a positive 32-bit integer. NeXT
reserves the integers 0 through 255; you can define your own format
and represent it with an integer greater than 255. Most of the
formats defined by NeXT describe the amplitude quantization of sampled
sound data:

    Value  Code                              Format

    0      SND_FORMAT_UNSPECIFIED            unspecified format
    1      SND_FORMAT_MULAW_8                8-bit mu-law samples
    2      SND_FORMAT_LINEAR_8               8-bit linear samples
    3      SND_FORMAT_LINEAR_16              16-bit linear samples
    4      SND_FORMAT_LINEAR_24              24-bit linear samples
    5      SND_FORMAT_LINEAR_32              32-bit linear samples
    6      SND_FORMAT_FLOAT                  floating-point samples
    7      SND_FORMAT_DOUBLE                 double-precision float samples
    8      SND_FORMAT_INDIRECT               fragmented sampled data
    9      SND_FORMAT_NESTED                 ?
    10     SND_FORMAT_DSP_CORE               DSP program
    11     SND_FORMAT_DSP_DATA_8             8-bit fixed-point samples
    12     SND_FORMAT_DSP_DATA_16            16-bit fixed-point samples
    13     SND_FORMAT_DSP_DATA_24            24-bit fixed-point samples
    14     SND_FORMAT_DSP_DATA_32            32-bit fixed-point samples
    15     ?
    16     SND_FORMAT_DISPLAY                non-audio display data
    17     SND_FORMAT_MULAW_SQUELCH          ?
    18     SND_FORMAT_EMPHASIZED             16-bit linear with emphasis
    19     SND_FORMAT_COMPRESSED             16-bit linear with compression
    20     SND_FORMAT_COMPRESSED_EMPHASIZED  A combination of the two above
    21     SND_FORMAT_DSP_COMMANDS           Music Kit DSP commands
    22     SND_FORMAT_DSP_COMMANDS_SAMPLES   ?

Most formats identify different sizes and types of sampled data.
Some deserve special note:

-- SND_FORMAT_DSP_CORE format contains data that represents a
   loadable DSP core program. Sounds in this format are required by
   the SNDBootDSP() and SNDRunDSP() functions. You create a
   SND_FORMAT_DSP_CORE sound by reading a DSP load file (extension
   ".lod") with the SNDReadDSPfile() function.

-- SND_FORMAT_DSP_COMMANDS is used to distinguish sounds that contain
   DSP commands created by the Music Kit. Sounds in this format can
   only be created through the Music Kit's Orchestra class, but can be
   played back through the SNDStartPlaying() function.

-- SND_FORMAT_DISPLAY format is used by the Sound Kit's SoundView
   class. Such sounds can't be played.

-- SND_FORMAT_INDIRECT indicates data that has become fragmented, as
   described in a separate section, below.

-- SND_FORMAT_UNSPECIFIED is used for unrecognized formats.

Fragmented Sound Data

Sound data is usually stored in a contiguous block of memory.
However, when sampled sound data is edited (such that a portion of the
sound is deleted or a portion inserted), the data may become
discontiguous, or fragmented. Each fragment of data is given its own
SNDSoundStruct header; thus, each fragment becomes a separate
SNDSoundStruct structure. The addresses of these new structures are
collected into a contiguous, NULL-terminated block; the dataLocation
field of the original SNDSoundStruct is set to the address of this
block, while the original format, sampling rate, and channel count are
copied into the new SNDSoundStructs.

Fragmentation serves one purpose: It avoids the high cost of moving
data when the sound is edited. Playback of a fragmented sound is
transparent: you never need to know whether the sound is fragmented
before playing it. However, playback of a heavily fragmented sound is
less efficient than that of a contiguous sound. The
SNDCompactSamples() C function can be used to compact fragmented sound
data.

Sampled sound data is naturally unfragmented.
A sound that's freshly recorded or retrieved from a soundfile, the
Mach-O segment, or the pasteboard won't be fragmented. Keep in mind
that only sampled data can become fragmented.

_________________________

From mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps Wed Apr  4 23:56:23 EST 1990
Article 5779 of comp.sys.next:
Path: mentor.cc.purdue.edu!purdue!decwrl!ucbvax!ziploc!eps
From: eps@toaster.SFSU.EDU (Eric P. Scott)
Newsgroups: comp.sys.next
Subject: Re: Format of NeXT sndfile headers?
Message-ID: <445@toaster.SFSU.EDU>
Date: 31 Mar 90 21:36:17 GMT
References: <14978@phoenix.Princeton.EDU>
Reply-To: eps@cs.SFSU.EDU (Eric P. Scott)
Organization: San Francisco State University
Lines: 42

In article <14978@phoenix.Princeton.EDU>
        bskendig@phoenix.Princeton.EDU (Brian Kendig) writes:
>I'd like to take a program I have that converts Macintosh sound files
>to NeXT sndfiles and polish it up a bit to go the other direction as
>well.

Two people have already submitted programs that do this (Christopher
Lane and Robert Hood); check the various NeXT archive sites.

>       Could someone please give me the format of a NeXT sndfile
>header?

"big-endian"
        0       1       2       3
    +-------+-------+-------+-------+
 0  | 0x2e  | 0x73  | 0x6e  | 0x64  |   "magic" number
    +-------+-------+-------+-------+
 4  |                               |   data location
    +-------+-------+-------+-------+
 8  |                               |   data size
    +-------+-------+-------+-------+
12  |                               |   data format (enum)
    +-------+-------+-------+-------+
16  |                               |   sampling rate (int)
    +-------+-------+-------+-------+
20  |                               |   channel count
    +-------+-------+-------+-------+
24  |       |       |       |       |   (optional) info string

28 = minimum value for data location

data format values can be found in /usr/include/sound/soundstruct.h

Most common combinations:

             sampling  channel    data
                 rate    count  format
voice file       8012        1       1 =  8-bit mu-law
system beep     22050        2       3 = 16-bit linear
CD-quality      44100        2       3 = 16-bit linear

                                        -=EPS=-

------------------------------------------------------------------------

IFF/8SVX Format
---------------

Newsgroups: alt.binaries.sounds.d,alt.sex.sounds
Subject: Format of the IFF header (Amiga sounds)
Message-ID: <2509@tardis.Tymnet.COM>
From: jms@tardis.Tymnet.COM (Joe Smith)
Date: 23 Oct 91 23:54:38 GMT
Followup-To: alt.binaries.sounds.d
Organization: BT North America (Tymnet)

The first 12 bytes of an IFF file are used to distinguish between an
Amiga picture (FORM-ILBM), an Amiga sound sample (FORM-8SVX), or other
file conforming to the IFF specification. The middle 4 bytes is the
count of bytes that follow the "FORM" and byte count longwords.
(Numbers are stored in M68000 form, high order byte first.)

------------------------------------------

FutureSound audio file, 15000 samples at 10.000KHz, file is 15048
bytes long.

0000: 464F524D 00003AC0 38535658 56484452    FORM..:.8SVXVHDR
       F O R M    15040  8 S V X  V H D R
0010: 00000014 00003A98 00000000 00000000    ......:.........
            20    15000        0        0
0020: 27100100 00010000 424F4459 00003A98    '.......BODY..:.
      10000 1 0     1.0  B O D Y    15000

0000000..03 = "FORM", identifies this as an IFF format file.
FORM+00..03 (ULONG) = number of bytes that follow. (Unsigned long int.)
FORM+04..07 = "8SVX", identifies this as an 8-bit sampled voice.

????+00..03 = "VHDR", Voice8Header, describes the parameters for the BODY.
VHDR+00..03 (ULONG) = number of bytes to follow.
VHDR+04..07 (ULONG) = samples in the high octave 1-shot part.
VHDR+08..0B (ULONG) = samples in the high octave repeat part.
VHDR+0C..0F (ULONG) = samples per cycle in high octave (if repeating), else 0.
VHDR+10..11 (UWORD) = samples per second. (Unsigned 16-bit quantity.)
VHDR+12 (UBYTE) = number of octaves of waveforms in sample.
VHDR+13 (UBYTE) = data compression (0=none, 1=Fibonacci-delta encoding).
VHDR+14..17 (FIXED) = volume. (The number 65536 means 1.0 or full volume.)

????+00..03 = "BODY", identifies the start of the audio data.
BODY+00..03 (ULONG) = number of bytes to follow.
BODY+04..NNNNN = Data, signed bytes, from -128 to +127.

0030: 04030201 02030303 04050605 05060605
0040: 06080806 07060505 04020202 01FF0000
0050: 00000000 FF00FFFF FFFEFDFD FDFEFFFF
0060: FDFDFF00 00FFFFFF 00000000 00FFFF00
0070: 00000000 00FF0000 00FFFEFF 00000000
0080: 00010000 000101FF FF0000FE FEFFFFFE
0090: FDFDFEFD FDFFFFFC FDFEFDFD FEFFFEFE
00A0: FFFEFEFE FEFEFEFF FFFFFEFF 00FFFF01

This small section of the audio sample shows the numbers ranging from
-4 (0xFC) to +8 (0x08).

Warning: Do not assume that the BODY starts 48 bytes into the file.
In addition to "VHDR", chunks labeled "NAME", "AUTH", "ANNO", or
"(c) " may be present, and may be in any order. You will have to
check the byte count in each chunk to determine how many bytes to
skip.

------------------------------------------------------------------------

Playing sound on a PC
---------------------

[From Mr. Neat-O]

Any turbo PC (8088 at 8 MHz or greater)/286/386/486/etc. can produce
a quality playback of single channel 8 bit sounds on the internal
(1 bit, 1 channel) speaker by utilizing Pulse-Width-Modulation, which
toggles the speaker faster than it can physically move to simulate
positions between fully on and fully off.
There are several PD programs of this nature that I know of:

REMAC  - Plays MAC format sound files. Files on the Macintosh, at
         least the sound files that I've ripped apart, seem to contain
         3 parts. The first two are info like what the file icon
         looks like and other header type info. The third part
         contains the raw sample data, and it is this portion of the
         file which is saved to a separate file, often named with the
         .snd extension by PC users. Personally, I like to name the
         files .s1, .s2, .s3, or .s4 to indicate the sampling rate of
         the file. (-s# is how to specify the playback rate in
         REMAC.) REMAC provides playback rates of 5550 Hz, 7333 Hz,
         11 kHz, & 22 kHz.

REMAC2 - Same as REMAC, but sounds better on higher speed machines.

REPLAY - Basically same as REMAC, but for playback of Atari ST sounds.
         Apparently, the Atari has two sound formats, one of which
         sounds like garbage if played by REMAC or REPLAY in the
         incorrect mode. The other file format works fine with REMAC
         and so appears to be 'normal' unsigned 8-bit data. REPLAY
         provides playback rates of 11.5 kHz, 12.5 kHz, 14 kHz,
         16 kHz, 18.5 kHz, 22 kHz, & 27 kHz.

These three programs are all by the same author, Richard E. Zobell
who does not have an internet mail address to my knowledge, but does
have a GEnie email address of R.ZOBELL.

Additionally, there are various stand-alone demos which use the
internal speaker, of which there is one called mushroom which plays a
30 second advertising jingle for magic mushroom room deodorizers which
is pretty humorous. I've used this player to playback samples that I
ripped out of the commercial game program Mean Streets, which uses
something they call RealSound (tm) to playback digital samples on the
internal speaker. (Of course, I only do this on my own system, and
since I own the game, I see no problems with it.)

For owners of 8 MHz 286's and above, the option to play 4 channel
8 bit sounds (with decent quality) on the internal speaker is also a
reality.
Quite a number of PD programs exist to do this, including, but not
limited to: ModEdit, ModPlay, ScreamTracker, STM, Star Trekker, Tetra,
and probably a few more.

All these programs basically make use of various sound formats used by
the Amiga line of computers. These include .stm files, .mod files,
and .nst files. Also, these programs pretty much all have the option
to playback the sound to add-on hardware such as the SoundBlaster
card, the Covox series of devices, and also to direct the data to
either one or two (for stereo) parallel ports, which you could attach
your own D/A's to. (From what I have seen, the Covox is basically a
small amplified speaker with a D/A which plugs into the parallel port.
This sounds very similar to the Disney Sound System (DSS) which people
have been talking about recently.)

------------------------------------------------------------------------

The EA-IFF-85 documentation.

dgc3@midway.uchicago.edu writes:

As promised, here's an ftp location for the EA-IFF-85 documentation.
It's the November 1988 release as revised by Commodore (the last
public release), with specifications for IFF FORMs for graphics,
sound, formatted text, and more. IFF FORMS now exist for other media,
including structured drawing, and new documentation is now available
only from Commodore.

The documentation is at grind.isca.uiowa.edu [128.255.19.233], in the
directory /amiga/ff/f2/ff185. The complete file list is as follows:

    DOCUMENTS.zoo
    EXAMPLES.zoo
    EXECUTABLE.zoo
    INCLUDE.zoo
    LINKER_INFO.zoo
    OBJECT.zoo
    SOURCE.zoo
    TP_IFF_Specs.zoo

All files except DOCUMENTS.zoo are Amiga-specific, but may be used as
a basis for conversion to other platforms. Well, I take that
tentatively back. I don't know what TP_IFF_Specs.zoo contains, so it
might be non-Amiga-specific.

------------------------------------------------------------------------
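As a companion to the warning in the IFF/8SVX appendix (do not assume
that the BODY starts 48 bytes into the file), here is a rough sketch
of walking the chunks of a FORM 8SVX file held in memory. The function
name iff8svx_find_chunk and its interface are invented for this
illustration. It also assumes the EA IFF 85 rule that a chunk with an
odd byte count is followed by one pad byte, which the appendix above
does not mention.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit number stored high-order byte first (M68000 form). */
static uint32_t get_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/*
 * Search a FORM 8SVX image for the chunk named id (e.g. "BODY").
 * On success, store the chunk's byte count in *size_out and return
 * a pointer to its first data byte; otherwise return NULL.
 * (Hypothetical helper; assumes odd-sized chunks carry a pad byte.)
 */
const unsigned char *iff8svx_find_chunk(const unsigned char *buf,
                                        size_t len, const char *id,
                                        uint32_t *size_out)
{
    size_t pos = 12;  /* skip "FORM", the byte count, and "8SVX" */

    if (len < 12 || memcmp(buf, "FORM", 4) != 0 ||
        memcmp(buf + 8, "8SVX", 4) != 0)
        return NULL;

    while (pos <= len && len - pos >= 8) {
        uint32_t size = get_be32(buf + pos + 4);

        if (size > len - pos - 8)
            return NULL;                  /* chunk runs past the end */
        if (memcmp(buf + pos, id, 4) == 0) {
            *size_out = size;
            return buf + pos + 8;
        }
        /* skip header, data, and the pad byte after odd sizes */
        pos += 8 + (size_t)size + (size & 1u);
    }
    return NULL;
}
```

With the sample file from the appendix, asking for "BODY" would return
a pointer 48 bytes into the buffer, but only because that file holds
nothing except VHDR before the BODY.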