jpegextractor - extract embedded JPEG streams from arbitrary files

Current version: 1.0

This is the homepage of jpegextractor, a command line tool to extract JPEG streams from arbitrary files or standard input.

Several file formats can include images as JPEG streams, e.g. PDF document files or ACDSee image database thumbnail files (image_db.dtf). To get at those JPEGs, you need either a program that knows the file format and can extract the JPEGs from the right places, or a hex editor to copy the binary data "manually".

jpegextractor relies on the fact that valid binary JPEG streams start with the byte sequence ff d8 ff and end with the byte sequence ff d9. It copies each of those streams to a new file. Because jpegextractor simply looks for the two sequences, it does not have to know the format of the encapsulating file and thus works with all formats that embed JPEG streams.
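The scanning idea described above can be sketched in a few lines of modern Java. This is only an illustration under stated assumptions, not the actual jpegextractor source: the class name JpegScanSketch and the method findStreams are hypothetical, the whole input is assumed to fit in a byte array, and each stream is assumed to end at the first ff d9 that follows its start sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the marker scan; not the actual jpegextractor source.
final class JpegScanSketch {

    /** Returns {start, endExclusive} offset pairs of candidate JPEG streams in data. */
    static List<int[]> findStreams(byte[] data) {
        List<int[]> streams = new ArrayList<>();
        int i = 0;
        while (i + 2 < data.length) {
            if (isStart(data, i)) {
                int end = findEnd(data, i + 3);
                if (end < 0) {
                    break; // start sequence without a later ff d9: stop scanning
                }
                streams.add(new int[] {i, end});
                i = end; // continue after the stream just found
            } else {
                i++;
            }
        }
        return streams;
    }

    /** True if the bytes at offset i are ff d8 ff. */
    private static boolean isStart(byte[] d, int i) {
        return (d[i] & 0xff) == 0xff
            && (d[i + 1] & 0xff) == 0xd8
            && (d[i + 2] & 0xff) == 0xff;
    }

    /** Returns the offset just past the first ff d9 at or after from, or -1 if none. */
    private static int findEnd(byte[] d, int from) {
        for (int j = from; j + 1 < d.length; j++) {
            if ((d[j] & 0xff) == 0xff && (d[j + 1] & 0xff) == 0xd9) {
                return j + 2;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up sample: one junk byte, then a tiny fake "stream".
        byte[] data = {
            0x00, (byte) 0xff, (byte) 0xd8, (byte) 0xff, (byte) 0xe0,
            0x01, (byte) 0xff, (byte) 0xd9
        };
        for (int[] s : findStreams(data)) {
            System.out.println("stream at offset " + s[0] + ", " + (s[1] - s[0]) + " bytes");
        }
    }
}
```

Note that a JPEG can itself contain another JPEG (for example an embedded thumbnail), so a scan that stops at the first ff d9 may cut such a stream short. How jpegextractor treats that case is not stated on this page.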

Switches

Call the program with --help as the only parameter to get the following help screen:

Usage: java jpegextractor <OPTIONS> [FILEs]
Extract embedded JPEG streams from arbitrary files or standard input.

Options:
        -H, --help                 Print this help screen and terminate.
        -d, --digits NUM           Pad numbers in output files to NUM digits.
        -D, --outputdirectory DIR  Write to directory DIR (default: ".").
        -p, --prefix P             Use P as output prefix (default: "output").
        -s, --suffix S             Use S as output suffix (default: ".jpg").
        -n, --initialnumber NUM    Use NUM as initial output number (default: 0).
        -o, --overwrite            Overwrite existing output files.
        -q, --quiet                Nothing is written to standard output.

Copyright (C) 2002 Marco Schmidt <marcoschmidt@users.sourceforge.net>
Homepage http://www.geocities.com/marcoschmidt.geo/jpeg-extractor.html

This program is distributed under the GNU Lesser General Public
License 2.1. See http://www.gnu.org/copyleft/lesser.html for more.

Examples

The simplest call gives the program the names of one or more files to search for JPEG streams:

$ java jpegextractor document.pdf
 =>output0.jpg (217938 bytes)
 =>output1.jpg (15864 bytes)
 =>output2.jpg (18056 bytes)
 ... snipped some output
 =>output25.jpg (16911 bytes)
 =>output26.jpg (15432 bytes)
Extracted 27 JPEG file(s) with 607064 bytes from 1 input file(s).

This call makes the program read from standard input and suppresses all messages on standard output. Images are written to the directory /images instead of the current directory. Existing files are overwritten (by default, no file is overwritten):

$ java jpegextractor -q -o -D /images < document.pdf

This call sets the prefix of output names to image (instead of output) and the suffix to .jpeg (instead of .jpg). It also starts the output numbers at 433 (instead of 0) and forces these numbers to be at least five digits long (padding with leading zeroes as necessary):

$ java jpegextractor document.pdf -p image -s .jpeg -n 433 -d 5 
 =>image00433.jpeg (217938 bytes)
 ... snipped some output
 =>image00459.jpeg (15432 bytes)
Extracted 27 JPEG file(s) with 607064 bytes from 1 input file(s).
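How prefix, number, padding and suffix combine into an output name can be sketched as follows. This is an assumption-laden illustration, not the program's own code: the class OutputNameSketch and the method outputName are hypothetical, and a minimum width of 1 is assumed to stand for "no padding", since the help screen does not state a default for --digits.

```java
// Hypothetical sketch of output file naming; not the actual jpegextractor source.
final class OutputNameSketch {

    /** Builds prefix + zero-padded number + suffix; digits is a minimum width. */
    static String outputName(String prefix, int number, int digits, String suffix) {
        StringBuilder padded = new StringBuilder(Integer.toString(number));
        while (padded.length() < digits) {
            padded.insert(0, '0'); // pad on the left with zeroes
        }
        return prefix + padded + suffix;
    }

    public static void main(String[] args) {
        // Mirrors -p image -s .jpeg -n 433 -d 5 for the first extracted stream.
        System.out.println(outputName("image", 433, 5, ".jpeg"));
    }
}
```

Numbers that already have more digits than the requested width are left untouched, which matches the "at least" wording above.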

Requirements

jpegextractor requires Java 1.0.2 or higher.

License

jpegextractor is distributed under the GNU Lesser General Public License (LGPL) 2.1. In addition to the license's requirements, if you use this code in your application, please mention this page in your documentation so that others can find out about jpegextractor.

Changes

Download

Download source code and bytecode as a single ZIP archive: jpegextractor.zip (8 KB).

Please do not link directly to this ZIP archive, because Geocities sometimes does not allow links from anything but a page hosted on Geocities. If you configure your browser not to include the referring page in HTTP requests, you might also get an error message.

Update notification

This program has a Freshmeat project entry. If you have a login (it's free), you can use the "Subscribe to new releases" link on that project page to be notified of new versions of jpegextractor.

Last modification 2002-02-01
