Steganography : Weaknesses in Current Software
----------------------------------------------------------------------------
A WORK IN PROGRESS : Please keep checking back
----------------------------------------------------------------------------

[ Paradigm | S-Tools | Jsteg ]

Paradigm

The goal of steganography is to add a message m to a carrier M in such a
way that it is infeasible for an attacker to tell, from the messages M and
M+m, which is which.

A secondary, oft-quoted, aim is that of deniability. ("Message? What
message? How did I know that was there?") In my opinion, this is most
easily achieved by satisfying the requirement given above.

The scenario to envisage is of an attacker with a statistical model of
typical messages M. Each message M' available to him is then tested
against this model to see whether it fits; if a sufficient number do not,
then steganography is suspected, and he is likely to launch a denial of
service attack, or even resort to rubber-hose cryptanalysis...

In a personal communication, Andy Brown (the author of S-Tools) states his
belief that steganography is only of use in not arousing suspicion in the
first place; once suspicion is aroused, all the steganographer can hope to
achieve is to make the message irretrievable without the correct key.
Personally, I believe that once suspicion is aroused, the steganographer
has lost, for to "make the message irretrievable without the correct key"
is the aim of conventional cryptography, which, for some reason, the
steganographer has avoided relying solely upon. And the theoretical basis
of conventional cryptography is much better understood than the snake-oil
of steganography...

At this stage, I would also like to differentiate between the
steganographic application and the algorithm upon which it is based.
Whilst the bells and whistles of a good application may encourage frequent
use, and even give peace of mind to non-experts, if the underlying
algorithm M+m is poor, then those same non-experts have been lulled into a
false sense of security. For while pre-processing steps such as
compression and encryption may be valuable tools, they cannot rescue a
weak algorithm.

S-Tools

8-Bit Images

I have a short program which investigates the palette P of palette-mapped
images. Amongst other things, it calculates cover(P,N), the number of
NxNxN boxes needed to cover all the palette entries in P. (In fact, the
general problem is NP-hard, so I actually calculate an upper bound.)

Clearly, we would expect cover(P,1)=|P|. If cover(P,1)<|P| then two
palette entries are identical; the only plausible purpose for this is
steganography. Now, although it does theoretically depend upon the image
and the quantisation algorithm, in practice every 8-bit (colour) image
without an injected message also has cover(P,2)=|P|.

Unfortunately, with images generated by S-Tools, cover(P,2) is very low.
For example, if |P|=256 (ie. a 256-colour image), then after injecting a
short message (1 Kb) into a typical image (800x600), cover(P,2)<50. For a
larger message (10 Kb) into the same image, cover(P,2)~=32. In fact,
knowing only cover(P,2) and the image size, it is even possible to
estimate the size of the injected message!

It is clear that the algorithm that S-Tools uses to inject messages into
8-bit images seriously distorts the statistical distribution of the image
from that which is expected, and in a way that can be easily detected.

Jsteg

In no particular order :

 * According to the documentation,

       The lowest-order bits of all non-zero frequency coefficients are
       replaced with successive bits from the steganographic source file,
       and these modified coefficients are sent to the Huffman coder.
   As a result, the message (and hence the added noise) is concentrated in
   the earlier blocks, i.e. the top of the image, rather than being spread
   evenly throughout.

 * Zero-valued coefficients are untouched, as for "typical" images and
   quantisation tables at least three quarters of the coefficients will be
   zero. Halving this proportion would be easily noticeable, particularly
   as later blocks would retain a "standard" distribution of values. As
   the number of non-zero coefficients is not known until compression has
   been performed, the only way to tell in general whether a given message
   will fit into a given image is to "suck it and see".

 * short inject(short inval)
   {
       short inbit;

       if ((inval & 0x1) != inval)      /* skip values 0 and 1       */
           if (use_inject) {
               if ((inbit = bitgetbit()) != -1) {
                   inval &= ~0x1;       /* clear the LSB...          */
                   inval |= inbit;      /* ...and set the message bit */
               } else
                   use_inject = 0;      /* message exhausted         */
           }
       return inval;
   }

   To avoid possible synchronisation problems, the code maps the
   following sets of coefficient values onto themselves :

       ... {-4,-3}, {-2,-1}, {+2,+3}, {+4,+5}, ...

   (One-valued, as well as zero-valued, coefficients are untouched.) Note
   that, unlike the distributions of "almost all" AC coefficients for
   "typical" images, these sets are not symmetric about zero. As a
   result, the coefficient distributions become skewed, resulting in
   files which are easily flagged as containing extra data. However, this
   problem is easily fixed by mapping different sets of coefficient
   values onto themselves :

       ... {-4,-3}, {-2,-1}, {+1,+2}, {+3,+4}, ...

 * The distribution of coefficients is roughly Laplacian. If we define a
   function N[X,Y] to be the number of times coefficient X has value Y,
   then for "typical" images I, and "almost all" interesting X,

       N[X,0] >> N[X,1] ~= N[X,-1] >> N[X,2] ~= N[X,-2] >> N[X,3] ...
   Unfortunately, if we assume for the moment that approximately half the
   message bits are zeroes and half are ones, then (even with the fix
   given above) an image I' with a hidden message has, for "almost all" X,

       N[X,0] = N'[X,0] >> N'[X,1] ~= N'[X,-1] ~= N'[X,2] ~= N'[X,-2] >> N'[X,3] ...

   i.e. there is a very noticeable step function in the earlier blocks,
   resulting in files which are easily flagged as containing extra data.

   Although it is possible to skew the distribution of message bits so
   that, for some coefficient A,

       N[A,y] = N'[A,y]   for y in {-2, -1, 0, +1, +2}

   the distribution would still be distorted for |y| > 2 since, for all X,

       N[X,1] / N[X,2] != N[X,2] / N[X,3] != ...

   Furthermore, as for every coefficient A there exists a coefficient B
   such that

       N[A,1] / N[A,2] != N[B,1] / N[B,2] != ...

   the distributions of the other coefficients would still be distorted
   in the centre.

 * According to the documentation,

       There is still a danger in that the sixth bit of the stream will
       always be one; this is solved by tacking an extra zero onto the
       beginning of the primary length field in half the cases.

   Surely a simpler solution is simply not to transmit this bit at all,
   resulting in headers which are, on average, 1.5 bits shorter.

Aside from minor implementation niggles, it is clear that the algorithm
that Jsteg uses to inject messages seriously distorts the statistical
distribution of the image from that which is expected, and in a way that
can be easily detected.

----------------------------------------------------------------------------
Bob Tinsley
tinsley@dcs.warwick.ac.uk