Hi there, Google have sponsored me to perform a security audit of
gzip-1.3.5, in which multiple security vulnerabilities have been
uncovered. These flaws could be leveraged by an attacker to compromise
or disrupt any automated system relying on gzip for data
decompression. These may also affect applications that execute gzip
indirectly, such as tar, lynx, gv, xli, vim, less, etc.

A stack modification vulnerability (where a stack buffer can be
modified out of bounds, but not in the traditional stack overrun
sense) exists in the LZH decompression support, The following loop
from make_table() (~139, unlzh.c) assumes that no entry in bitlen[]
can exceed 16:

for (i = 0; i < nchar; i++) count[bitlen[i]]++;

This is not the case, bitlen[] can be populated with higher values by
read_pt_len(), thus incrementing values outside the bounds of the
stack buffer count[]. A datastream consisting entirely of huffman
codes with set bits demonstrates this, for example:

$ perl -e 'print "\x1f\xa0","\xab\xcd","\xff"x"2048"' | gzip -d

Please note, this may or may not cause odd behaviour, a debugger
should be used to find out if you are affected (this does not appear
to be detected by valgrind).

This vulnerability may or may not be exploitable, different behaviour
has been observed on different systems. On some systems, several saved
registers are within reach, thus allowing them to be incremented by a
significant amount (count[] is of type unsigned short[], allowing you
to manipulate the 2 MSB and LSB of a saved dword register
independently). This may be enough to move a stack frame into an
attacker controlled area, adjust the return address, or should a
register jump be performed on a saved register (eg, ljmp ebx), then an
attacker may be able to move the destination to another controlled
area.

If no stack data can be reached, the impact of this vulnerability is
low. This can be found in gdb as follows:

$ gdb -q gzip
(gdb) b make_table
Breakpoint 1 at 0x804f246: file unlzh.c, line 146.
(gdb) r --decompress < testcase.Z
Breakpoint 1, make_table (nchar=19, bitlen=0x80551c4 "\024\024\024",
tablebits=8, table=0x8054fc0) at unlzh.c:146
(gdb) info frame
Stack level 0, frame at 0xffffd5d4:
 eip = 0x804f246 in make_table (unlzh.c:146); saved eip 0x804f589
 called by frame at 0xffffd5f4
 source language c.
 Arglist at 0xffffd5cc, args: nchar=19, bitlen=0x80551c4
"\024\024\024", tablebits=8, table=0x8054fc0
 Locals at 0xffffd5cc, Previous frame's sp is 0xffffd5d4
 Saved registers:
 ebx at 0xffffd5c0, ebp at 0xffffd5cc, esi at 0xffffd5c4, edi at
0xffffd5c8, eip at 0xffffd5d0
(gdb) p/x &count[21]
$1 = 0xffffd5c8

In this case, the saved ebx, esi and edi are clearly attacker controllable.

A .bss buffer underflow exists in gzip's pack support, where the
following loop from build_tree() (unpack.c, ~146) does not enforce any
lower bound while constructing the prefix table:

while (prefixes--) *--prefixp = (uch)len;

The simplified process of constructing a prefix table is as follows:

 * Read the maximum length of a huffman code used in this archive.
    * Ensure the maximum length is between zero and 25.
 * Read the number of leaves at each code length from the archive.
    * Check the sum of leaves does not exceed 256.
 * For each leaf count between code lengths 1 .. min(max_code_length, 12)
   initialise the prefix table to the code length.

The prefix table could theoretically contain 1<<12 entries safely,
however a leaf count table could be constructed in such a way as to
write to index (1<<12 - (0xff << (12 - 1))), or -518144 (this is the
furthest index directly reachable), thus underflowing the buffer
considerably.

* The values written to the underflowed area are attacker controlled,
 but can only be within the range 0x01 to 0x0c.
* The distance from the buffer is affected by the value of the char requested,
 the formula for the furthest index reachable by character value n is something
 like (4096 - (0xff << 12 - n)).
* The overwrite operation can only occur once, however the condition can be
 be easily modified via the first overwrite, and thus repeated multiple times.

Overwriting a buffer with multiple values is possible by building up a
new value using multiple overlapping writes.

On big endian systems, this vulnerability should be trivially
exploitable. However exploitation on intel appears to be considerably
more difficult, the most likely attack vector appears to be modifying
max_len, peek_bits, eob (indirectly) and lit_base in such a way as to
trigger an write of arbitrary data via put_ubyte() and window[], this
can then be used to modify the `work` function pointer or a .got entry
(such as free(), which is called on error) to point at an attacker
controlled buffer such as inbuf.

Alternative attack vectors may include modifying ifd, ofd, infinite loops, etc.

Please note, that on systems that compile gzip with `DYN_ALLOC`
defined, the buffer underflowed is a heap buffer. I have not
investigated this configuration in any detail.

The file gzip_pack_underflow.c attached to this mail can be used to
generate archives that demonstrate this vulnerability.

A .bss buffer overflow vulnerability exists in gzip's LZH support, due
to it's inability to handle exceptional input in the make_table()
function, a pathological decoding table can be constructed in such a
way as to generate counts so high that the rapid growth of `nextcode`
exceeds the size of the table[] buffer.

The decoding table construction code is considerably more complex than
that of pack's.

To exploit this vulnerability, an attacker would need to:

 * Construct a pt_len[] such that pt_len[n] is 0.
 * Construct a pt_table[] such that pt_table[(code buffer) >> 16 - 8]
is n (where n>2)
 * Now c_len[] is filled with (n-2), generating exceptionally high values in
   count[n-2].

The most likely targets for triggering the exploitation of arbitrary
code would be inptr, insize and inbuf, all of which are fully
controllable, and triggering a buffer refill operation with these
modified variables.

A datatream that demonstrates a pathological c_len[] can be generated
as follows:

$ perl -e 'print
"\x1f\xa0","\xab\xcd","\xf6\x40\x01\xc2\xcc\x36\x0c\x92\x00\x00\x00\x00","\xc8","\x00"x"2048"'
| gzip -d

Where the third string contains codes to populate pt_len[], which in
turn is used to generate the c_len[] in read_c_len().

Please note, this may not crash, you should use a debugger to identify
if an overflow has occurred (valgrind doesnt detect this either). If
you compile with -funit-at-a-time, you can put a watchpoint on
`foreground`, making this easier to debug, which is what I did while
testing this. eg,

$ gdb -q ./gzip
(gdb) thb unlzh
Breakpoint 1 at 0x804fe49: file unlzh.c, line 425.
(gdb) r --decompress < testcase.gz
unlzh (in=0, out=1) at unlzh.c:425
(gdb) p/x foreground
$1 = 0x1
(gdb) watch foreground != 1
Hardware watchpoint 2: foreground != 1
(gdb) c
Hardware watchpoint 2: foreground != 1

Old value = 0
New value = 1
0x0804fb49 in make_table (nchar=510, bitlen=0x80916e0 '\005' <repeats
200 times>..., tablebits=12, table=0x8060e00) at unlzh.c:214
(gdb) p/x foreground
$2 = 0x100

clearly foreground has been damaged here.

(gdb) info symbol &table[i]
foreground in section .bss

oops, table[i] has reached outside d_buf.

The public domain source on which unlzh.c is based on appears to be
used in multiple other decompressors, I have not investigated if these
are vulnerable to the same attack.

The following code sequence is used in multiple locations within
unlzh.c for traversing the branches of a tree structure:

  do {
      if (bitbuf & mask) j = right[j];
      else               j = left [j];
      mask >>= 1;
  } while (j >= NC);

In this case, if mask is 0 and j == left[j], then the loop will
continue forever, perhaps disrupting the operation of any automated
systems relying on gzip for data decompression. The impact of this
vulnerability is obviously a minor DoS. It does not appear to be
possible to construct a tree such that (for example) left[1] == 2,
left[2] == 1 and so on. Therefore, detecting this loop is relatively
easy, adding the condition `&& (mask || j != left[j])` should be
adequate.

As some of these formats are hard to come by, I have attached some
valid archives in the hope of helping with any regression testing that
may be required (typically pack files are distributed with a lowercase
.z file extension, LZH and LZW compress archives both use an uppercase
.Z, and can only be differentiated by their magic). The testcases have
been symmetrically encrypted to avoid inadvertently disrupting mua's,
av scanners, etc. Please use `gpg --output gzip-testcases.tar.bz2
--decrypt gzip-testcases.tar.bz2.gpg` with password "google" to
extract.

If there are no objections, I'll suggest an embargo date of September
5th. I'll forward this date and report to the upstream authors,
although experience suggests they may be unresponsive.

Please credit "Tavis Ormandy, Google Security Team" in any advisories
relating to this issue.

Thanks, Tavis