Hi there, Google have sponsored me to perform a security audit of gzip-1.3.5, in which multiple security vulnerabilities have been uncovered. These flaws could be leveraged by an attacker to compromise or disrupt any automated system relying on gzip for data decompression. These may also affect applications that execute gzip indirectly, such as tar, lynx, gv, xli, vim, less, etc. A stack modification vulnerability (where a stack buffer can be modified out of bounds, but not in the traditional stack overrun sense) exists in the LZH decompression support, The following loop from make_table() (~139, unlzh.c) assumes that no entry in bitlen[] can exceed 16: for (i = 0; i < nchar; i++) count[bitlen[i]]++; This is not the case, bitlen[] can be populated with higher values by read_pt_len(), thus incrementing values outside the bounds of the stack buffer count[]. A datastream consisting entirely of huffman codes with set bits demonstrates this, for example: $ perl -e 'print "\x1f\xa0","\xab\xcd","\xff"x"2048"' | gzip -d Please note, this may or may not cause odd behaviour, a debugger should be used to find out if you are affected (this does not appear to be detected by valgrind). This vulnerability may or may not be exploitable, different behaviour has been observed on different systems. On some systems, several saved registers are within reach, thus allowing them to be incremented by a significant amount (count[] is of type unsigned short[], allowing you to manipulate the 2 MSB and LSB of a saved dword register independently). This may be enough to move a stack frame into an attacker controlled area, adjust the return address, or should a register jump be performed on a saved register (eg, ljmp ebx), then an attacker may be able to move the destination to another controlled area. If no stack data can be reached, the impact of this vulnerability is low. This can be found in gdb as follows: $ gdb -q gzip (gdb) b make_table Breakpoint 1 at 0x804f246: file unlzh.c, line 146. (gdb) r --decompress < testcase.Z Breakpoint 1, make_table (nchar=19, bitlen=0x80551c4 "\024\024\024", tablebits=8, table=0x8054fc0) at unlzh.c:146 (gdb) info frame Stack level 0, frame at 0xffffd5d4: eip = 0x804f246 in make_table (unlzh.c:146); saved eip 0x804f589 called by frame at 0xffffd5f4 source language c. Arglist at 0xffffd5cc, args: nchar=19, bitlen=0x80551c4 "\024\024\024", tablebits=8, table=0x8054fc0 Locals at 0xffffd5cc, Previous frame's sp is 0xffffd5d4 Saved registers: ebx at 0xffffd5c0, ebp at 0xffffd5cc, esi at 0xffffd5c4, edi at 0xffffd5c8, eip at 0xffffd5d0 (gdb) p/x &count[21] $1 = 0xffffd5c8 In this case, the saved ebx, esi and edi are clearly attacker controllable. A .bss buffer underflow exists in gzip's pack support, where the following loop from build_tree() (unpack.c, ~146) does not enforce any lower bound while constructing the prefix table: while (prefixes--) *--prefixp = (uch)len; The simplified process of constructing a prefix table is as follows: * Read the maximum length of a huffman code used in this archive. * Ensure the maximum length is between zero and 25. * Read the number of leaves at each code length from the archive. * Check the sum of leaves does not exceed 256. * For each leaf count between code lengths 1 .. min(max_code_length, 12) initialise the prefix table to the code length. The prefix table could theoretically contain 1<<12 entries safely, however a leaf count table could be constructed in such a way as to write to index (1<<12 - (0xff << (12 - 1))), or -518144 (this is the furthest index directly reachable), thus underflowing the buffer considerably. * The values written to the underflowed area are attacker controlled, but can only be within the range 0x01 to 0x0c. * The distance from the buffer is affected by the value of the char requested, the formula for the furthest index reachable by character value n is something like (4096 - (0xff << 12 - n)). * The overwrite operation can only occur once, however the condition can be be easily modified via the first overwrite, and thus repeated multiple times. Overwriting a buffer with multiple values is possible by building up a new value using multiple overlapping writes. On big endian systems, this vulnerability should be trivially exploitable. However exploitation on intel appears to be considerably more difficult, the most likely attack vector appears to be modifying max_len, peek_bits, eob (indirectly) and lit_base in such a way as to trigger an write of arbitrary data via put_ubyte() and window[], this can then be used to modify the `work` function pointer or a .got entry (such as free(), which is called on error) to point at an attacker controlled buffer such as inbuf. Alternative attack vectors may include modifying ifd, ofd, infinite loops, etc. Please note, that on systems that compile gzip with `DYN_ALLOC` defined, the buffer underflowed is a heap buffer. I have not investigated this configuration in any detail. The file gzip_pack_underflow.c attached to this mail can be used to generate archives that demonstrate this vulnerability. A .bss buffer overflow vulnerability exists in gzip's LZH support, due to it's inability to handle exceptional input in the make_table() function, a pathological decoding table can be constructed in such a way as to generate counts so high that the rapid growth of `nextcode` exceeds the size of the table[] buffer. The decoding table construction code is considerably more complex than that of pack's. To exploit this vulnerability, an attacker would need to: * Construct a pt_len[] such that pt_len[n] is 0. * Construct a pt_table[] such that pt_table[(code buffer) >> 16 - 8] is n (where n>2) * Now c_len[] is filled with (n-2), generating exceptionally high values in count[n-2]. The most likely targets for triggering the exploitation of arbitrary code would be inptr, insize and inbuf, all of which are fully controllable, and triggering a buffer refill operation with these modified variables. A datatream that demonstrates a pathological c_len[] can be generated as follows: $ perl -e 'print "\x1f\xa0","\xab\xcd","\xf6\x40\x01\xc2\xcc\x36\x0c\x92\x00\x00\x00\x00","\xc8","\x00"x"2048"' | gzip -d Where the third string contains codes to populate pt_len[], which in turn is used to generate the c_len[] in read_c_len(). Please note, this may not crash, you should use a debugger to identify if an overflow has occurred (valgrind doesnt detect this either). If you compile with -funit-at-a-time, you can put a watchpoint on `foreground`, making this easier to debug, which is what I did while testing this. eg, $ gdb -q ./gzip (gdb) thb unlzh Breakpoint 1 at 0x804fe49: file unlzh.c, line 425. (gdb) r --decompress < testcase.gz unlzh (in=0, out=1) at unlzh.c:425 (gdb) p/x foreground $1 = 0x1 (gdb) watch foreground != 1 Hardware watchpoint 2: foreground != 1 (gdb) c Hardware watchpoint 2: foreground != 1 Old value = 0 New value = 1 0x0804fb49 in make_table (nchar=510, bitlen=0x80916e0 '\005' ..., tablebits=12, table=0x8060e00) at unlzh.c:214 (gdb) p/x foreground $2 = 0x100 clearly foreground has been damaged here. (gdb) info symbol &table[i] foreground in section .bss oops, table[i] has reached outside d_buf. The public domain source on which unlzh.c is based on appears to be used in multiple other decompressors, I have not investigated if these are vulnerable to the same attack. The following code sequence is used in multiple locations within unlzh.c for traversing the branches of a tree structure: do { if (bitbuf & mask) j = right[j]; else j = left [j]; mask >>= 1; } while (j >= NC); In this case, if mask is 0 and j == left[j], then the loop will continue forever, perhaps disrupting the operation of any automated systems relying on gzip for data decompression. The impact of this vulnerability is obviously a minor DoS. It does not appear to be possible to construct a tree such that (for example) left[1] == 2, left[2] == 1 and so on. Therefore, detecting this loop is relatively easy, adding the condition `&& (mask || j != left[j])` should be adequate. As some of these formats are hard to come by, I have attached some valid archives in the hope of helping with any regression testing that may be required (typically pack files are distributed with a lowercase .z file extension, LZH and LZW compress archives both use an uppercase .Z, and can only be differentiated by their magic). The testcases have been symmetrically encrypted to avoid inadvertently disrupting mua's, av scanners, etc. Please use `gpg --output gzip-testcases.tar.bz2 --decrypt gzip-testcases.tar.bz2.gpg` with password "google" to extract. If there are no objections, I'll suggest an embargo date of September 5th. I'll forward this date and report to the upstream authors, although experience suggests they may be unresponsive. Please credit "Tavis Ormandy, Google Security Team" in any advisories relating to this issue. Thanks, Tavis