Compressing a file using the Huffman code table

In the previous chapter, you wrote a program to generate a Huffman table for compressing text files. If you had difficulties, you may use my solution for the next part of the lab. Notice how each character has its own binary code. Infrequently occurring letters like 'q' have long codes like 01111011110, whereas common letters like 'a' have short codes like 0010. The most common character is probably the space (code 10). Notice also that we have no encoding for the line feed or the carriage return, because our scanner ate those. We will discuss this in a later exercise.
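To see why the line-ending characters never reach the table, here is a small experiment. It assumes the table generator read its input line by line with java.util.Scanner's nextLine() (the text only says "our scanner", so this is a guess), and the class and method names are made up for illustration.

```java
import java.util.Scanner;

class ScannerNewlineDemo {
    // Count the characters that a line-by-line Scanner loop actually sees.
    // nextLine() returns each line WITHOUT its line terminator, so any
    // '\r' and '\n' characters never show up in the count.
    static int countChars(String text) {
        int n = 0;
        try (Scanner in = new Scanner(text)) {
            while (in.hasNextLine()) {
                n += in.nextLine().length();
            }
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "ab\r\ncd\n";             // two lines, three line-ending characters
        System.out.println(text.length());      // prints 7
        System.out.println(countChars(text));   // prints 4
    }
}
```

A frequency count built this way never sees a line-ending character, so the resulting table has no code for one.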

We can now encode an entire file. For each character in the input, we write out the string of 0s and 1s that encodes it, one code after another.
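As a minimal sketch of that idea, the following looks up each character in a map and appends its code. The three codes are the ones quoted above; the class name, the method names, and the choice to throw an exception on a character with no code are all assumptions made for illustration, not part of the lab's solution.

```java
import java.util.HashMap;
import java.util.Map;

class EncodeSketch {
    // A tiny table holding only the three codes quoted in the text.
    static Map<Character, String> quotedCodes() {
        Map<Character, String> table = new HashMap<>();
        table.put(' ', "10");
        table.put('a', "0010");
        table.put('q', "01111011110");
        return table;
    }

    // Concatenate the code of every character of text, in order.
    // ASSUMPTION: a character with no code is treated as an error.
    static String encode(String text, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char c : text.toCharArray()) {
            String code = table.get(c);
            if (code == null) {
                throw new IllegalArgumentException("No code for character '" + c + "'");
            }
            bits.append(code);
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        // 'a' -> 0010, ' ' -> 10, 'q' -> 01111011110
        System.out.println(encode("a q", quotedCodes())); // prints 00101001111011110
    }
}
```

Because no code is a prefix of another, the codes can be written back to back with no separators and still be decoded unambiguously.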


Exercise 4

Write a program to compress a file using a Huffman table. My program takes three command-line arguments: it uses the Huffman table stored in the file named by args[0] to compress the file named by args[1], writing the compressed data to the file named by args[2]. So
java Compress myHuffmanTable.txt hamlet.txt hamlet.compressed
uses the Huffman table in myHuffmanTable.txt to compress hamlet.txt into a (presumably) new file hamlet.compressed.
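One possible shape for such a program is sketched below. It is not my program, and it rests on several assumptions that you should check against your own files: each line of the table file is taken to be a character, then one space, then its code (for example "a 0010"); characters with no code, including line endings, are silently skipped; and the output is written as the text characters '0' and '1' on a single line. The class name CompressSketch and its helper methods are hypothetical.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CompressSketch {
    // ASSUMPTION: each table line looks like "<character><space><code>".
    static Map<Character, String> parseTable(List<String> lines) {
        Map<Character, String> table = new HashMap<>();
        for (String line : lines) {
            if (line.length() < 3) continue;           // skip blank or malformed lines
            table.put(line.charAt(0), line.substring(2).trim());
        }
        return table;
    }

    // Encode one line of input. ASSUMPTION: characters without a code are skipped.
    static String encodeLine(String line, Map<Character, String> table) {
        StringBuilder bits = new StringBuilder();
        for (char c : line.toCharArray()) {
            String code = table.get(c);
            if (code != null) bits.append(code);
        }
        return bits.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: java CompressSketch table input output");
            return;
        }
        Map<Character, String> table = parseTable(Files.readAllLines(Paths.get(args[0])));
        try (PrintWriter out = new PrintWriter(args[2])) {
            // readAllLines drops line terminators, so they are never encoded.
            for (String line : Files.readAllLines(Paths.get(args[1]))) {
                out.print(encodeLine(line, table));
            }
        }
    }
}
```

With those assumptions, it would be run the same way as the command above, for example java CompressSketch myHuffmanTable.txt hamlet.txt hamlet.compressed.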

My compressed version of hamlet begins:

110101111101011001100100101101011111010101011010000000111011
011001000001110011101010011100110011000010000110110101101101
000001111010011010001001100100110101110111001001100101011010
101000011110111101100110011110110010010011001100001000011000
011100101100100011110110011110100110100001101010100111101110
011110001110110000100011110001101011011110000100000010110101
011111010010000100111001111111010011100001011011110000011100
111011001000000001001011110011111100110010101100110111111101
001111000101100101001100111101101001000001001101110000001111
110111001001111011101110011110000101100100001111100001110011
100111100000000101101100010010111111010011110100001011011001
110110001101011010000111111011100100110011000111100111111100
001110110011011011111101011010110001000000011011100010101110
010111110100111101001010100100101111011100100011110100110110
000011001101101110010001011110001101011110110011010011110111
000110001001010100100101111011100101101001110111100111111100
111100101100001001111001111000010101111110100110010100100011
010100111101110000001111110111001011010001001001101100110100
110011111110100100000001111001100110110110100101110011110011
111101011010110010011011001001011110001000111001000111111011
111001100111100011010010101001011010110001000000011011100010
101111110101101000100010110011011010111001111101110010111110
1001111010
Your output may well differ. If you used my Huffman table, however, you should get the same compressed data as I did. Most of my program is here, and you may use it if you wish.


rhyspj@gwu.edu