A Quick Background on Perfect Rainbow Tables
Rainbow tables allow us to find plaintexts to cryptographic hash algorithms quickly. They are based off of Hellman Martin’s “A Cryptanalytic Time – Memory Trade-Off“. By including the step in the reduction function, Phillipe Oechslin was able to improve on Hellman Martin’s method, and we came up with Rainbow Tables, the “Faster Cryptanalytic Time – Memory Trade-Off“.
If you are unfamiliar with rainbow tables, it is suggested you become familiar before continuing.
Even rainbow tables are not perfect. They still merge, and merges mean wasted information, wasted space, wasted time. What we really want are perfect rainbow tables. In perfect rainbow tables, each chain has a unique endpoint. We have no merges. They give us nearly the same percentage to find a plaintext as non-perfect rainbow tables, but are much smaller.
Perfect rainbow tables have always taken much longer to create than non-perfect rainbow tables. The two step process to create perfect rainbow tables has been:
- Create all chains for your rainbow tables
- Eliminate duplicate chains, until only one instance for each endpoint remains
Each time you eliminate a duplicate chain, all the time spent creating that chain is lost/wasted.
We must also understand that chains can merge at any step. If we have three chains which merge, they may merge like this:
hash(aaaaa) -> 13049 ... reduce(13049, 1) -> bcdef ... hash(bcdef) -> 30459 ... reduce(30459, 2) -> iekdj ... hash(iekdj) -> 69384 hash(bbbbb) -> 93024 ... reduce(93024, 1) -> bcdef ... hash(bcdef) -> 30459 ... reduce(30459, 2) -> iekdj ... hash(iekdj) -> 69384 hash(ccccc) -> 84626 ... reduce(84626, 1) -> dgrhf ... hash(dgrhf) -> 84532 ... reduce(84532, 2) -> iekdj ... hash(iekdj) -> 69384
A Better Method for Creating Perfect Tables
Creating our perfect tables faster is as simple as identifying the merges in our chains before we reach the end of the chain. Instead of creating each chain one at a time, and then checking for merges after-the-fact, we create all of our chains out to a certain “merge step”, eliminate merges, and then continue until the end of the chain. If we repeat this process several times, we can identify progressively more merges before reaching then end of the chain. As we eliminate more merges, the time to get to our next “merge step” is less than the one before it.
Here is an illustration of this process in pseudo-code:
function extend_chains (chains, first_step, last_step) chains = extend_chains(chains, 0, 99) throwaway_merges(chains) chains = extend_chains(chains, 100, 199) throwaway_merges(chains) ... chains = extend_chains(chains, 900, 999) throwaway_merges(chains)
I have implemented my throwaway_merges function in a mergesort (seems fitting) which automatically discards duplicate chains during the sort process.
Performance
The performance of this method depends on the number of merges that will occur in our chains. The more merges, the larger the difference in time between the new method and the old method. As long as the time required to sort the chains and eliminate merges is less than the time gained from eliminating those merges early on, we receive a net gain in speed. Profiling my code, my implementation of throwaway_merges comprises less than 1% of the total computation time.
Just to reiterate, creating merges speeds up this process of creating rainbow tables. Using increasingly longer chains, while still detrimental in the actual cracking process, is no longer as detrimental in the generation of perfect rainbow tables.
I have generated some output from my new method of finding perfect chains compared with an implementation of the original method for finding perfect chains.
All runs were done on a single core of a Q6600 processor in Ubuntu-9.10-amd64 (each process on its own core). Disk write times are not accounted for, but are assumed negligible. The hash function being used is the MD4 implementation available here. I am currently generating MD4 hashes for ease of debugging (I check against md4 hashes generated through python’s hashlib), but these will be NT hashes by release (a simple conversion).
In the first run, we generate 10,000 chains, each chain with a length of 2048, where hashes reduce to plaintexts a-z 4 characters long. This table will have many merges, and will accent the speed improvement of this new method.
Of the 10000 chains, 405 unique chains remain. That’s only 4.05% of our original chains. My method finished creating its perfect chains in ~1.25 seconds, while the original method finished in ~7.21 seconds. The new method is ~5.8 times as fast.
In the second run, we generate 50,000 chains, each chain with a length of 2048, where hashes reduce to plaintexts a-z 5 characters long. This table will have fewer merges, and the improvement will not be as great.
Of the 50000 chains, 9349 unique chains remain. That’s 18.7% of our original chains. My method finished creating its perfect chains in ~15.96 seconds, while the original method finished in ~38.11 seconds. The new method is ~2.4 times as fast.
In the third run, we generate 500,000 chains, each chain with a length of 10240, where the hashes reduce to plaintexts of a-z 7 characters long. This table will have fewer merges, and the improvement should be minimal.
Of the 500,000 chains, 313005 unique chains remain. That’s 62.6% of our original chains. My method finished creating its perfect chains in ~1433 seconds, while the original method finished in ~1792 seconds. The new method is ~1.25 times as fast.
In the third run, we generate 50,000 chains, each chain with a length of 2048, where the hashes reduce to plaintexts of a-z 7 characters long. This table will almost no merges, and there should be no improvement with the new method.
Of the 50,000 chains, 49417 unique chains remain. That’s 98.8% of our original chains. My method finished creating its perfect chains in ~35.99 seconds, while the original method finished in ~36.00 seconds. The new method is ~1.00 times as fast.
Looking Forward
The entirety of this rainbow tables implementation was coded by me in C. The code requires some cleaning up, but once I am done I will be releasing a new, open source (exact license TBD) implementation of rainbow tables in C. The release will include both the original method of generating rainbow tables, and my new method for generating perfect rainbow tables.
Other things I would like to do:
- The chains are currently 20 bytes long. This can be brought down to 12 bytes, this has not been implemented yet.
- I would also like to experiment with a heap sort, where chains are added to the heap as soon as they are generated. The heap will automatically drop duplicate chains. When moving from one “merge step” to another, we just read from one heap and write to another. I am unsure how this will perform compared to my current implementation of merge sort. As the sorting algorithm accounts for only a small amount of time, this is low-priority.
- Assuming this new method will lead to longer chains being generated, implementing “break points” in the chains may be beneficial. Given a chain of length 10,000, we may also store a value which will allow us to pick up our chain at step 5,000. We would call this value a break point. This way, when cracking against the table, and we find a possible collision at, say, position 8,000, we do not have to generate the chain starting from step 0. We can start from step 5,000. This is not my idea, but I do have a link to the original idea. More breakpoints can be added to speed up cracking, at a loss of disk space.
Can you contact me? I’m currently using project-rainbowcrack’s rtgen to build a large number of rainbow tables for later perfection. Having a tool such as yours would be immensely helpful.
Link | April 23rd, 2010 at 15:23