Importing the 1.5 Billion password list into MySQL
by grifter - January 21, 2019 at 07:55 PM
#25
I recommend using the MySQL partition feature, and don't limit the unique primary key.
#26
I'm guessing that using MySQL to search for specific users or passwords is a lot quicker than just searching text files?
#27
Good work sir, fighting the good fight!
#28
So I have had some time away (work trips) and am now back to playing around with these files.

I downloaded the Rocktastic collection, a set of 620,104,640 passwords that a team generated using some kind of intelligent algorithms.
https://labs.nettitude.com/blog/rocktastic/

Rocktastic's format is the same as the 1.5 billion password list, so I used the same script to import it into the DB. (It took a few days, as this list is very large.)

Now for the fun part: let's see how many of the Rocktastic passwords are actually in the 1.5 billion export of real accounts. This will tell us how good the Rocktastic list really is.

First, I deduplicated the 1.5 billion account list down to the distinct passwords. I now have a table containing 434,823,603 different passwords.
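
A minimal sketch of that deduplication step, using Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere. The source table name `accounts` and its `email` column are hypothetical; only the `deduppass` table and the `NTLM` column appear in this post.

```python
import sqlite3

def dedup_hashes(conn, source_table="accounts", target_table="deduppass"):
    """Copy each distinct NTLM value from source_table into a new target_table."""
    # Table names cannot be bound as SQL parameters, so they are interpolated;
    # only pass trusted, hard-coded names here.
    conn.execute(
        f"CREATE TABLE {target_table} AS SELECT DISTINCT NTLM FROM {source_table}"
    )
    return conn.execute(f"SELECT COUNT(*) FROM {target_table}").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (email TEXT, NTLM TEXT)")  # hypothetical layout
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [
    ("a@example.com", "hash1"),
    ("b@example.com", "hash1"),  # same password hash as the first account
    ("c@example.com", "hash2"),
])

distinct_count = dedup_hashes(conn)
print(distinct_count)  # 2
```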

Then I used this MySQL statement, which took 14273.907 seconds of CPU time, to compare the two tables and create a new table containing the passwords that don't match:
CREATE TABLE UnmatchedRocktatsic AS
SELECT *
FROM breachexport.deduprocktastic AS a
WHERE NOT EXISTS (
 SELECT *
 FROM breachexport.deduppass AS b
 WHERE a.NTLM=b.NTLM
)
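
The same anti-join can be tried on toy data before committing hours of run time to the full tables. This is only a sketch: it uses Python's sqlite3 instead of MySQL, drops the `breachexport.` schema prefix because the in-memory database has no such schema, and the hash values are placeholders. The primary keys on `NTLM` are my assumption, not something stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deduprocktastic (NTLM TEXT PRIMARY KEY);
    CREATE TABLE deduppass (NTLM TEXT PRIMARY KEY);
    INSERT INTO deduprocktastic VALUES ('h1'), ('h2'), ('h3'), ('h4');
    INSERT INTO deduppass VALUES ('h2'), ('h9');
""")

# Same shape as the statement above: keep rows of a that have no match in b.
conn.execute("""
    CREATE TABLE UnmatchedRocktatsic AS
    SELECT a.*
    FROM deduprocktastic AS a
    WHERE NOT EXISTS (
        SELECT 1 FROM deduppass AS b WHERE a.NTLM = b.NTLM
    )
""")

unmatched = [row[0] for row in conn.execute(
    "SELECT NTLM FROM UnmatchedRocktatsic ORDER BY NTLM")]
print(unmatched)  # ['h1', 'h3', 'h4']
```

With an index on the inner table's `NTLM` column, the database can generally probe for each hash instead of scanning the whole table, which matters a lot at this size.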

This returns a table with 604,910,275 Rocktastic entries that were never found in the 1.5 billion export. So that means that 97.55% of the passwords in the Rocktastic list were never used by over 1.5 billion people... so we now know the Rocktastic password list is shit!
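
For anyone checking the arithmetic, here is a short sketch using only the two counts quoted in this post. It assumes the Rocktastic list contained no duplicates, so the 620,104,640 total is also the row count of the table the query ran over.

```python
ROCKTASTIC_TOTAL = 620_104_640  # size of the Rocktastic list, from this post
UNMATCHED = 604_910_275         # rows in the unmatched table, from this post

matched = ROCKTASTIC_TOTAL - UNMATCHED
unmatched_pct = 100 * UNMATCHED / ROCKTASTIC_TOTAL
matched_pct = 100 * matched / ROCKTASTIC_TOTAL

print(f"matched:   {matched:,} ({matched_pct:.2f}%)")  # 15,194,365 (2.45%)
print(f"unmatched: {unmatched_pct:.2f}%")              # 97.55%
```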

I have created a new script that spiders through the torrent files from this site, uncompresses one archive to a folder, opens each text document, reads 20 sample entries, and inspects them to figure out the file's format (column names and data types). It then creates a new database with the same name as the file and names the columns according to the data type. Once it has read in all the data, it deletes the temp files and moves on to the next archive. Once all the archives are done, it starts hashing each entry in each DB.
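
The script itself isn't posted here, so the following is only a rough sketch of how the "read 20 sample entries and guess the format" step could work. The candidate delimiters, the regular expressions, and the labels `email`, `hash32` and `text` are all assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import islice

CANDIDATE_DELIMITERS = [":", ";", ",", "\t", "|"]  # assumed; not from the post
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
HEX32_RE = re.compile(r"^[0-9a-fA-F]{32}$")  # length of an MD5 or NTLM hex digest

def read_sample(path, n=20):
    """Return the first n non-empty lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        stripped = (line.rstrip("\r\n") for line in fh)
        return list(islice((line for line in stripped if line), n))

def guess_delimiter(lines):
    """Pick the delimiter that appears in the most sample lines; None if absent."""
    hits = Counter()
    for delim in CANDIDATE_DELIMITERS:
        hits[delim] = sum(1 for line in lines if delim in line)
    delim, count = hits.most_common(1)[0]
    return delim if count > 0 else None

def classify(value):
    """Label a single field by its shape."""
    if EMAIL_RE.match(value):
        return "email"
    if HEX32_RE.match(value):
        return "hash32"
    return "text"

def infer_columns(lines):
    """Guess the delimiter and a label for each column from sample lines."""
    delim = guess_delimiter(lines)
    if delim is None:
        return None, ["text"]
    # Use the most common field count; extra delimiters stay in the last field,
    # so a password containing the delimiter is not split apart.
    ncols = Counter(len(line.split(delim)) for line in lines).most_common(1)[0][0]
    rows = [line.split(delim, ncols - 1) for line in lines]
    labels = []
    for i in range(ncols):
        votes = Counter(classify(row[i]) for row in rows if len(row) > i)
        labels.append(votes.most_common(1)[0][0])
    return delim, labels

sample = [
    "alice@example.com:hunter2",
    "bob@example.org:pa:ss",  # password contains the delimiter
    "carol@example.net:letmein",
]
delimiter, columns = infer_columns(sample)
print(delimiter, columns)  # : ['email', 'text']
```

In practice you would call `read_sample(path)` on each extracted file and feed the result to `infer_columns`. Majority voting over the sample keeps one odd line from changing the guessed layout.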

I expect that the script would run for a few weeks on a high-end PC. But once it's done, you would have an easy and fast way to search and run reports.
#29
(February 27, 2019 at 11:33 PM) grifter Wrote: This returns a table with 604,910,275 Rocktastic entries that were never found in the 1.5 billion export. So that means that 97.55% of the passwords in the Rocktastic list were never used by over 1.5 billion people... so we now know the Rocktastic password list is shit!

The important metric here is how many Rocktastic passwords WERE used in the 1.5 billion dump, not how many weren't. If it managed to capture a high percentage of what people actually use, then it doesn't really matter how many extra it had, since that shows the algorithm is good at generating realistic passwords. That 97.55% would just account for future passwords not in use (or not in the breach). If it guessed only a low percentage of the breach, then it truly is crap, since what it guesses doesn't accurately represent reality. However, language and geographic region are also factors, since it's unlikely that people in China are using passwords anything like those in the UK, for example.
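
To make the distinction concrete, here is a small sketch of the two ratios on toy sets. The function names `coverage` and `precision` are just labels I've picked, and the sample passwords are made up.

```python
def coverage(generated, observed):
    """Fraction of observed (real) passwords that the generated list contains."""
    observed = set(observed)
    if not observed:
        return 0.0
    return len(observed & set(generated)) / len(observed)

def precision(generated, observed):
    """Fraction of generated entries that appear among the observed passwords."""
    generated = set(generated)
    if not generated:
        return 0.0
    return len(generated & set(observed)) / len(generated)

generated = {"password1", "qwerty", "zzz-unused-1", "zzz-unused-2"}
observed = {"password1", "qwerty", "letmein"}

cov = coverage(generated, observed)
prec = precision(generated, observed)
print(f"coverage={cov:.2f} precision={prec:.2f}")  # coverage=0.67 precision=0.50
```

The 97.55% figure is one minus the second ratio. For the first, the quoted counts leave 620,104,640 - 604,910,275 = 15,194,365 matches. If both tables are deduplicated on NTLM and the distinct breach password count is read as 434,823,603, that would put coverage at roughly 3.5%, but treat that as a back-of-the-envelope estimate rather than a measured result.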
#30
Thanks for taking the time to do this. I've not got a clue about these sorts of things.
