TXT Processed US 236m records from 263GB Uncompressed data
by andyleung0927 - April 26, 2021 at 01:59 PM
#1
Information 
Hello RaidForums Community,

Thanks to @pompompurin; the original thread is "https://raidforums.com/Thread-CSV-263GB-Leak-250-807-711-Total-records". I selected 1180 CSV files that contain 'real' names and specific persons from 280 CSV files, chose the fields I care about, and merged all the files into one TXT file.
Compromised data: firstname,lastname,address,city,state,zip_code,email_1,land_phone,cellphone,geolocation,gender,birth_year,birth_month,occupation_code,ethnic_code,home_value,home_ownership,home_square_footage,home_dwelling_type_code,income_description,credit_capacity_description,marital_status_code,number_children_code,children_present_flag,email_2,email_3,email_4,email_5
Contained lines: 236,000,000, 40.2 GB (10 GB compressed)
Sample: https://pastebin.com/piXfj9m9
2021/4/27 update: added the processing method, written in Python (3 *.py files); anyone can easily modify the programs to get the data they want.
2021/4/28 update: added gofile.io
2021/4/29 update: so that the data can be easily imported into MySQL, I reprocessed and re-uploaded the data and the processing programs.
#2
Quick message to mods: Don’t delete this for repost please, it’s reformatted & I was waiting for somebody to process the data like this, plus he deserves the credits for his work. Thanks 😁
#3
(April 26, 2021 at 03:46 PM)pompompurin Wrote: Quick message to mods: Don’t delete this for repost please, it’s reformatted & I was waiting for somebody to process the data like this, plus he deserves the credits for his work. Thanks 😁

Good point and I agree if all data is still included but its just parsed better I would suggest this one making official once the source of the data is identified.
#4
(April 26, 2021 at 03:48 PM)STARTEXMISLEAD Wrote:
(April 26, 2021 at 03:46 PM)pompompurin Wrote: Quick message to mods: Don’t delete this for repost please, it’s reformatted & I was waiting for somebody to process the data like this, plus he deserves the credits for his work. Thanks 😁

Good point and I agree if all data is still included but its just parsed better I would suggest this one making official once the source of the data is identified.
I’m gonna do a full data parse later today & add it to my thread as well, make it easier for people to use
#5
(April 26, 2021 at 03:50 PM)pompompurin Wrote:
(April 26, 2021 at 03:48 PM)STARTEXMISLEAD Wrote:
(April 26, 2021 at 03:46 PM)pompompurin Wrote: Quick message to mods: Don’t delete this for repost please, it’s reformatted & I was waiting for somebody to process the data like this, plus he deserves the credits for his work. Thanks 😁

Good point and I agree if all data is still included but its just parsed better I would suggest this one making official once the source of the data is identified.
I’m gonna do a full data parse later today & add it to my thread as well, make it easier for people to use

Good idea I just cant believe theres still no established source yet xd
#6
(April 26, 2021 at 03:52 PM)STARTEXMISLEAD Wrote:
(April 26, 2021 at 03:50 PM)pompompurin Wrote:
(April 26, 2021 at 03:48 PM)STARTEXMISLEAD Wrote:
(April 26, 2021 at 03:46 PM)pompompurin Wrote: Quick message to mods: Don’t delete this for repost please, it’s reformatted & I was waiting for somebody to process the data like this, plus he deserves the credits for his work. Thanks 😁

Good point and I agree if all data is still included but its just parsed better I would suggest this one making official once the source of the data is identified.
I’m gonna do a full data parse later today & add it to my thread as well, make it easier for people to use

Good idea I just cant believe theres still no established source yet xd

This data is AMAZINGLY neat, I tried the trick where you check for the [email protected], and there was no results. I also tried to do the trick where you grab all the email domains, and sort by the amount of occurrences, and there are NO emails with “test” in them, or ANY type of spam/testing domains. I’m going to do the grueling task of identifying the few thousands of domains used for emails, and hopefully find some domains that might be the source (assuming people that owned this system tested it with their company emails). They must’ve been whitelisting email domains & triple checking that emails exist before allowing people to continue. I am almost certain that this data is from some type of verifications.io//experian//other data on people.
#7
HOW





#8
This is really cool, thanks to both andyleung0927 and pompompurin, the data is awesome. It was complicated to work with so many files, so it's great that it's now 1 file.
I use EmEditor for these large files without any freeze problems.
#9
(April 26, 2021 at 04:22 PM)pompompurin Wrote:
(April 26, 2021 at 03:52 PM)STARTEXMISLEAD Wrote:
(April 26, 2021 at 03:50 PM)pompompurin Wrote:
(April 26, 2021 at 03:48 PM)STARTEXMISLEAD Wrote:
(April 26, 2021 at 03:46 PM)pompompurin Wrote: Quick message to mods: Don’t delete this for repost please, it’s reformatted & I was waiting for somebody to process the data like this, plus he deserves the credits for his work. Thanks 😁

Good point and I agree if all data is still included but its just parsed better I would suggest this one making official once the source of the data is identified.
I’m gonna do a full data parse later today & add it to my thread as well, make it easier for people to use

Good idea I just cant believe theres still no established source yet xd

This data is AMAZINGLY neat, I tried the trick where you check for the [email protected], and there was no results. I also tried to do the trick where you grab all the email domains, and sort by the amount of occurrences, and there are NO emails with “test” in them, or ANY type of spam/testing domains. I’m going to do the grueling task of identifying the few thousands of domains used for emails, and hopefully find some domains that might be the source (assuming people that owned this system tested it with their company emails). They must’ve been whitelisting email domains & triple checking that emails exist before allowing people to continue. I am almost certain that this data is from some type of verifications.io//experian//other data on people.
There are three important questions: where did the data come from, what do some of the fields mean, and what is the meaning of some values like [1,3]?
#10
(April 26, 2021 at 05:05 PM)keepgoing07 Wrote:
(April 26, 2021 at 04:22 PM)pompompurin Wrote:
(April 26, 2021 at 03:52 PM)STARTEXMISLEAD Wrote:
(April 26, 2021 at 03:50 PM)pompompurin Wrote:
(April 26, 2021 at 03:48 PM)STARTEXMISLEAD Wrote: Good point and I agree if all data is still included but its just parsed better I would suggest this one making official once the source of the data is identified.
I’m gonna do a full data parse later today & add it to my thread as well, make it easier for people to use

Good idea I just cant believe theres still no established source yet xd

This data is AMAZINGLY neat, I tried the trick where you check for the [email protected], and there was no results. I also tried to do the trick where you grab all the email domains, and sort by the amount of occurrences, and there are NO emails with “test” in them, or ANY type of spam/testing domains. I’m going to do the grueling task of identifying the few thousands of domains used for emails, and hopefully find some domains that might be the source (assuming people that owned this system tested it with their company emails). They must’ve been whitelisting email domains & triple checking that emails exist before allowing people to continue. I am almost certain that this data is from some type of verifications.io//experian//other data on people.
There are three important questions: where did the data come from, what do some of the fields mean, and what is the meaning of some values like [1,3]?

Already answered this, we would need to find the source of the data to identify what these numbers might be in relation to.
#11
I did not quite understand from the description: what exactly are the differences from the original thread, @pompompurin?
Reply
#12
(April 26, 2021 at 06:11 PM)ForumRAID Wrote: I did not quite understand from the description: what exactly are the differences from the original thread, @pompompurin?

It includes fewer fields, and it’s combined into one file for people who would rather have that