PeopleDataLabs (PDL) Database - Leaked, Download!
by Maduka - October 28, 2020 at 07:53 PM
Any one has a python script or whatever to convert json to csv of this file?
Nothing close to 400 million email addresses

[email protected]:/home/adamklemm/password/PeopleDataLabs_RF# grep -E -o "\b[A-Za-z0-9._%+-][email protected][A-Za-z0-9.-]+\.[A-Za-z]{2,6}\b" PeopleDataLabs_416M.json > PeopleEmail.txt
[email protected]:/home/adamklemm/password/PeopleDataLabs_RF# wc -l PeopleEmail.txt
180533015 PeopleEmail.txt
But it seems the data was cleaned beforehand; for example, it does not contain all the information described in this article.

But well, it seems legit
Hi, could you help me out with a live PDL link? Many thanks.

(October 29, 2020 at 06:47 AM)akmenon Wrote: Great data, thanks bro!
(November 09, 2020 at 01:41 AM)19689p Wrote: Any one has a python script or whatever to convert json to csv of this file?

I'm struggling with this too.

I took 10 lines and tried to use jq:

cat pdltest.json | jq -r '(map(keys) | add | unique) as $cols | map(. as $row | $cols | map($row[.])) as $rows | $cols, $rows[] | @csv' > pdltest.csv


cat pdltest.json | jq '.pdltest | keys_unsorted'


jq -r 'map({a,e,liid,linkedin,n}) | (first | keys_unsorted) as $keys | map([to_entries[] | .value]) as $rows | $keys,$rows[] | @csv' pdltest.json > pdltest.csv

But I always get errors, e.g. on the last one:

jq: error (at pdltest.json:1): Cannot index string with string "a"

I just want to export this to CSV and load it into MySQL.

(January 08, 2021 at 12:06 AM)groh Wrote: ...but it seems that data was cleaned before, for example in this article we cannot have all these informations described...

The article you cited states:

According to their website, the PDL application can be used to search:

 - Over 1.5 Billion unique people, including close to 260 million in the US.
 - Over 1 billion personal email addresses. Work email for 70%+ decision makers in the US, UK, and Canada.
 - Over 420 million LinkedIn URLs
 - Over 1 billion Facebook URLs and IDs.
 - 400 million+ phone numbers. 200 million+ US-based valid cell phone numbers.

OK, this data contains 416 million lines, but each has only Address, Email (optional), LinkedIn ID, LinkedIn URL and Name.

That suggests it is what they describe above as "Over 420 million Linkedin urls".

Still very useful and highly appreciated, thanks, but I would love more of the above...

