Improving search speed
by S0P - December 06, 2019 at 09:43 PM
#1
Yo,

I've been working for some time to optimize my databases so that they're more "searchable". Initially I just grepped the directory of databases, but that turned into a huge time sink really fast, and it wasn't long before a single search could run for up to 3 minutes. So I indexed the data a little.

I wrote a few scripts to dump all of my data into a single combolist, creating one index file of ~500M lines, each looking like this:

{"uname":"[email protected]","password":"password","file":"database.txt"}

So I wrote a Java program to search over this file. I've tried nearly everything, and the best I can get a search down to is roughly 45 seconds. It seems I've hit the limit of my disk's speed, lmao.
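For reference, my current search is basically a straight sequential scan of the whole index, something like this rough sketch (the file name and query are placeholders):

Code:
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LinearScan {
    public static void main(String[] args) throws IOException {
        String query = "[email protected]"; // placeholder query
        // Every search streams the entire ~500M-line file, so even a perfect
        // implementation is bounded by raw disk read throughput.
        try (BufferedReader in = Files.newBufferedReader(Paths.get("index.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.contains(query)) {
                    System.out.println(line);
                }
            }
        }
    }
}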

My goal now is to break the data into smaller, more recognizable chunks keyed on patterns such as the email provider. But in doing this, I realized there are more than 20k email providers in the list (mostly junk like [email protected]), so the index would be ridiculously large and might even make performance worse.

What steps would you take next in this scenario? How would you go about indexing this data with the goal of reducing search time by at least 50%? I'd love to hear your ideas! :)
#2
A binary tree algorithm. My speed was 1-2 seconds for each address, and the data was 600 GB.
#3
(December 06, 2019 at 09:54 PM)FluffyBunnyFufu Wrote: A binary tree algorithm. My speed was 1-2 seconds for each address, and the data was 600 GB.

Hey, thanks for the reply! I did some looking into this, and it might be what I'm looking for.

I was reading through https://www.cs.cmu.edu/~adamchik/15-121/...trees.html, specifically the section on Binary Search Trees, and I'm actually a little confused, though.

With a small amount of data, I can build something like this in memory and just walk through the objects to get the desired result. However, with data this large the structure has to live on the filesystem, meaning I still need a way to "key" my data before the tree is built, don't I?
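Concretely, what I imagine is sorting the index file by the uname field and then binary-searching the file itself on disk: seek to the middle byte, skip to the next newline, compare, repeat. Something like this rough sketch (names are mine; it assumes a newline-delimited file sorted by a key at the start of each line, with the field separator included in the search key so comparing whole lines behaves like comparing keys):

Code:
import java.io.IOException;
import java.io.RandomAccessFile;

public class SortedIndexSearch {

    // Binary-search a newline-delimited file sorted by the text at the start
    // of each line. Include the field separator in the key (e.g. "user@host\t")
    // so comparing whole lines behaves like comparing keys.
    static String search(RandomAccessFile f, String key) throws IOException {
        long lo = 0, hi = f.length();
        // Find the smallest offset p such that the first full line
        // beginning *after* p sorts >= key.
        while (lo < hi) {
            long mid = (lo + hi) / 2;
            f.seek(mid);
            f.readLine();               // discard the (possibly partial) line at mid
            String line = f.readLine(); // first full line beginning after mid
            if (line == null || line.compareTo(key) >= 0) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        // The match, if any, is the line beginning at lo, or the one after it.
        f.seek(lo);
        String line = f.readLine();
        if (line != null && line.compareTo(key) < 0) {
            line = f.readLine();
        }
        return (line != null && line.startsWith(key)) ? line : null;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder file name; one record per line, sorted by uname.
        try (RandomAccessFile f = new RandomAccessFile("sorted-index.txt", "r")) {
            System.out.println(search(f, "[email protected]\t"));
        }
    }
}

Is sorting the whole 500M-line file first the "keying" step I'm missing, or is there a smarter way to lay it out?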
#4
(December 06, 2019 at 10:05 PM)S0P Wrote:
(December 06, 2019 at 09:54 PM)FluffyBunnyFufu Wrote: A binary tree algorithm. My speed was 1-2 seconds for each address, and the data was 600 GB.

Hey, thanks for the reply! I did some looking into this, and it might be what I'm looking for.

I was reading through https://www.cs.cmu.edu/~adamchik/15-121/...trees.html, specifically the section on Binary Search Trees, and I'm actually a little confused, though.

With a small amount of data, I can build something like this in memory and just walk through the objects to get the desired result. However, with data this large the structure has to live on the filesystem, meaning I still need a way to "key" my data before the tree is built, don't I?

Check the 41 GB database breach (that 1.1 billion thing, I guess it's pretty famous); there was a script with it.
#5
(December 06, 2019 at 09:54 PM)FluffyBunnyFufu Wrote: A binary tree algorithm. My speed was 1-2 seconds for each address, and the data was 600 GB.

I was curious too, thank you for the useful reply
#6
Help this man out
#7
It's mental how fast some of the online DB search sites are: billions of records in less than a second. I'd say ask them, but I don't think they'd want to share!
#8
As of yesterday, I found my answer.

A simple index, and an efficient means of building it. My only bottleneck was indexing speed: the big fancy indexes and algorithms required a huge amount of restructuring. So I tested around and found that the most efficient way to restructure the data was to split it up into a larger file structure of many small files.

I decided to index emails and usernames by substring, treating the two as equal, which may taint the validity of my data, but it lets me search through it hundreds of times faster.

Now, instead of searching through dozens of DBs or combolists, or one single large combolist, the records for an email like "[email protected]" can be found in a relatively small combolist at /root/p/w/n/data.txt.

Honestly, I might have just been overthinking it.
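In case it helps anyone else, the bucketing works roughly like this (a simplified sketch; the root path, depth, and method names are just illustrative):

Code:
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.stream.Stream;

public class PrefixShard {

    static final Path ROOT = Paths.get("/root"); // illustrative root directory
    static final int DEPTH = 3;                  // one directory level per leading character

    // "[email protected]" -> /root/p/w/n/data.txt
    static Path bucketFor(String uname) {
        Path dir = ROOT;
        for (int i = 0; i < DEPTH && i < uname.length(); i++) {
            char c = Character.toLowerCase(uname.charAt(i));
            // send anything outside plain letters/digits to a catch-all directory
            String level = (c < 128 && Character.isLetterOrDigit(c)) ? String.valueOf(c) : "_";
            dir = dir.resolve(level);
        }
        return dir.resolve("data.txt");
    }

    // Indexing pass: append each record to its bucket.
    static void addRecord(String uname, String record) throws IOException {
        Path bucket = bucketFor(uname);
        Files.createDirectories(bucket.getParent());
        Files.write(bucket, (record + System.lineSeparator()).getBytes(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Search: scan only the one small bucket instead of the whole combolist.
    static void search(String uname) throws IOException {
        Path bucket = bucketFor(uname);
        if (!Files.exists(bucket)) return;
        try (Stream<String> lines = Files.lines(bucket)) {
            lines.filter(l -> l.contains(uname)).forEach(System.out::println);
        }
    }
}

Since each query only touches the one bucket sharing the first three characters, the scan at the end covers a tiny fraction of the data, which is where the speedup comes from.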
#9
(December 10, 2019 at 02:30 AM)S0P Wrote: As of yesterday, I found my answer.

A simple index, and an efficient means of building it. My only bottleneck was indexing speed: the big fancy indexes and algorithms required a huge amount of restructuring. So I tested around and found that the most efficient way to restructure the data was to split it up into a larger file structure of many small files.

I decided to index emails and usernames by substring, treating the two as equal, which may taint the validity of my data, but it lets me search through it hundreds of times faster.

Now, instead of searching through dozens of DBs or combolists, or one single large combolist, the records for an email like "[email protected]" can be found in a relatively small combolist at /root/p/w/n/data.txt.

Honestly, I might have just been overthinking it.

You need a double index, 1361 folders if I'm not wrong. PM me and we can discuss it.
#10
(December 10, 2019 at 02:30 AM)S0P Wrote: As of yesterday, I found my answer.

A simple index, and an efficient means of building it. My only bottleneck was indexing speed: the big fancy indexes and algorithms required a huge amount of restructuring. So I tested around and found that the most efficient way to restructure the data was to split it up into a larger file structure of many small files.

I decided to index emails and usernames by substring, treating the two as equal, which may taint the validity of my data, but it lets me search through it hundreds of times faster.

Now, instead of searching through dozens of DBs or combolists, or one single large combolist, the records for an email like "[email protected]" can be found in a relatively small combolist at /root/p/w/n/data.txt.

Honestly, I might have just been overthinking it.

That's great, mate, glad you got it sorted.
#11
:P Good combo, thanks bunny.
#12
Thanks, I've been looking for solutions and I've found a few. I'll try to implement these and reply with my results :)
