general question - search/subsearch/temporary tables


turpentyne

Recommended Posts

Bear with me, as I'm still quite new at PHP.

 

I'm trying to figure out the best way to build a search function for a database that will eventually be quite large.

 

The path of the search I'm building goes from a straightforward JavaScript and HTML search page to a paginated search result. Then the user can narrow down those query results with a new search within their results.

 

In trying to get the pagination to work across several pages (unsuccessfully, I might add), I've been learning about sessions and temporary tables. But because the database will be very large, I'm wondering if they're the wrong approach. Sessions have a time limit, and temporary tables are deleted once the connection is closed.

 

Is it possible/feasible to build a permanent table that stores the search variables with a unique ID, and use that to manage the searches, pagination and all that? Then, at some point in the process, I could add code to delete the corresponding row (or even make it a saveable search). Maybe somebody sees where I'm going with this and can describe it better than me? I'm just thinking off the cuff at the moment. Maybe there's some terminology that will help me find a tutorial. Any bit helps.

thanks!

 

 

 


How complex will you allow these searches to be? Could they potentially involve JOINs, GROUP BYs, and/or HAVING clauses? If so, then you really might need to create an actual persistent table. I had to do this for a project once. I don't know if our solution was really the best, but it worked for our needs, so I'll share it with you.

 

Each search result set was dumped into a persistent table. The searcher could re-order their results, page through them, or apply additional filters, all through basic WHERE, ORDER BY, and LIMIT clauses on the search-result table. We used cron jobs to delete the tables after they expired. To track that, we kept an index table of searches that stored the search parameters, a randomly generated name for the result table (preceded by two underscores, so a human could easily tell what those tables were), and an expiration time, typically about 15 minutes from the time of the search. The expiration kept getting reset whenever the searcher paged through their results. A cron job would go through the records in that index table, determine which ones had expired, and drop the corresponding tables.
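The scheme above can be sketched roughly as follows. This is a minimal illustration, not the original project's code: it uses SQLite in place of MySQL, and the table names (`search_index`), column names, and helper functions (`start_search`, `touch`, `reap_expired`) are all made up for the example. The 15-minute TTL and double-underscore prefix come from the description above.

```python
import sqlite3
import time
import uuid

conn = sqlite3.connect(":memory:")

# Index table tracking each search's result table and its expiration time.
conn.execute("""
    CREATE TABLE search_index (
        result_table TEXT PRIMARY KEY,
        params       TEXT,
        expires_at   REAL
    )
""")

def start_search(params, ttl=15 * 60):
    # Randomly named persistent result table, prefixed with two
    # underscores so a human can tell at a glance what it is.
    name = "__" + uuid.uuid4().hex
    conn.execute(f"CREATE TABLE {name} (id INTEGER, title TEXT)")
    conn.execute("INSERT INTO search_index VALUES (?, ?, ?)",
                 (name, params, time.time() + ttl))
    return name

def touch(name, ttl=15 * 60):
    # Paging through results resets the expiration clock.
    conn.execute("UPDATE search_index SET expires_at = ? WHERE result_table = ?",
                 (time.time() + ttl, name))

def reap_expired(now=None):
    # The "cron job": find expired searches, drop their result tables,
    # and remove their index rows.
    now = time.time() if now is None else now
    expired = [row[0] for row in conn.execute(
        "SELECT result_table FROM search_index WHERE expires_at < ?", (now,))]
    for name in expired:
        conn.execute(f"DROP TABLE {name}")
        conn.execute("DELETE FROM search_index WHERE result_table = ?", (name,))
    return expired
```

In a real deployment, `reap_expired` would live in a standalone script run from crontab rather than inside the web request path.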

 

However, if they had just been searching a single table, even a large one, so long as all the results lived on a single shard (will your table be sharded?), we would simply have indexed the searchable fields and let MySQL's built-in optimizations handle generating the results each time the user paged through them.
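For the single-table case, "just index it and re-run the query per page" looks something like this. Again a hedged sketch with SQLite standing in for MySQL; the `products` table, its columns, and `search_page` are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
# Index the searchable field so the WHERE clause is satisfied cheaply.
conn.execute("CREATE INDEX idx_products_category ON products (category)")
conn.executemany("INSERT INTO products (category, price) VALUES (?, ?)",
                 [("books", i * 1.5) for i in range(100)])

def search_page(category, page, per_page=10):
    # Each page is just a fresh indexed query; nothing is stored
    # between requests, so there is nothing to expire or clean up.
    offset = (page - 1) * per_page
    return conn.execute(
        "SELECT id, price FROM products WHERE category = ? "
        "ORDER BY price LIMIT ? OFFSET ?",
        (category, per_page, offset)).fetchall()
```

The trade-off: this is stateless and simple, but the database re-evaluates the WHERE on every page request, which is fine when an index covers it and there are no expensive joins.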


Using a permanent table for results, then dropping and re-adding results, will add a lot of overhead to that table and involve a lot of heavy operations. I really doubt Google uses MySQL for their search engine.

 

Their search engine is really robust, and MySQL has its limitations. Temporary tables will speed up your results tenfold.

 

I was working on a project with a pretty cool group of people. They had over 40 million results in their database, and they wanted to index all of them for search and also add commenting functionality. No matter how hard we tried, we couldn't get the comments and likes working across 40 million results with just normal queries: every query took over 1 second with the comment calculations and such, not to mention scanning 3 tables for data. But once we pulled all of that information into a temp table and cut results away from it, instead of filtering directly from the result set (there were basic filters, but most of the filtering happened in the temp table), it sped up dramatically, to under 150ms per query under a good amount of server load.

 

Temp tables are great because they're exclusive to that MySQL session and aren't stored the same way permanent tables are, so they're a lot faster to work with. You can also index them specifically for your application instead of working with indexes that are designed for the project as a whole.
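The temp-table approach being described can be sketched like this. One caveat in the lead-in: this uses SQLite's `CREATE TEMP TABLE` (MySQL's equivalent is `CREATE TEMPORARY TABLE`), and the `articles` table, column names, and score threshold are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT, score INTEGER)")
conn.executemany("INSERT INTO articles (body, score) VALUES (?, ?)",
                 [(f"article {i}", i % 50) for i in range(200)])

# Materialize the expensive base query once into a session-local temp
# table; it vanishes automatically when this connection closes.
conn.execute("""
    CREATE TEMP TABLE search_results AS
    SELECT id, score FROM articles WHERE score > 10
""")
# Index the temp table specifically for how this search will be
# filtered and sorted, independent of the base table's indexes.
conn.execute("CREATE INDEX idx_results_score ON search_results (score)")

# Subsequent narrowing hits the small temp table, not the big base table.
top = conn.execute(
    "SELECT id FROM search_results WHERE score > 40 ORDER BY score DESC LIMIT 5"
).fetchall()
```

The catch for pagination, as noted earlier in the thread, is that the temp table only lives as long as the connection, which typical PHP-per-request setups don't keep open between page loads.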

 

I would look into temporary tables for sure :)


I know that it'll be a database that eventually has 300,000 entries, with different categories of information across 5 tables (not sharded, just some keys). There will definitely be joins; in fact, the very first search I've been working on is an inner join. When searches are done, they'll be narrowed down by numerous variables. It sounds like what wolf_dude described is pretty much what I was thinking.

 

But if temporary tables can be held onto when somebody starts filtering down their search results, maybe that's the way to go. I certainly want the searches to go as fast as possible.

 

 

