
Writing DB Contents to a File, then retrieving information via PHP.


Jumpy09


I'm going to be blunt: I've been working on a website for quite some time now, and I never really stopped to think about how an overabundance of queries could slow things down. I'm quite new to the whole PHP/MySQL thing, so I am still learning.

 

While reading about query optimization I've picked up some valuable information, such as working with indexes and not re-running queries whose data doesn't change very often.

 

So for a few aspects of my site which don't really change over certain periods of time, I've decided to write their contents into a text file, then retrieve and display them as normal through my site. Things that last an hour, or a day, will no longer be queried on each page load, tremendously reducing the number of queries hitting my database.

 

Currently I have the following code:

if ($database->numrows($query) > 0) {
    $myFile = 'testFile.txt';

    if ($fh = fopen($myFile, 'w')) {
        // Write one record per line: id--name--color--bdaytest--gmtoffset
        while ($row = $database->fetchassoc($query)) {
            $stringData = $row['id'] . '--' . $row['name'] . '--' . $row['color'] . '--'
                        . $row['bdaytest'] . '--' . $row['gmtoffset'] . "\n";
            fwrite($fh, $stringData);
        }
        fclose($fh);
    }
}

 

which writes out the following:

 

32--User3--0--1939-10-03--+0.0

27--User7--0--1979-10-04--+17.0

28--User65--150--1969-10-05--+17.0

30--User1--0--1959-10-06--+6.0

31--User9--0--1949-10-07--+3.0

29--User5--0--1989-10-08--+17.0

 

The problem is pulling the information back out of the file and exploding it. That file covers birthdays from yesterday up to seven days from now.

 

The number of lines can change, which I worried might mess up building an array from it.

 

I was thinking about reading the file into an array, exploding each element of that array into variables, and then displaying everything in a loop.

 

Am I on the right track, or is there some other alternative I can use? Only data that will always be publicly visible on the website will be held in these text files.

 

Any suggestions on how to pull the information from the text file and explode it so it displays exactly how I need it to?
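
Roughly what I'm picturing is something like this (a rough sketch only; it assumes the exact file format shown above, the variable names are placeholders, and the echo line is just an example of displaying the data):

$myFile = 'testFile.txt';

// file() gives one array element per line (or false on failure), so a changing
// number of lines is not a problem.
$lines = file($myFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

if ($lines !== false) {
    foreach ($lines as $line) {
        // Split each line back into its five fields.
        list($id, $name, $color, $bday, $gmtoffset) = explode('--', $line);

        // Display however needed; this echo is only an example.
        echo htmlspecialchars($name) . ' -- ' . htmlspecialchars($bday) . "<br />\n";
    }
}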


The amount of file processing you're likely to need to do this will very likely outweigh the cost of simply querying the database.

 

Well, the thing with birthdays is that the query has to scan each individual row to determine whether or not that birthday falls within the proper date range. This gets greatly exaggerated when facing 100,000 users loading the page a few hundred times per day.

 

So 100,000 people x 100 page views = 10,000,000 page views, and 10,000,000 page views x 100,000 rows = 1,000,000,000,000 rows processed per day.

Compare that to 100,000 people x 100 page views = 10,000,000 reads of testFile.txt, plus the 100,000 rows processed once during the update.

 

The same thing could be said for the other queries I am attempting to convert to this method, because they all pull information for specific periods of time.

 

I'm no expert, but I do believe 1,000,000,000,000 rows is vastly more to process than 10,100,000.

 

If I were only going to have maybe 400 users, I generally wouldn't care. But if my website does what I'm planning, then I'd best optimize while I'm ahead of the game... right?


If the birthdate field is properly indexed, the query would not process anything close to 100,000 rows each time it runs.

 

On the other hand, I agree: why do the same thing repeatedly if it's unnecessary? I would have a script that runs once a day to load a table with the statistics. I would still avoid a flat file.
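
Something along these lines, maybe (just a rough sketch; the upcoming_birthdays and members table/column names are made up, and plain mysqli stands in for the $database wrapper from the first post):

// Once-a-day cron script sketch: rebuild a small summary table instead of a flat file.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// Build the nine month-day values (yesterday through seven days out) in PHP,
// so the SQL stays a simple IN() match and year boundaries are handled here.
$days = array();
for ($offset = -1; $offset <= 7; $offset++) {
    $days[] = "'" . date('m-d', strtotime($offset . ' day')) . "'";
}
$inList = implode(',', $days);

// Start fresh, then copy in only the rows whose birthday falls in that window.
$db->query('TRUNCATE TABLE upcoming_birthdays');
$db->query(
    "INSERT INTO upcoming_birthdays (id, name, color, birthdate, gmtoffset)
     SELECT id, name, color, birthdate, gmtoffset
       FROM members
      WHERE DATE_FORMAT(birthdate, '%m-%d') IN ($inList)"
);
$db->close();

The page itself then runs one cheap SELECT against that small table instead of scanning the whole members table.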


Whether or not you really need to cache the data depends heavily on the size of the database and on how much memory and performance tuning has been done. Most relational databases support built-in caching, so if a query is run repeatedly and enough database cache is allocated, the result may come from memory much of the time. That said, for things like summaries or other data that stays static for substantial periods, whether by the nature of the application and workload or by design, there are also memory-based caching solutions.

 

Would you like to use what sites like Facebook use for database result caching?

 

http://us.php.net/memcache

 

Memcached is used by a lot of really large sites because it's a distributed cache, so you can have a whole array of machines and create really large caches.
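
As a rough illustration of how that usually looks with the Memcache class linked above (the server address, key name, query, table name, and one-hour TTL are all placeholders): check the cache first, and only hit the database on a miss.

// Cache-aside sketch with the pecl Memcache extension.
// Assumes a memcached server running on localhost:11211.
$memcache = new Memcache();
$memcache->connect('localhost', 11211);

$key  = 'upcoming_birthdays';
$rows = $memcache->get($key);

if ($rows === false) {
    // Cache miss: run the real query, then keep the result for an hour.
    $db     = new mysqli('localhost', 'user', 'pass', 'mydb');
    $result = $db->query('SELECT id, name, color, birthdate, gmtoffset FROM members');

    $rows = array();
    while ($row = $result->fetch_assoc()) {
        $rows[] = $row;
    }

    $memcache->set($key, $rows, 0, 3600); // flags = 0, expire in 3600 seconds
    $db->close();
}

// $rows now holds the data whether it came from memcached or from MySQL.

APC follows the same pattern with apc_fetch() and apc_store().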

 

For single server installations it's probably easier to use this, which works in a similar manner:

 

http://us.php.net/manual/en/book.apc.php

 


To BlueSkyIS: Indexing could work, but I am attempting to reduce unneeded queries that just tie up the system. It may not be a problem as the website starts out, but a few million fewer queries later on could help performance.

 

I had thought of indexing the birthdates, but a few other things on my site only need to be updated once per hour, or once every two minutes, which makes them semi-static content. For example, picking a random quote from the database and displaying that same quote for two minutes, instead of everyone pulling a fresh random quote on every page view. A flat file, used in small settings like that, makes this possible.

 

I don't plan on storing an entire database in a flat file, just small bits of information which are easily replaced once the next cron job goes off.
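
For the single-line files like the rotating quote, the cron side might look something like this (just a sketch; the quotes table, its columns, and the file name are made up):

// Cron script sketch: pick one random quote and overwrite the flat file.
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$result = $db->query('SELECT quote, author FROM quotes ORDER BY RAND() LIMIT 1');
if ($result && ($row = $result->fetch_assoc())) {
    // Overwrite the single-line file each run; LOCK_EX guards against a page
    // read catching the file half-written.
    file_put_contents('currentQuote.txt', $row['quote'] . '--' . $row['author'], LOCK_EX);
}
$db->close();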

 


 

To Gizmola: Caching the database makes perfect sense, although since I'm working on a local install with relatively few hits, I don't particularly need it yet. I will run tests when I reach a beta level, which will let me see the results and optimize all the queries even further. Currently I am indexing the important fields, and once the site is published I will test cache versus no cache to see the performance difference.

 

Unfortunately, anything using MySQL functions like RAND(), CURDATE(), or NOW() cannot be cached (the birthday query itself uses CURDATE()), because those functions are never guaranteed to produce the same result for future queries. That renders caching useless for this particular query, and for the other queries I have been planning to write out to a flat file.
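
To illustrate the point (these are made-up statements, not my actual queries): MySQL's query cache matches on the literal statement text and skips anything containing CURDATE(), NOW(), or RAND(), so only the first of these two could ever be served from it.

// Made-up examples, not the actual queries.
$cacheable    = "SELECT name FROM members WHERE birthdate = '2009-10-05'"; // eligible for the query cache
$notCacheable = "SELECT name FROM members WHERE birthdate = CURDATE()";    // never cached: contains CURDATE()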

 


 

I understand the concerns with using a flat file for storing information, but everything that would be stored in these flat files is going to be publicly displayed across the website, so it doesn't concern me if the files are found or read. Three of the files would consist of only a single line, and each time a file is written to it would be overwritten rather than having new information appended. To further reduce the risk of data corruption, only cron jobs would run the script that writes to the flat files, and I plan on using .htaccess to restrict any other access to the cron job files.

 

Optimization comes in many forms; caching can dramatically increase performance, and so can indexing. These are great methods which I have already planned on using for other information that could be static for a while, or dynamic at any moment.

 

The reason I am choosing to use the flat files is to reduce the sheer number of queries.

 

Example Statistics:

1,000,000 Page Views

10,000 Users

File1: Information only changing once per hour.

File2: Information only changing once every two minutes.

File3: Information only changing once per day.

 

Each file, uncached: 1,000,000 queries per day.

Each file, with database caching: still 1,000,000 queries per day (the query is issued on every page view even if the result is served from cache).

 

File 1 Flat File: 24 Queries Per Day.

File 2 Flat File: 720 Queries Per Day.

File 3 Flat File: 1 Query Per Day.

 

1,000,000 page views isn't exactly an issue, but one website I know of had 127M database queries and 3 billion queries served from cache. The difference between real hits and cache hits is tremendous, but I've had problems on that site with stale cached information showing up after an update. I'm not sure how they handle their cache hits, but it's a problem that gets reported on the site frequently.

 

Now, 1,000,000 queries isn't much of a problem, but if you have 35 queries per page, that's 35,000,000 queries per day. So even with indexing and caching I can still eliminate the roughly 4,000,000 queries that can't be cached anyway.

 


 

I did figure out how to pull the information from the file, by the way, and unless I am given a concrete reason why using flat files is a bad idea, I will proceed with my original plan.

 

So far I have not really heard, nor been able to find through Google, a specific reason why the use of flat files is ill-advised. From what I have read through my searches, using flat files for entire databases is a bad idea because of data corruption and the like, but according to various sources, using a flat file for tiny bits of information that aren't constantly being written to is relatively fine.

 

I'm not trying to argue or anything; I just have not been given a valid reason not to use it.

