How To REALLY Survive Digg on a Shared Host

After reading a ridiculous post on “surviving the Digg effect on a shared host,” (and then laughing ridiculously at it), I decided to write a real tutorial on real-live ways not only to survive the Digg effect, but also a simple but powerful way to improve your site’s performance. Read more within.


The fact is, it’s rarely PHP that crashes a website. It’s usually MySQL overload and/or bandwidth limitations. Obviously, we can’t do much about bandwidth. If you’re on a really cool host, they may be psyched about your Digging or Slashdotting, but most are not particularly keen on you using too much processor time, too much bandwidth, or too many MySQL connections. Although MySQL is capable of amazing scaling, web hosts rarely leverage it properly, and even if they did, few programmers know how to dig into slow queries, find inefficient code like loops within loops, and debug a site’s performance before it’s too late.

The site linked from Digg shares a very lame trick; let’s examine it:

//put this code at the very beginning of your page, before anything else
$randomdigg = rand(1,4);
if($randomdigg != 1) {
	exit("Thanks for visiting, but the site is under a great deal of ".
		"stress due to being on the front page of Digg. ".
		"Please try back in a few moments. Thanks!");
}

Ok, so essentially, approximately 25% of the time the page will work; the other 75% of the time, the user will get the message and be asked to refresh later. That’s really lame. First off, unless you want your website to be functional only 25% of the time, you have to be there babysitting it when it hits the front page, and then you have to remember to remove this code once the rush passes. But what if you wanted everything to happen automagically? What if your site could withstand a Digging, a Slashdotting, a redditing, an OSNews’ing, a Farking, whatever…. all at once, without being forewarned?

Although there are lots of tools that can do magic for you, like memcached (http://www.danga.com/memcached/), you can leverage plain old PHP to do this dirty work, and it’s easy to integrate into your site.

Here’s the premise: we examine the referrer of the request, and if it comes from one or more domains that you specify, you build a cache ON REQUEST. Then, you serve the cached version. When the cache goes stale, say after a minute or two, you refresh it, as needed, again on request. And the best part is, it’s easy to do.

So, let’s review the code. First, at the top of the page, let’s do this:

$rfr = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : ''; //get the referring site
$op = '';
if(strstr($rfr,'digg.com')) { $op = "do-cache"; }
$cache = "/path/to/mysite/cache/".$postid.".html"; //$postid is the entry's unique ID

Let’s take it line by line. First, we get the referring site (falling back to an empty string when no referrer header is sent, so PHP doesn’t throw a notice). If it contains digg.com, we’re going to turn the cache on by setting $op to “do-cache.” Note that we preset the $op variable to an empty string to prevent abuse. Next, we define the file we’re going to use as the cache. Since we might want to do this for any entry/page, we give each one a unique filename based on its ID. Remember that you do not have to use .html as an extension, since it’s static content anyway. I’ve seen “id.cache”, “id.txt”, and even just “id”. It doesn’t matter; it will be included in a PHP page and spit out as X/HTML anyway.
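By the way, you’ll probably want to catch more than just Digg. Here’s a minimal sketch of how you might extend the check to a whitelist of referrers (the domain list is just an example; adjust it to the aggregators you care about):

$diggbait = array('digg.com', 'slashdot.org', 'reddit.com', 'fark.com');
foreach($diggbait as $domain) {
	if(strstr($rfr, $domain)) {
		$op = "do-cache"; //any match turns the cache on
		break;
	}
}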

if(file_exists($cache)) {
	$fmt = filemtime($cache); //when was the cache last written?
	if($fmt > time()-180) {
		//cache is fresh (under 3 minutes old), so hand off to the static copy
		header("Location: http://mysite.com/cache/".$postid.".html");
		exit();
	} elseif($fmt > time()-1800) {
		//cache is stale but was written in the last half hour,
		//so rebuild it on this request
		$op = "do-cache";
	} else {
		$op = '';
	}
}

This might seem overwhelming, but it’s really not. Check it out. If the cache exists, regardless of referrer, we’re going to check it. The filemtime() function will tell us when the cache was last refreshed. If it reports a time in the last 180 seconds (you can change that number), it will redirect you to the cache. PHP will barely have to do any work, Apache will barely have to do any work, and MySQL will not be touched at all. Almost any host can hang in there to serve static traffic like this.
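As a side note, the redirect costs the visitor an extra round trip and exposes the cache URL. If you’d rather keep the original URL in the address bar, a small variation (my suggestion, not part of the original scheme) is to have PHP stream the file itself, which is still vastly cheaper than touching MySQL:

//in place of the header() redirect above
if($fmt > time()-180) {
	readfile($cache); //dump the static copy straight to the browser
	exit();
}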

But what if the cache is old? If the cache is more than three minutes stale but was last written within the past 30 minutes, the next request rebuilds it, so as long as the page keeps getting hit, the cache rolls over every three minutes. After no one hits the page for 30 minutes, it goes back to fully dynamic. When someone hits the page from Digg again, the caching process starts all over. In the rebuild case, we let the request regenerate the cache. Enter the next part of the code.

if($op=="do-cache") {
	ob_clean(); 
	ob_start(); 
}

If the $op variable is set to “do-cache”, our mission is… uh… to build or rebuild the cache. So, we begin with two simple PHP standard functions, ob_start() and ob_clean(). The “ob” in these functions stands for “output buffering.” (Note the order: ob_start() must come first, since ob_clean() empties an existing buffer and will complain if there isn’t one yet.) When you buffer your output, nothing is echoed by the script, but rather written to a buffer. In short, it’s all stored up in memory for eventual dumping later. Now we stop the caching code and just write our normal HTML via PHP and MySQL.
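If output buffering is new to you, here’s a tiny standalone sketch of what it does:

ob_start();
echo "Hello, Digg!"; //nothing reaches the browser yet
$captured = ob_get_contents(); //$captured now holds "Hello, Digg!"
ob_end_flush(); //release the buffered output to the browser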

At the end of the page, the very end, after your html element is closed, you’ll need this last bit.

if($op=="do-cache") { 
	$buffer = ob_get_contents(); 
	$fp = @fopen($cache,'w'); 
	if($fp) { 
		fwrite($fp,$buffer); 
	}
	@fclose($fp); 
	ob_get_flush();  	
}

What’s up with this? Again, if we’re building the cache, we’re getting the contents of the buffer, which contains our entire page. Then we’re opening the cache file we specified above and dumping the contents of the buffer into it (note that fclose() lives inside the if, so we never try to close a handle that failed to open). Our final action is to flush the buffer with ob_end_flush(), so the visitor who triggered the rebuild still sees the page as usual.
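One caveat worth adding: on a busy front page, two requests can land in the rebuild window at once, and a visitor could be redirected to a half-written file. A minimal guard, my own addition rather than part of the scheme above, is to write to a temporary file and rename it into place, since rename() is atomic on the same filesystem:

if($op=="do-cache") {
	$tmp = $cache.".tmp.".getmypid(); //hypothetical temp filename
	if(@file_put_contents($tmp, ob_get_contents()) !== false) {
		rename($tmp, $cache); //atomic swap: readers never see a partial file
	}
	ob_end_flush();
}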

That’s it. Your cache is built, and it will be served for the next three minutes, after which it will be rebuilt on demand every three minutes until no one hits the site for 30 minutes. Then it goes back to dynamic. If another Digg visitor comes, it will recache and serve the cache again. Having written several types of caching mechanisms for large sites myself, I can tell you this “cache on demand” method works. On OSNews v4, we use this for things like user-specific RSS feeds, which can’t plausibly be built via cron jobs and, frankly, we don’t want to allow unlimited access to MySQL. They reload every 60 minutes, regardless of referrer.
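For the curious, that always-on variant just drops the referrer check and keys purely on file age; a sketch of the idea (with 3600 seconds matching the 60-minute reload mentioned above) might look like:

//serve any copy younger than an hour; otherwise rebuild on this request
if(file_exists($cache) && filemtime($cache) > time()-3600) {
	readfile($cache);
	exit();
}
$op = "do-cache"; //fall through to the output-buffering code above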

If you maintain a site and have any concern about being Dugg, I encourage you to research some sort of caching mechanism. Your web host will thank you, and you will be thankful for the maintained uptime and high hit count.
