Scaling write intensive MySQL DBs

Posted by James Sun, 13 Apr 2008 23:40:00 GMT

Recently I’ve been doing work on scaling some of our medium sized web applications. In our case, the bottleneck is in the data layer, which is often the case for most web applications. And of course, scaling the data layer is always the most challenging.

Most people, instinctively think memcached when you tell them to scale the data layer. While memcache is great, it’s not the big hammer that everyone think it is. In most cases, memcache can speedup reads significantly, especially for data that is accessed repeatedly, however, it does very little to speed up writes, with few exceptions (as I will illustrate shortly).

The MySQL database in question, handles approximately 97% writes and only 3% reads. There are a few commonly known techniques for scaling this type of database.

  • Group the writes in batches, thus doing fewer writes.
  • Use insert delayed or update low_priority
  • Insert into memory table and later batch insert it into the persistent table.
  • Use master-slave setup and write to master and read from slave.
  • Write a customized data layer to handle the writes asynchronously.

Obviously, these increases in difficulty as you go down the list. For our application, most of the writes can’t be batched. So we had to start with technique #2.

We really don’t worry too much about data consistency and transactions, because of the nature of the application, so insert delayed works just fine. The problem is that insert delayed and update low_priority queries only work with myisam and memory tables in MySQL. Rails by default uses innodb tables, so the tables in question must be converted. Depending on the size of the database, this could mean some downtime for the app. Additionally, myisam uses table-level locking, so we have to be really careful in how we use the database. This is where memcache comes in handly. I was able to cache the results of some commonly executed queries so they no longer hit the database, and the performance improved significantly. Using technique #2 along with memcache sped up our application 2 to 3 times.

The actual implementation was pretty simple. I wrote a Rails plugin to add a method save_delayed to ActiveRecord::Base. The are two problems with this. First the timestamp fields (created_at and updated_at) no longer auto populate. So, I added methods in the plugin to fill them out if they exist. The second problem is more serious. The records no longer automatically get assigned a primary key id when they’re saved. Because the save happens asynchronously, the id will always be 0. There’s no way around this except to not use the record after it’s saved. And if you need to access it in anyway, you should do a find and construct a new record object instead. This is okay for us, because we do a fire-and-forget save operation anyways.

The drawback to using myisam is that at some point, the table locking just won’t scale anymore. Innodb is a much better storage engine for larger datasets and we’ll have to move to technique #3. We can use insert delayed into a memory table, then have a process that moves data from the memory table to a persistent innodb table. This reinsertion process will have to be very well implemented because you don’t want to do deletes and inserts and start inserting multiple times or losing rows. I haven’t tried this, but will soon.

Scaling data layer is one of the most challenging part of my job. I love to see the application fly after making a few changes. This is partly why my job rocks!

Posted in  | 1 comment

Comments

  1. markmotero@gmail.com said about 1 month later:

    We have in some cases the same problem, where we do massive amounts of writes vs. reads. I read your blog through and forwarded it to my partner to read over. Thanks for sharing.

    • Mark @ KlickNation

(leave url/email »)

   Comment Markup Help Preview comment