Thursday, 6 December 2007

MySQL: Unlimited updates in one query - ish.

One of the major parts of my job is/was developing the interface to our MIS that handles hundreds of connections on a nightly basis from other MIS systems which push data in a CSV format. These CSV files contain information that then needs to be pushed into our MIS (thousands of inserts and updates, plus some deletes).

Now, creating the insert statements are a breeze with the "insert into tbl0(...) values(...),(...),..." MySQL syntax. I can push thousands of inserts in a single query, the only limit being the MySQL packet size, hence the 'ish' in the title. But what about updates? before now I've resorted to just running the updates as single queries, which lets face it, is slow. So lets look at how we can speed this up.

The Meat

First create a test table:

create table test0(sid int, staffid varchar(255), name varchar(255), unique(sid, staffid));

Insert some data:

insert into test0(sid, staffid, name) values(1, 'ar', 'Andrew'), (1, 'jm', 'John'), (1, 'sl', 'Seb');

Now we want to turn:

update test1 set name = 'Andrew 2' where sid = 1 and staffid = 'ar';
update test1 set name = 'John 2' where sid = 1 and staffid = 'jm';
update test1 set name = 'Seb 2' where sid = 1 and staffid = 'sl';

Into a single query, and we do this like so:

insert into test0(sid, staffid, name) values(1, 'ar', 'Andrew 2'),
(1, 'jm', 'John 2'),(1, 'sl', 'Seb 2') on duplicate key update = values(name);

`test0` should now have been updated with the new values. The key to all this is using "on duplicate key update".

I should point out, for the uninitiated that you need a unique key for this to work. In this case it is (sid, staffid) plus there is no need to update the unique keys.

Initial tests show a two thirds increase in speed when we are:
  1. Hitting MySQL from a PHP script.
  2. Breaking the updates into 1024k chunks.
  3. Connecting to MySQL on the localhost
If you are connecting to MySQL over a network you should see even better speed improvements because of the effects of network latency.

Drawbacks and Unknowns
  1. I've not tested this against multiple update queries sent in one go. i.e. "update tbl0 set col1 = 1 where col0 = 0; update tbl0 set col1 = 2 where col0 =1; ...".
  2. Increased memory use from the size of the insert statement. However I took the chunks down to 500k and even 250k and lost no more than 5-10% performance.
  3. If your using PHP watch the memory limit. Also keep your chunks below the max allowed packet size.

Friday, 30 March 2007

PHP: Handling MySQL replication failure gracefully - Part 2

The first version of this code failed to work on 5.1 so here is a version that does:
$result = mysql_query("show slave status;");
$row = mysql_fetch_assoc($result);
if( ($row['Slave_IO_Running']!='Yes') || ($row['Slave_SQL_Running']!='Yes') )
$fp = fopen('tmp/halt', 'w+');
header('Location: index.php');

Remember to add something like:
if(file_exists('tmp/halt')) exit('Unable to complete request at this time');

at the start of your app execution so as to stop users entering data that can cause all kinds of living nightmares when it comes to patching the data together.

Wednesday, 28 March 2007

Pivoting Data in MySQL

This is a good read for anyone who has ever tried to pivot data from a table in MySQL using client side code like PHP when you can with some careful query construction do it in MySQL.

Friday, 9 March 2007

All in the "swizzle"

Just incase I lose it.. and as some proof I was on the track with it.. this is the foundation of a new tree theory I am working on.

char *rotKeys(char *key)
int x,y;
char *ret = (char *)calloc(8, 1);

return ret;

Wednesday, 24 January 2007

MyStatsQL - Screen shot of data in graphical format

For anyone interested in what can be done with the stats gathered with MyStatsQL, here is a screen shot. Eventually I will put a package together for download. The different color lines represent different servers, in this case two.

Tuesday, 23 January 2007

MyStatsQL - Easy temporal stats gathering of MySQL server(s)

To make life a little easier at home and work I've put together a simple enough script that is run by cron to tally up statistics like: Innodb_data_reads, Innodb_data_writes, Qcache_hits, Qcache_inserts, Questions, Open_files and Open_tables over time (I sample every hour) with the ability to easily expand and add your own `show status` variables.

Next I'll create some nice graphical interface so it's more obvious how the server or cluster is running. You can grab the script over at my CV hosting. Note that actual client writes in a master <-> master environment would be (writes / number of servers) roughly, very roughly. If you have any ideas for countering this please let me know.


Monday, 22 January 2007

Handling MySQL replication failure gracefully

As any serious MySQL DBA knows, MySQL can and will now and again go out of sync, most of the time it cures itself, but sometimes the worst happens and a slave stops replicating. Yet I bet most apps keep on going with random outcomes, especially if your storing the users sessions in MySQL and have a master<->master setup.

To counter this problem I implemented a quick an easy check. If the check catches a replication failure `touch` a file to the nfs mounted tmp directory that is shared by all nodes.

Each node before executing anything (in index.php) checks for a file called 'halt' in tmp. If found exits and bingo no weirdness, corruption, or wondering why something is there one click, gone the next. It even emails you, and with a simple addition can sms.

mysql_query("show slave status;", CONNECTION);
mail('team@<some url>',
"Error show Slave Status: ".
$fp = fopen('tmp/halt', 'w+');
header('Location: index.php');

Don't forget to remove the 'halt' file after you have the replication under control.

Tuesday, 16 January 2007

Bitmaps | Bit Arrays - Part2

I was going to write an article on wikipedia about bitmaps. But came across this article on Bit Arrays.. ala Bitmaps after a little searching beyond raster graphics. Gotta love that wikipedia.

It's well worth the time learning how it all works and I've created the toggle macro for you:

#define bitmap_toggle(bitmap, i) bitmap[i/8]~=(1<<(i%8))

Monday, 8 January 2007


I've found a great free pdf book all about semephores, it's 261 pages long and called "The Little Book of Semaphores" :D

Bitmaps - Part 1

I've heard of bitmaps but never seen an implementation. I guess I've not browsed enough code. Anyway I was thinking about it whilst watching "You, Me and Dupree", and thinking about it again as I ran to catch quiz night at the woody, and some more as we tried to work out what a Raider bar(twix) was called in Europe.

Anyway I just got home and implemented some nice and simple bitmap code. It should come in handy with keeping track of paging in the DB project. Enjoy. Oh and excuse the sloppy code, it's a first attempt and all.

#define bitmap_init(size) calloc(1, sizeof(char)*((size/8)+1))
#define bitmap_set(bitmap, i) bitmap[i/8]|=(1<<(i%8))
#define bitmap_unset(bitmap, i) bitmap[i/8]&=~(1<<(i%8))
#define bitmap_isset(bitmap, i)!!(bitmap[i/8]&(1<<(i%8)))

Thursday, 4 January 2007

Manifestation type

You may or may not have heard of it, SQLite has them, but what is a manifestation type (I'll call them mtypes from now on)?

You can think of an mtype as a untyped variable with a twist. That is you can assign anything you want to it without having to first declare its type OR size. But first lets look at a typed variable example using the C programming language. In C if you want to store a string you will need to declare a fixed char array or create a char pointer to some malloc'd memory cast as char:

// fixed length array
char str[28];


// malloc'd fixed length array
char *str;
str = (char)malloc(sizeof(char)*28);

You need to know ahead of time how much space you will need to store a string. If your string was longer than 28 characters you would lose everything after character 28, safely off course by being sure not to write more than 28 bytes ;) This is very much the same principle in the database world, you must know the type and size before hand of a column.

Now lets use PHPs untyped variables to show an example of a more mtype way of doing things:

$str = "Hello World!\n"


$str .= "from Andrew Roses blog!\n";

The first line of code created the variable without being told what type it is or how big it is or is going to be and stored the string, the second line adds more to it (showing how there's no need to specify size). Now, watch this!

$str = 42;

Try doing something like that in C ;) What we have done it assigned what was once a string an integer.

Lets move to the database world now with the examples before in mind. First lets create a table:

create table test (
id int,
name varchar(32)

Now if we insert some data which sticks to the types(I'm using set in insert so things are clearer):

insert into test set id = 42, name = "Andrew Rose";*

This works, because we stuck to the types, but if we try this instead:

insert into test set id = "Andrew Rose", name = 42;*

If your DBMS does not support mtypes id will become 0 and name will be "42", but not the number 42, but the string. Most DBMSs will convert numeric to string and vica-versa. id became 0 because it has no logical numeric value.

A DBMS that supports mtypes would have ended up putting "Andrew Rose" in id and 42 most likely as a string in id, but possibly as an integer.

In fact in SQLite the table we created can be created without declaring the types at all:

create table test(

So are mtypes good or evil? That all depends on the user I believe.

*SQLite does not support the set in insert syntax.