07 July 2010

Pivot tables in MySQL

Прочети тази статия на български

In this article:

 

Introduction

MySQL and pivot? Try and google it. Or you already did?

I really never had the need for pivoting data, until yesterday. Then I started searching, but I didn't find a solution to enclose into a predefined function which may be reused.

 

The problem (example)

I'm not very good at explaining (if you bothered to read other my posts you must have already noticed that), so maybe it'll be better if I start with a simple example of what we'll work with and what's the result we need. Take a look at the following table definition:

CREATE TABLE `Contracts` (
    ...
    `Agent` VARCHAR(64),
    `Country` VARCHAR(3)
);

Amongh the other data, each row holds the name of the agent responsible for the contract, as well the country in which is the contract partner. For our example we'll need only those two columns. So, the data we're interested in looks something like that:

`Agent``Country`Other columns...
'Peter''UK'...
'Hans''UK'...
'Ivan''GER'...
'Ivan''GER'...
(...)(...)(...)

OK, this is far from what was the data I was working with, I admit it. Also I fully realize that nobody ever, ever will have to work with data so simplified, but it's perfect to illustrate the idea for now.

So, what we need is a table with agents as columns, countries as rows, and number of contracts (agent <-> country) into cells, something like this:

`Country``Peter``Hans``Ivan`Other agents...
'UK'110...
'GER'002...
(...)(...)(...)(...)(...)

The next chapter introduces a simple solution for getting results in that form.

 

Simple solution

The following is a representation of the solution I found in various variants on the net. First, we need a list of all countries:

SELECT DISTINCT `Country` FROM `Contracts`;

Second, we need to construct a query which selects rows from `Contracts` and groups them by agent:

SELECT `Agent` FROM `Contracts` GROUP BY `Agent`;

Last we have to add the columns, each of them representing a country and displaying the sum of contracts with the agent in row. That would be done using IF() and SUM():

,SUM( IF( `Country`='UK', 1, 0 ) ) as `UK`
,SUM( IF( `Country`='GER', 1, 0 ) ) as `GER`

Then the whole select will look like this:

SELECT
  `Agent`
  ,SUM( IF( `Country`='UK', 1, 0 ) ) as `UK`
  ,SUM( IF( `Country`='GER', 1, 0 ) ) as `GER`
FROM `Contracts`
GROUP BY `Agent`;

How does that SUM() and IF() combination work? For each row (each agent), it counts the number of rows in `Contracts` with that agent, and for each row that have `Country`='UK' (when we're talking about the `UK` column) 1 is added to the cell's total.

Basically, this is the whole idea: Get the columns' names, add a SUM() column for each distinct name, and include those SUM()s into a select that groups rows by another (not calculated) column. One problem that arises at once - you can't reuse this select if you'd expect adding or removing countries in future.

 

Automating the process

So, what can be done to not have to write a select anytime the pivot table's needed? So far the only way I could think of was using two subsequent queries, the first one needed to determine the columns, and the second to get the pivot result. The second query would be generated automatically (with the first query's result included). This might be done entirely from the MySQL engine, or by using an external language.

 

Generating the SUM() columns

One way to somewhat ease the process (and a step ahead into automating the pivot query) would be to make MySQL to generate the SUM() columns:

SELECT CONCAT(',SUM( IF( `Country`=\'',`Country`,'\', 1, 0 ) ) AS `',`Country`,'`')
AS `Generated_Column`
FROM `Contracts`
GROUP BY `Country`;
This produces the following output:
`Generated_Column`
,SUM( IF( `Country`='GER', 1, 0 ) ) as `GER`
,SUM( IF( `Country`='UK', 1, 0 ) ) as `GER`

Once stored, this select statement can be used everytime when the pivot select is being written. However, later we'll need the result stored into a variable which is impossible unless the result is a single row. The GROUP_CONCAT() aggregate function is what we need:

SELECT GROUP_CONCAT(DISTINCT `Country` SEPARATOR ', ')
INTO @a FROM `Contracts`;
This will result in 'UK, GER', stored into @a.

Now we have to add the rest of the SQL text around `Country`, but here's a problem: in our SUM() definition `Country` is used twice. If we add `Country` second time in GROUP_CONCAT, we cannot use DISTINCT more than once (produces SQL error), but if we don't, we'll get concatenation for each row. Instead we can use a second variable to store `Country`'s value and then plate the variable where we need that value the second time:

SELECT GROUP_CONCAT(DISTINCT @b=`Country`, ' ', @b SEPARATOR ', ')
INTO @a FROM `Contracts`;
Now @a = 'UK UK, GER GER'. Let's add the rest of the text:
SELECT GROUP_CONCAT(
    DISTINCT 'SUM( IF( `Country`=\'', @b = `Country`,
    '\', 1, 0 ) ) AS `', @b, '`' SEPARATOR ', ')
INTO @a from `Contracts`;
The result is SUM( IF( `Country`='0', 1, 0 ) ) AS `GER`, SUM( IF( `Country`='1', 1, 0 ) ) AS `GER`.

 

Using Prepared statements (MySQL)

This is my solution:

SELECT GROUP_CONCAT(DISTINCT 'SUM( IF( `Country`=\'', @b = `Country`,'\', 1, 0 ) ) AS `', @b, '`' SEPARATOR ', ')
INTO @a from `Contracts`;
SET @a = CONCAT('SELECT `Agent`, ', @a, 'FROM `Contracts` GROUP BY `Agent`');
PREPARE s FROM @a;
EXECUTE s;
DEALLOCATE PREPARE s;
Following is a step-by-step explanation.

The only way to evaluate SQL statements from strings I know of is by using prepared statements. This is the simplified synthax:

PREPARE s FROM 'SELECT * FROM `Contracts`';
EXECUTE s;
DEALLOCATE PREPARE s;
The main purpose of prepared statements isn't exactly the ability to evaluate dynamically created SQL statements and that's the reason I'm using this feature in it's most plain variant.

Now that @a holds the SUM columns definitions (see "Generating the SUM() columns"") we can build the pivot SELECT:

SET @a = CONCAT('SELECT `Agent`, ', @a, 'FROM `Contracts` GROUP BY `Agent`');
And the rest is easy:
PREPARE s FROM @a;
EXECUTE s;
DEALLOCATE PREPARE s;

 

Using an external language (PHP)

Here's the solution:

$link = mysql_connect('SQL host', 'username', 'password');
$query = 'SELECT DISTINCT `Country` FROM `Contracts`;';
$result = mysql_query($query, $link);
$countries = '';
while ($row = mysql_result($result))
  $countries .= ",SUM( IF( `Country`='{$row[0]}', 1, 0 ) ) as `{$row[0]}` ";
mysql_free_result($result);
$query = 'SELECT `Agent`, ' . $countries . 'FROM `Contracts` GROUP BY `Agent`';
$result = mysql_query($query, $link);
Read below for a step-by-step explanation.

This method involves two requests to the MySQL server - one to get the country names, and the second to query the pivot table. Although any language can be used this example implements PHP.

First of all establish a MySQL connection:

$link = mysql_connect('SQL host', 'username', 'password');

Then get the countries:

$query = 'SELECT DISTINCT `Country` FROM `Contracts`;';
$result = mysql_query($query, $link);
$countries = '';
while ($row = mysql_result($result))
  $countries .= ",SUM( IF( `Country`='{$row[0]}', 1, 0 ) ) as `{$row[0]}` ";
mysql_free_result($result);
Instead of getting rows containing the plain country names we may take advantage of the previously reviewed methods to generate the columns definitions (see "Generating the SUM() columns"):
$query = "SELECT CONCAT(',SUM( IF( `Country`=\\'',`Country`,'\\', 1, 0 ) ) AS `',`Country`,'`') AS `Generated_Column` FROM `Contracts` GROUP BY `Country`;";
$result = mysql_query($query, $link);
$countries = '';
while ($row = mysql_result($result)) {
  $countries .= $countries;
}
...or even:
$query = "SELECT GROUP_CONCAT(DISTINCT 'SUM( IF( `Country`=\'', @b = `Country`,'\', 1, 0 ) ) AS `', @b, '`' SEPARATOR ', ') INTO @a from `Contracts`;";
$result = mysql_query($query, $link);
$row = mysql_result($result);
$countries = $row[0];

Now we need to build and send the main query, using the already prepared columns for countries:

$query = 'SELECT `Agent`, ' . $countries . 'FROM `Contracts` GROUP BY `Agent`';
$result = mysql_query($query, $link);
And we can display, save into file or do whatever we want with the result.

Do you have another solutions or questions? Please comment.

3 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
  2. This comment has been removed by a blog administrator.

    ReplyDelete
  3. Code is not working

    ReplyDelete