September 5, 2008
Cross tabulation, Pivot table and OLAP Cube summarized by tools
Cross tab
Cross tabulation , is defined on WikiPedia as A cross tabulation (often abbreviated as cross tab) displays the joint distribution of two or more variables.
As below figures, the column values of the first table is transformed to column names in the second one, so the production of
Cross tabs are frequently used because:
- They are easy to understand. They appeal to people who do not want to use more sophisticated measures.
- They can be used with any level of data: nominal, ordinal, interval, or ratio - cross tabs treat all data as if it is nominal.
- A table can provide greater insight than single statistics.
- It solves the problem of empty or sparse cells.
- They are simple to conduct.
I think the cross-tab is really what the Relational Database means. That’s a meaningful “Relationships” between entities but not entity and it’s attributes.
Solution by SQL Server 2005
Code:
SELECT
SalesPerson,
[Oranges] AS Oranges,
[Pickles] AS Pickles
FROM
( SELECT SalesPerson, Product, SalesAmount FROM tblSales ) ps
PIVOT
(
SUM (SalesAmount)
FOR Product IN ( [Oranges], [Pickles])
) AS pvt
This most weak part of feature is that you must list all pivot column name manually. It’s pain.
SELECT < non-pivoted column>,
[first pivoted column] AS ,
[second pivoted column] AS ,
...
[last pivoted column] AS
FROM
(< SELECT query that produces the data>)
AS
PIVOT
(
< aggregation function>()
FOR
[< column that contains the values that will become column headers>]
IN ( [first pivoted column], [second pivoted column],
... [last pivoted column])
) AS
< optional ORDER BY clause>;
Solution by MS Access
Microsoft Access provides a very nice cross table design wizard.
With the wizard, we get the cross-tab script based on JET-SQL.
TRANSFORM Sum(tblSales.SalesAmount) AS [Sum of SalesAmount] SELECT tblSales.SalesPerson, Sum(tblSales.SalesAmount) AS [Total SalesAmount] FROM tblSales GROUP BY tblSales.SalesPerson PIVOT tblSales.Product;
Pivot columns is automaticly generated and the aggregation over all pivot column can be produced. The JET-SQL is so smart and smoothly for small applications those don’t have to be scaled. You can use almost all VBA programming utilities in it. The TRANSFORM/PIVOT feature is powerful and more then 5 years before the SQL Server’s implementation.
It is based on JET-SQL, so not availible in an ADP project.
Solution by SQL Server 2000, CASE and GROUP
The classic solution of cross-tab in SQL Server 2000 or before is produced by CASE and GROUP.
select
SalesPerson,
SUM(CASE Product WHEN 'Pickles' THEN SalesAmount ELSE 0 END ) as [Pickles],
SUM(CASE Product WHEN 'Oranges' THEN SalesAmount ELSE 0 END ) as [Oranges],
SUM(SalesAmount) as [Total Sales]
from
tblSales
group by
SalesPerson
We can find it’s footprint clearly in the SQL Server 2005’s pivot feature. And this is a base for other ideas.
Solution by SQL Server 2000, Dynamic SQL
Dynamic Cross-Tabs/Pivot Tables By Rob Volk
The basic idea is to generate dynamic CASE-and-GROUP based SQL statement and execute.
The following comments providing many improvement on this solution. i.e.
- Introduce an WHERE statement as parameter for the @PivotColTable.
- Using string to generate pivot columns but global temp table for better supporting of concurency calling.
- Using user defined table with an user session column instead of global temp table.
- To summarize multiple values.
There is also an similar solution I have not dived in, Dynamic Crosstab Queries, by Itzik Ben-Gan .
Solution by Excel
But the most easy and rich tool I have used is MS Excel’s pivot table.
Pivot table and OLAP Cube
The basic idea of OLAP Cube is a multi-dimession pivot table. The common visualization tool for OLAP is also a pivot table or pivot chart viewer.
So, draw a figure to show basic concepts in OLAP by example metioned above as the END.
Filed by
charlie
at 10:15 pm under BI, DBA, MS_ACCESS, datamining
No Comments
1 Comment