Cross tabulation, Pivot table and OLAP Cube summarized by tools

Cross tab

Cross tabulation , is defined on WikiPedia as A cross tabulation (often abbreviated as cross tab) displays the joint distribution of two or more variables.

As below figures, the column values of the first table is transformed to column names in the second one, so the production of

 

 

Cross tabs are frequently used because:

  1. They are easy to understand. They appeal to people who do not want to use more sophisticated measures.
  2. They can be used with any level of data: nominal, ordinal, interval, or ratio - cross tabs treat all data as if it is nominal.
  3. A table can provide greater insight than single statistics.
  4. It solves the problem of empty or sparse cells.
  5. They are simple to conduct.

 

I think the cross-tab is really what the Relational Database means. That’s a meaningful “Relationships” between entities but not entity and it’s attributes.

Solution by SQL Server 2005

Code:

SELECT
    SalesPerson,
    [Oranges] AS Oranges,
    [Pickles] AS Pickles
FROM
    ( SELECT SalesPerson, Product, SalesAmount FROM tblSales ) ps
PIVOT
    (
      SUM (SalesAmount)
      FOR Product IN ( [Oranges], [Pickles])
    ) AS pvt

 

This most weak part of feature is that you must list all pivot column name manually. It’s pain.


BOL: Using PIVOT and UNPIVOT

SELECT < non-pivoted column>,
    [first pivoted column] AS ,
    [second pivoted column] AS ,
    ...
    [last pivoted column] AS
FROM
    (< SELECT query that produces the data>)
    AS
PIVOT
(
    < aggregation function>()
FOR
[< column that contains the values that will become column headers>]
    IN ( [first pivoted column], [second pivoted column],
    ... [last pivoted column])
) AS
< optional ORDER BY clause>;

Solution by MS Access

Microsoft Access provides a very nice cross table design wizard.

With the wizard, we get the cross-tab script based on JET-SQL.

TRANSFORM Sum(tblSales.SalesAmount) AS [Sum of SalesAmount]
SELECT tblSales.SalesPerson, Sum(tblSales.SalesAmount) AS [Total SalesAmount]
FROM tblSales
GROUP BY tblSales.SalesPerson
PIVOT tblSales.Product;

 

 

Pivot columns is automaticly generated and the aggregation over all pivot column can be produced. The JET-SQL is so smart and smoothly for small applications those don’t have to be scaled. You can use almost all VBA programming utilities in it. The TRANSFORM/PIVOT feature is powerful and more then 5 years before the SQL Server’s implementation.

It is based on JET-SQL, so not availible in an ADP project.

Solution by SQL Server 2000, CASE and GROUP

The classic solution of cross-tab in SQL Server 2000 or before is produced by CASE and GROUP.

select
    SalesPerson,
    SUM(CASE Product WHEN 'Pickles' THEN SalesAmount ELSE 0 END ) as [Pickles],
    SUM(CASE Product WHEN 'Oranges' THEN SalesAmount ELSE 0 END ) as [Oranges],
    SUM(SalesAmount)    as [Total Sales]
from
    tblSales
group by
    SalesPerson

We can find it’s footprint clearly in the SQL Server 2005’s pivot feature. And this is a base for other ideas.

Solution by SQL Server 2000, Dynamic SQL

Dynamic Cross-Tabs/Pivot Tables By Rob Volk

The basic idea is to generate dynamic CASE-and-GROUP based SQL statement and execute.

The following comments providing many improvement on this solution. i.e.

  • Introduce an WHERE statement as parameter for the @PivotColTable.
  • Using string to generate pivot columns but global temp table for better supporting of concurency calling.
  • Using user defined table with an user session column instead of global temp table.
  • To summarize multiple values.

There is also an similar solution I have not dived in, Dynamic Crosstab Queries, by Itzik Ben-Gan .

Solution by Excel

But the most easy and rich tool I have used is MS Excel’s pivot table.

Pivot table and OLAP Cube

The basic idea of OLAP Cube is a multi-dimession pivot table. The common visualization tool for OLAP is also a pivot table or pivot chart viewer. 

So, draw a figure to show basic concepts in OLAP by example metioned above as the END.

SQL Server开源免费工具

Buck Woody 在MSDN Blog上列出了他最中意的开源 SQL Server 工具软件,这些开源软件都是在微软的开源软件平台,CodePlex 上进行维护的。他介绍了五个软件。

  1. Web版维护工具,SQLWebTools,http://www.codeplex.com/SQLWebTools。基于Web的SQL Server维护工具,可以用在远程主机上,代替企业管理器(企业管理器不是免费的)。还有一个Php版的工具phpMSAdmin,与之类似。
  2. DBA经常要做的一件事,就是用脚本备份整个数据库结构。ScriptDB就是这样一个工具,http://www.codeplex.com/ScriptDB。这是一个用C#写成的命令行工具,可以将数据库结构以企业管理器中的层次结构备份到文件夹中。可以定期对数据库结构及其中的程序(存储过程、视图、函数、触发器等)做这样的快照,之后进行文本比较或者添加到版本管理工具中。
  3. 数据库比较工具, DbDiff, http://www.codeplex.com/DbDiff。这种工具是非常有用的工具,可是在这之前我还没有找到过真正免费的。DbDiff只能比较数据库结构,不能比较数据。RedGate提供一系列的工具很强大,价格也公道。

  4. SQL Server Express 管理工具,ExpressMaint, http://www.codeplex.com/ExpressMaint。SQL Server Express是免费的数据库服务器,有了各类辅助工具,就很值得一用。

  5. Simple SQL Server Dependencies, http://www.codeplex.com/SSSDependencies, 顾名思义,就是对SQL Server中的Object之间的关系进行分析的工具。

另外,在Del.icio.us中用sqlserver opensource tool这三个Tag ,可以发现下面的这些工具。Enjoy yourself.

  1. SqlDump - MS SQL database backup program

    SqlDump is a program to backup a Microsoft SQL Server database as a text file. SqlDump generates SQL statements for tables, indexes, user-defined types, views, procedures, etc and also the table data.

  2. SourceForge.net: DB Designer Fork

    DB Designer Fork is a fork of the fabFORCE DBDesigner 4. DBDesigner is a visual database design system that integrates entity relationship design and database creation. DB Designer Fork generates SQL scripts for Oracle, SQL Server, MySQL and FireBird.

  3. SQuirreL SQL Client Home Page

    SQuirreL SQL Client is a graphical Java program that will allow you to view the structure of a JDBC compliant database, browse the data in tables, issue SQL commands etc, see Introduction. The minimum version of Java supported is 1.5.x as of SQuirreL vers

  4. Query Express

    “Simple Query Analyzer look-alike, especially useful for MSDE and SQL Express. Also connects to Oracle and other OLE-DB compliant databases. Packaged as a single 100KB executable, i.e. more than 300 times smaller than SQL Server Management Studio.”

  5. Data Dictionary Creator

    Data Dictionary Creator (DDC) is a simple application which helps you document SQL Server databases. It stores all the information in Extended Properties, so it’s easier to keep the documentation in sync with the database as it changes.

  6. Data Modelling Tools

  7. SQL Manager: Database Management Tools for MySQL, SQL Server, PostgreSQL, InterBase, Firebird

    Database maintenance tools for popular relational database servers.

  8. |QueryCommander the sql editor|

    QueryCommander is a free sql editor wrapped in a Visual Studio type of environment. QueryCommander supports:

  9. SqlBI.eu

  10. SqlBulkTool 1.0

    SqlBulkTool is a command line utility that is used to quickly create a mirror of a database. It reads its configuration from an XML file containing source and destination command strings and a list of all the tables to mirror and then handles the work of

  11. TSQLUnit testing framework

  12. SchemaSpy

    Graphical Database Schema Metadata Browser

关系数据库中存储操作树形结构

关系数据库中树形结构的存储,决定因素是对于设计的系统,在此树形结构上的操作是什么。尤其对于节点数较大的树形关系,操作性能肯定是第一考虑因素。当然也和树的具体特典有关系,比如是“扁平型”的还是“纵深型”的。最近的工作涉及到一棵近百万个节点的树,一点总结稍作笔记。

树形结构常见操作

下面的操作大部分从MySQL文档中摘录。

  1. 遍历
  2. Finding all Child nodes
  3. Find the Immediate Subordinates of a Node
  4. Finding all the Leaf Nodes
  5. Depth of a Sub-Tree
  6. Aggregate Functions in a Nested Set
  7. Retrieving a Single Path (all ancestors nodes)
  8. Finding the Depth of the Nodes. 相当于7,找到所有父节点。
  9. Adding New Nodes
  10. Deleting Nodes

可简单分类为

  • 向下搜索(1,2,3,4,5,6)
  • 向上搜索(7,8)
  • 修改(9,10)

其中3和2,4等在不同数据结构基础上的实现中,非常不同。以下分别对几种常见的表示方法,作个简单总结。

线性表示

         A
       / | \
      B  C  D
        /|   \
       E  F   G

A(B, C(E, F), D(G,),)

相当于宽度优先的遍历。
“线性表示”的意思是说,在一行文本中,表示整个复杂结构。也许可以用在数据压缩上吧,基本上与这里讲述的关系数据库关系不大。

node-id & parent-node-id

最常见的表示方法,represents a hierarchy is by using parent child relationship
在关系数据库中,一行数据表示树中的一个节点(后面叙述的结构均是如此)。这一行中存储当前节点的ID(NodeId)和它的父节点的ID(ParentNodeId),因为每个节点的父节点是唯一的。在这两个ID上做索引。

此种结构最利于临近层次的节点搜索(常见操作之3)。对于节点的增删修改(常见操作之9,10),也很容易,只要修改单个节点的属性即可。

操作3(Find the Immediate Subordinates of a Node)在展示树型数据时很有用。比如动态展开节点。实际上很多树形控件,其数据结构也正是这样的。比如YUI的TreeView ;MSComctlLib.TreeCtrl;Graphviz;ASP.NET中的Tree view等等。

对于操作2,4,7,8,在SQL Server中可以使用Inline Table-valued FunctionsMultistatement Table-valued Functions 递归调用实现。当然性能上会有损失。

Deep-first Tree

Depth-First Traversal

A B C D E F G
node-id 1 2 3 4 5 6 7
most-right-child-id 7 2 5 4 5 7 7

Deep-first结构中,每条Node记录需要包含node-id和most-right-child-id。

常见操作2,4,7,8可以高效实现。但是对于操作3(immediately children),实现起来比较难,不如parent/child的容易、高效。可以结合使用。

–Get the path from the root to the given node:
SELECT t1.*
FROM employee t1, employee t2
WHERE t1.nodeindex <= i
AND t1.rightchildindex >= t2.rightchildindex
AND t2.nodeindex = i;

增删修改操作性能消耗比较大。

— Insert a node as a child of i: UPDATE employee
SET employee.nodeindex = employee.nodeindex + 1
WHERE employee.nodeindex > parentnode;

UPDATE employee
SET employee.rightchildindex = employee.rightchildindex + 1
WHERE employee.rightchildindex >= parentnode;

Random posts

  • C51中的函数指针
  • 晨雾回龙观
  • 抽象归纳数据结构,重用算法
  • 聚会照片
  • 小小白满月