Saturday, November 28, 2009
Kimball vs. Inmon: The TDWI perspective
As it was explained to me, Kimball's "Bus Architecture" defines a data warehouse as the combination of all the data marts, which makes the data warehouse responsible for the intake, integration, distribution, delivery, and access of data. By comparison, Inmon's "Hub and Spoke Architecture" defines the data warehouse as "a subject-oriented, integrated, non-volatile, time-variant collection of data organized to support management needs." Basically, it is responsible for the intake, integration, and distribution of the data, but not for delivery and access. The question posed in the course was: which is better to use?
To me, it seems this depends on the maturity of the BI program at the business in question. Is your business new to BI, or finding its previous BI project poorly integrated? You could do well to subscribe to Kimball's approach, as it quickly gets users what they need. However, as the BI program matures and more data marts are developed, the "bus" (the rules that define the conformed dimensions the architecture depends on) becomes harder to keep aligned across marts. This is where Inmon's approach makes sense: by pushing conformity back to the warehouse, it becomes easier to administer changes to the rules and enforce integrity.
True, the line where it makes sense to implement the "hub" of Inmon's architecture is hard to draw, but aside from that blurred area, I'm not sure there is much point in arguing that one is "better" than the other. To me it's simply a matter of BI program maturity.
Tuesday, November 11, 2008
Data Modeling Continued... Kimball vs. Inmon: The Basics
As I discussed last time, many in the field of BI side strongly with the methodology of either Ralph Kimball or Bill Inmon. As I also mentioned, there are more similarities than differences, but today I'll just point out the main differences between their philosophies for anyone unfamiliar with them.
The main difference is that Kimball's architecture, also known as the Bus Architecture, is based on loading individual data marts directly from the operational system through the data staging area using conformed dimensions. An operational data store or intermediate data structure may or may not be necessary depending on existing data sources and business requirements. In this design, what is referred to as the data warehouse is actually just the collection of data marts. Kimball's basic architecture is shown in the diagram to the left. Inmon argues that this approach is inflexible without a centralized warehouse and changes cannot be made as gracefully as with his approach, which is explained below.
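To make the idea of conformed dimensions a bit more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_product, fact_sales, fact_inventory), the columns, and the sample rows are all made up for illustration and don't come from Kimball's material. The point is only that two separate marts share one product dimension, so their results can be lined up on the same attribute.

```python
import sqlite3

# Hypothetical schema: two marts (sales, inventory) sharing one conformed dimension.
bus_conn = sqlite3.connect(":memory:")
bus_conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product,
    units_sold  INTEGER
);
CREATE TABLE fact_inventory (
    product_key   INTEGER REFERENCES dim_product,
    units_on_hand INTEGER
);
INSERT INTO dim_product VALUES
    (1, 'Widget', 'Hardware'), (2, 'Gadget', 'Hardware'), (3, 'Manual', 'Books');
INSERT INTO fact_sales VALUES (1, 10), (2, 5), (3, 2), (1, 4);
INSERT INTO fact_inventory VALUES (1, 100), (2, 50), (3, 20);
""")

def drill_across(conn):
    """Summarize each fact table by category, then join the two summaries."""
    query = """
    SELECT s.category, s.units_sold, i.units_on_hand
    FROM (SELECT d.category, SUM(f.units_sold) AS units_sold
          FROM fact_sales f JOIN dim_product d USING (product_key)
          GROUP BY d.category) AS s
    JOIN (SELECT d.category, SUM(f.units_on_hand) AS units_on_hand
          FROM fact_inventory f JOIN dim_product d USING (product_key)
          GROUP BY d.category) AS i
      ON s.category = i.category
    ORDER BY s.category
    """
    return conn.execute(query).fetchall()

rows = drill_across(bus_conn)
for category, sold, on_hand in rows:
    print(category, sold, on_hand)
```

Each fact table is aggregated on its own before the join, so rows from one mart are never multiplied by rows from the other. That only works because both marts agree on what a product and a category are.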
Inmon's Corporate Information Factory, or CIF architecture, is based on the idea that a complete data warehouse should be created in third normal form. Data marts are then created separately using the warehouse as their source. These data marts can be denormalized as the designers see fit, often into a star schema. This architecture is depicted in the diagram below. Those in Kimball's camp argue that the design, implementation, and maintenance of this data warehouse, along with its associated additional ETL processes, are often unnecessary and take much more time to get off the ground than projects using the Bus Architecture.
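As a rough illustration of the warehouse-first flow, the sketch below (again Python with the built-in sqlite3 module) keeps a small normalized warehouse and derives a denormalized mart from it. Everything here is an assumption for the sake of the example: the wh_ and mart_ table names, the columns, and the sample rows are hypothetical, and a real CIF implementation would involve far more than two CREATE TABLE ... AS statements.

```python
import sqlite3

cif_conn = sqlite3.connect(":memory:")
cif_conn.executescript("""
-- Normalized warehouse tables (hypothetical): category is its own entity.
CREATE TABLE wh_category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE wh_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES wh_category
);
CREATE TABLE wh_order_line (
    order_id   INTEGER,
    product_id INTEGER REFERENCES wh_product,
    quantity   INTEGER
);
INSERT INTO wh_category VALUES (1, 'Hardware'), (2, 'Books');
INSERT INTO wh_product VALUES (1, 'Widget', 1), (2, 'Gadget', 1), (3, 'Manual', 2);
INSERT INTO wh_order_line VALUES (100, 1, 10), (100, 3, 2), (101, 2, 5), (102, 1, 4);

-- Data mart sourced from the warehouse: category is flattened into the
-- product dimension, giving a simple star schema.
CREATE TABLE mart_dim_product AS
    SELECT p.product_id AS product_key, p.product_name, c.category_name
    FROM wh_product p JOIN wh_category c ON c.category_id = p.category_id;
CREATE TABLE mart_fact_sales AS
    SELECT product_id AS product_key, SUM(quantity) AS units_sold
    FROM wh_order_line
    GROUP BY product_id;
""")

# Mart users query the star schema, never the normalized tables directly.
mart_rows = cif_conn.execute("""
    SELECT d.category_name, SUM(f.units_sold)
    FROM mart_fact_sales f JOIN mart_dim_product d USING (product_key)
    GROUP BY d.category_name
    ORDER BY d.category_name
""").fetchall()
print(mart_rows)
```

The extra step from warehouse to mart is exactly the additional ETL that Kimball's camp objects to. In exchange, the definition of a product lives in one place, and any other mart built from the same warehouse inherits it.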
The differences and arguments between these two approaches go far beyond what I've mentioned here, but this should help to explain the basic split between the methodologies. I've read many of the arguments for both sides out there, and although there are plenty of hard-liners in both camps, the verdict seems to be that the answer to which architecture is better is... it depends. Yes, boring, I know, but I've read many comments by designers claiming that they have either used hybrids or used both successfully at different times, depending on the existing architecture and business requirements.
For every opinion I've read advocating one or the other, I read another praising the merits of both. I also read one claiming that Richard (not Ralph) Kimball's methodology is superior, which made me laugh, because I made the same mistake once in conversation shortly after learning his name. My colleagues somehow seemed skeptical that the fictional character from the movie "The Fugitive" has his own data warehouse methodology.
Friday, October 31, 2008
Data Modeling
One of the most important aspects of a BI project is the underlying data warehouse model. This may seem like a no-brainer, right? In practice, though, I don't believe the data model is always given the time or expertise necessary for a successful implementation. Often, seemingly small flaws or shortcuts in the design of data warehouses or marts can cause much larger problems down the road in terms of functionality, performance, and flexibility. It's imperative that any organization faced with the task of designing its own data warehouse has a project team that understands the importance of a well-planned dimensional model. Changes can always be made, but often at an exponentially higher cost when issues are uncovered late in the game.
Anyone involved with a BI project at any level without a sound understanding of data warehouse modeling techniques should consider doing some research on the topic. One book in particular which I highly recommend is The Data Warehouse Toolkit, 2nd Ed. by Ralph Kimball and Margy Ross. If you're unfamiliar with Ralph Kimball, he's considered a pioneer in the field of data warehousing.

Another highly regarded data warehouse guru is Bill Inmon, who has also written several books on the subject. Most experts in the field strongly side with either Kimball or Inmon. There have been many arguments over whose philosophy on the subject is superior, but of course I'd hate to go picking a fight in my first post by endorsing one over the other here. Maybe we'll hit the topic of their fundamental differences next time. In actuality, their methodologies are very similar and have become more so over time. Anyone interested in learning more about warehouse design should pick up a book on the subject by either or both of them.
An understanding of their techniques and a well-designed warehouse won't guarantee a successful implementation, but it's a great step in the right direction. Obviously, we'll never be able to foresee every issue that will arise or every request thrown at us, but a strong warehouse design will allow us to deal with both much more easily.
