Showing posts with label Data modeling. Show all posts
Showing posts with label Data modeling. Show all posts

Tuesday, November 11, 2008

Data Modeling Continued... Kimball vs. Inmon: The Basics

As I discussed last time, many in the field of BI are strongly sided with the methodology of either Ralph Kimball or Richard Inmon. As mentioned last week, there are more similarities than differences, but today I'll just point out the main differences between their philosophies for anyone unfamiliar with them.

The main difference is that Kimball's architecture, also known as the Bus Architecture, is based on loading individual data marts directly from the operational system through the data staging area using conformed dimensions. An operational data store or intermediate data structure may or may not be necessary depending on existing data sources and business requirements. In this design, what is referred to as the data warehouse is actually just the collection of data marts. Kimball's basic architecture is shown in the diagram to the left. Inmon argues that this approach is inflexible without a centralized warehouse and changes cannot be made as gracefully as with his approach, which is explained below.

Inmon's Corporate Information Factory, or CIF architecture, is based on the idea that a complete data warehouse should be created in third normal form. Data marts are then created separately using the warehouse as their source. These data marts can be denormalized as the designers see fit, often into a star schema. This architecture is depicted in the diagram below.Those in Kimball's camp argue that the design, implementation, and maintenance of this data warehouse, along with its associated additional ETL processes, are often unnecessary and take much more time to get off the ground than projects using the BUS archeticture.

The differences and arguments between these two approaches go far beyond what I've mentioned here, but this should help to explain the basic split between the methodologies. I've read many of the arguments for both sides out there, and although there are plenty of hard liners in both camps, the verdict seems to be that the answer to which architecture is better is....it depends. Yes, boring I know, but I've read many comments by designers claiming that they have either used hybrids or, used both successfully at different times depending on the existing architecture and business requirements.

For every opinion I've read advocating one or the other, I read another praising the merits of both. I also read one claiming that Richard (not Ralph) Kimball's methodology is superior, which made me laugh, because I made the same mistake once in conversation shortly after learning his name. My colleagues somehow seemed skeptical that the fictional character from the movie "The Fugitive" has his own data warehouse methodology.

Hopefully this helped explain the main differences to anyone new to their methodologies. As I mentioned in the preceding post, I encourage anyone involved with a BI project at any level to pick up a book by both men to fully understand their ideas. Also, although this topic has already been argued at length, I encourage comments from anyone with significant experience using either or both methodologies.

Friday, October 31, 2008

Data Modeling

One of the most important aspects of a BI project is the underlying data warehouse model. This may seem like a no-brainer, right?

In practice, I don't believe the data model is always given the proper time or expertise necessary for a successful implementation. Often, seemingly small flaws or shortcuts in the design of data warehouses or marts can cause much larger problems down the road in terms of functionality, performance, and flexibility. It's imperative that any organization faced with the task of designing their own data warehouse has a project team that understands the importance of a well-planned dimensional model. Changes can always be made, but often at an exponentially higher cost when issues are uncovered late in the game.

Anyone involved with a BI project at any level without a sound understanding of data warehouse modeling techniques should consider doing some research on the topic. One book in particular which I highly recommend is The Data Warehouse Toolkit, 2nd Ed. by Ralph Kimball and Margy Ross. If you're unfamiliar with Ralph Kimball, he's considered a pioneer in the field of data warehousing.


Another highly regarded data warehouse guru is BIll Inmon, who has also written several books on the subject. Most experts in the field strongly side with either Kimball or Inmon. There have been many arguments around whose philosophy on the subject is superior, but of course I'd hate to go picking a fight in my first post by endorsing one over the other here. Maybe we'll hit the topic of their fundamental differences next time. In actuality, their methodologies are very similar and have become more so over time. Anyone interested in learning more about warehouse design should pick up a book on the subject by either or both of them.

An understanding of their techniques and a well designed warehouse won't guarantee a successful implementation, but it's a great step in the right direction. Obviously, we'll never be able to foresee every issue to arise or every request thrown at us, but a strong warehouse design will allow us to deal with both much more easily.