Friday, June 18, 2010

Effective Regression Testing for OBIEE Applications

It's a dark and stormy Friday night, and as you drift off into slumberland in your warm, dry and cozy bed, a thought pops up in your mind: "If a butterfly flaps its wings in Costa Rica tonight, will the CFO's Executive Financial Summary dashboard still work?" Suddenly you are wide awake and feeling a bit of paranoia. You jump out of bed, fire up your laptop, log into the corporate VPN and then into OBIEE...

This sounds like the beginning of a bad BI horror story -- and hopefully for all of you a scenario like this truly is just that - a story. But we've all experienced some variation of this scene. The OBIEE equivalent of butterflies do flap their wings in Costa Rica, and the result does sometimes alter the OBIEE equivalent of weather patterns in Japan. Change the name of a presentation column... or the mapping rules of a logical column... or the aggregation rules of a logical table... or a minor extraction rule in the warehouse ETL logic... or even simply merge your local RPD work into a MUD repository... and you risk causing an error in a report that might not be noticed until days after the change -- usually when the report is most needed.

The real truth is that an OBIEE system just like any other software implementation has some inherent fragility that requires vigorous regression testing strategies to keep it running smoothly. But while most "traditional" software systems have very clearly defined behaviors that lend themselves well to control by means of various specific testing strategies, a Business Intelligence application by nature does not always necessarily have completely predictable outcomes - especially true in OBIEE given the complexity of the BI Server and, for example, how it constructs queries against the underlying data.

Given the complexity of the OBIEE architecture and the fleeting nature of the data, establishing an adequate regression testing strategy is a challenge.

One excellent approach is to identify commonly executed queries with Usage Tracking (or just identify a specific request's logical query), then execute the corresponding logical sql using nqcmd.exe. This is an excellent approach but can respond to only two results: success or failure. In most cases this is entirely adequate.

Another approach favored of course by Oracle is to implement their Application Testing Suite, which looks ike a promising tool but frankly I have not heard of any usage in the real world -- if anyone has direct experience with this tool, feel free to respond. Moreover it goes without saying that license fees would be a significant factor in evaluating its total ROI.

A third way to address regression testing in OBIEE is to set up a series of webcat (HTTP) requests that can be executed on a scheduled basis by an automated web testing utility. Several such utilities exist - LoadRunner being the most prevalent - but as with Oracle's offering, most require licensing.

One popular web testing utility distributed as open-source software and therefore free of licensing costs has gained some following among the developer community: a Firefox plugin called Selenium (the IDE flavor, to be precise), which can execute a wide range of web page interactions and combine them into test scripts. These test scripts can then be assembled into test suites and exported to a full-fledged program in a variety of commonly used languages/frameworks (C#, Java, Perl, PHP, Python, Ruby), at which point this program can be scheduled for execution just like any other.

While Selenium is highly flexible, fairly easy to use and very cheap (at least in terms of licensing costs), it also has some drawbacks, particularly when used to test an OBIEE system. In the hopes of blazing the "Selenium for OBIEE" trail, I have attempted my own small POC by writing some basic test scripts against the Paint webcat. In doing so I identified some gotchas and workarounds that will at the very least help you successfully construct OBIEE test scripts using this very capable tool. I will explore those workarounds and advanced configurations in later posts. But my overall opinion is that Selenium is a very capable testing tool which deserves serious consideration for use in an OBIEE environment.

Here are what I consider general strengths of Selenium:

Flexible
  • Can execute a variety of actions: open urls, click links, interact with prompts, execute javascript ...
  • Can be designed to respond to specified conditions either by failing OR by simply logging and continuing
  • Can assemble multiple test scripts within a single test suite
  • Can export scripts in a variety of languages / platforms - C#, Java, Perl, PHP, Python, Ruby
Straightforward to use
  • Scripts can be written using a simple Firefox plugin
  • Applies open standards: page elements are identified using Xpath
Economical
  • Open Source therefore no licensing
  • Fairly wide adoption therefore a decent user community
  • Commercial organizations exist that specialize in Selenium development & support
And some general challenges:

Somewhat steep learning curve
  • Though basic use is straightforwad, still takes a bit to get the hang of the tool
  • Because it's based on open standards like XPath, the skills learned are not "siloed" but can be useful elsewhere
  • Extending test suites for scheduled execution requires a leap of additional complexity
Open Source = No official support
  • On the other hand, there are plenty of answers to common problems on the web
  • Googling a solution is probably just as fast as relying on traditional paid support
OBIEE-Specific Challenges
  • Problematic handling of daughter or "target" windows
  • Default page element XPath queries cannot be used because they rely on IDs, which in OBIEE are dynamic

Friday, June 4, 2010

Session Variable Manipulation in Answers

Recently I worked on a project that used Non-System Session variables for some of the reports. While on the project I learned a nice trick on how to modify value of these variables in Answers to perform testing without having to jump out to a dashboard.

In this sample Answers report I'm using a Non-System Session variable called ReportDate to calculate the employees most recent department and compensation for head count purposes.
The report below shows the head count and compensation numbers as of May 31, 2010 since that is what I defaulted the session variable to in the initialization block.

Now I can modify the value of the ReportDate session variable by clicking on the "Advanced" tab at the top and then scrolling down to the "Prefix" portion of the page. Within the "Prefix" box I can enter the following syntax to change the date to January 1, 2009: SET VARIABLE ReportDate='01/01/2009';



Here is the new screen shot of the report showing the break out of employees for January 1, 2009.


If you need to modify multiple Non System Session Variable values simply delimit them with a comma like so...SET VARIABLE ReportDate='01/01/2009', SessionVariable2='Something', SessionVariable3='SomethingElse';

Thursday, June 3, 2010

OBIEE 11g launch in London July 7th, 2010

Well, it seems that OBIEE 11g will be officially launched in London on 7th July, YES this 2010 :)
Here is the invitation to such event in the Oracle site:

Launch Event: Introducing Oracle Business Intelligence Enterprise Edition 11g

Wednesday, June 2, 2010

BI Apps Performance Optimization and Consideration - Part 3

In this last part we will look at other options of optimizing the performance of OBIEE



Aggregation Strategy




  • Multi-level (pyramid) aggregation strategy: level 1 to level 2, level 2 built on level 1

  • Incremental aggregation for level 1 aggregates to reduce ETL time

  • Level 1 aggregates can be at the lowest leaf of each dimension to avoid bulk re-aggregation when dimension hierarchies change (e.g. sales force re-organization, product hierarchy changes)

  • Bulk aggregations for higher levels for simplicity

  • Data compression drives aggregation (at least 10 x, preferably closer to 100x). This could be implemented for Service Request tables

  • Try aggregation based on dimensions instead of factsAggregation to the ratio of 50:1

Caching Strategy




  • Proper caching strategy needs to be implemented as this is among the best practice of optimization . Caching must be implemented based upon users by groups

  • Set the physical table cache property on Mater table not aliases

  • Set cache persistence time based of refresh schedule

  • Ideally do not opt Cache never expires in the physical table properties

  • Caching logs may be monitored for user sessions and queries may be analyzed across various users to optimize

  • Use ODBC extension functions to purge cache which has the following advantages

  • Analytics Cache Purge ODBC Extension Functions

  • ETL routines can proactively call ODBC extension functions to manage cache. This ensures that obsolete cache is purged as soon as possible

  • Purging Cache by Logical Query is a new capability. This will be particularly helpful in purging cache associated with logical queries that are used in Cache Seeding.

  • Purging the entire cache set in one command is a new capability that can be used in various scenarios including development and testing environments.

  • SAPurgeCacheByQuery will purge a cache entry that exactly matches the logical query plan derived from a specified query. This function takes one, and only one, parameter representing query text. USE_ADVANCED_HIT_DETECTION. The default value is NO which will have the pre-existing cache hit detection behavior. The value YES will invoke the two pass algorithm which will examine a greater search space and is potentially more expensive than the default behavior. you should set the value to YES when you wish to maximize the number of cache hits they obtain

    Check the following Cache for scalability

Browser Cache
Scripts like Java or HTML
High resolution images
Data Access
Query Plan cache
Multithreaded Database Pools

BI Server Cache
Result Set
Aggregations
Summaries



Database Optimization



  1. Database archive strategy


  • The data in the databse needs to be archived from time to time in order to manage the data size. The idea is to keep only a certain amount of data in the warehouse and archive the rest of the data. For example, keep only the last 12 months of data in the warehouse, and archive older data from the warehouse.

  • This database archive strategy will help limit the number of rows across tables with huge data

2. Database Partition strategy



  • A good database partition strategy will help query performance since the query will only need to look at a specific partition to obtain the result. For example: The Organization table can be partition based on Active Flag ; The Service Request table can be partitioned based on X_CALC_FLG = 'Y’

3. Database Indexes



  • A database index should be added for the most queried columns. The database index should also be rebuild from time to time to ensure its effectiveness. ALTER INDEX MONITORING USAGE command that collects statistics over a period of time.

  • Increase Parallel Query by configuring the database server for an optimal degree of parallel processing.

  • Given sufficient resources, this can greatly improve query

  • Make sure that cost-based optimization is enabled on databases

  • Set the tablespace to at least the same as the transactional database size. Make sure the temporary tablespace has adequate space

  • Create histogram statistics for these indices to enable the optimizer to better perform queries on critical tasks

  • Spread data across multiple disks. RAID Redundant Array of Independent Disks configuration if possible is recommended

  • Partition Large fact tables like Asset, SR is recommended

  • Set the number of log file groups to 4 on Warehouse schema

  • Set the size of each log file to 10 MB. on warehouse db

  • Set the sga_max_size to 700 MB or more

  • On the client side, edit the tnsnames.ora file. Modify the TNS alias by adding SDU= and TDU=as follows:
    myhost_orcl.world=DESCRIPTION=(SDU=16384)(TDU=16384)
    ADDRESS = (PROTOCOL = TCP)(HOST=myhost)(PORT=1521)) CONNECT_DATA=(SID=ORCL))

  • On the Oracle client Set the NLS_SORT parameter to BINARY

  • The ETL implementation team should use analyze table commands specific to their database environment, that can be executed after the respective session has completed. This can be achieved by using the post-session command property of each session
    Try to break up ETL processing so data is extracted and staged remotely then compressed for further local processing

  • Remote ETL servers extract and stage to local flat file, compress and send the flat file to central ETL server

  • Central ETL Server performs the ‘stage to data warehouse’ process

  • Manage I/O Traffic --> Manage the input and output accesses to disk storage by striping the disk storage. The best and simplest action is to install disk storage arrays (RAID), the second best is to stripe volumes using a Logical Volume Manager.

  • De fragment tables to which the data is extracted

  • See how much memory is on the available server and set the values for SORT_AREA_SIZE and SORT_AREA_RETAINED_SIZE rather high. 20, 30 or 40 MB per session is not uncommon in a data warehouse situation. Also set HASH_AREA_SIZE rather large

Infrastructure Options



  • Consider deployment options on separate hardware for various OBI components like BI Publisher, OBI Server, OBI Presentation Services, Oracle Delivers Server

  • Hardware must be available based of the recommendations from Oracle. The hardware for database must be must also be setup based of the size of the database