Wednesday, January 7, 2009

Oracle BI High Availability

I recently viewed an eSeminar on Oracle BI High Availability given by Oracle and thought I'd discuss the fundamentals of High Availability for anyone who's unfamiliar with the concept. High Availability, in the broadest possible terms, is a protocol for system design which will ensure a system is running acceptably for a certain percentage of a given time period.

Within OBIEE, availability can be defined basically as the ability to log into the system and perform normal operations at an acceptable and consistent level. This can be accomplished by employing system fault tolerance, which means that SPOF (single points of failure) must be eliminated. The goal with HA is to create a “shared nothing” environment in which any single box can temporarily fail without a major impact to users. During the eSeminar, it was explained that a general goal for a High Availability implementation might be 99.9% availability for a 24/7 system, although I'm sure service level agreements vary greatly from case to case. Using the "three nines" availability percentage, this would calculate to only 8.76 hours of downtime for an entire year, or about 43 minutes a month.


The above diagram should look familiar to most, this is a very simple representation of the OBIEE architecture showing the major components. A high availability deployment would have multiple instances of each of these objects in the case of an instance failing. An HA implementation will also use a clustered configuration for the BI server, including both a primary and secondary cluster controller. See the figure below which is another simplified look at the architecture with the redundant nodes added.


Notice that each of the objects or nodes is connected to multiple instances of every object it must talk with. I’ve left the catalog, repository, and scheduler database out of this diagram for simplicity’s sake, but each of these will be shared by its respective servers. You also may have noticed that the secondary cluster controller isn’t depicted here either, but should be included in any clustered HA setup. It is also possible for each presentation server to have its own copy of the presentation catalog, but due to the complicated setup and the difficulty with keeping the files in sync, the easier (and Oracle recommended) approach is to use a shared file system. Although the redundant web servers and their load balancer fall outside of the OBIEE scope, they are necessary to complete a true “shared nothing” environment all the way back to the user.

In future posts, I plan to drill into some of the details surrounding the configuration of the separate components of an HA deployment. We’ll be looking at some of the configuration file changes which will be necessary as well as exactly what types of impacts will be seen when specific failures do occur in an HA environment. Stay tuned for the next installment which will highlight the Presentation Services component….

A brief glimpse at iBots

What are iBots?

iBots are intelligence agents or Bots. These agents are triggered by a schedule or condition that in turn generates a request to perform analytics on data based upon defined criteria.

Where do iBots originate from?

Oracle BI Delivers is the application to create, modify and manage iBots.

How do iBots work?

In order to understand how an iBot works, we should start by first creating a new iBot from scratch.

Navigate to "Delivers" ......


.....and click the link to create a new iBot.



From here, we will tackle each individual tab to get an understanding of what the process is to create an iBot.

1) Overview tab

This view is a summary of the current settings for the selected iBot.

One can navigate to specific settings by clicking on the links in the summary or clicking on the tabs located at the top.

2) General tab


This is where the priority of the iBot and how to send the delivery content are set.

The priority options are low, normal, or high. The priority works with the delivery profile for a user to determine the destination for alerts of different priorities.

The following Data Visibility options affect the customization of the delivery content:

Personalized (individual data visibility)
:
- uses the data visibility of each recipient to customize iBot delivery content for each recipient. Note that The Run As field if not available.

Not personalized (use the Run As user's data visibility):
- sends the iBot's delivery content to the specified recipients. All users receive the same content as if they were the user specified in the Run As field.

Not personalized (use iBot owner’s data visibility):
- sends the iBot’s delivery content to the specified recipients using the data visibility of the specified user (Run As box field) who created the iBot. In other words, all recipients will receive the content that's viewable to the Run As user.


3) Conditional Request tab


This page is used to select a request to trigger the iBot. The results of the conditional request determine whether the iBot sends its delivery content and initiates any subsequent actions:

- If the request does not return any rows, the iBot is not triggered.

- If the request returns at least one row, the iBot sends its delivery content and initiates any subsequent actions.

Requests can be chained together to create complex conditional logic.

4) Schedule tab

iBots can be executed based on a specified schedule. You can define a starting date and time for the iBot, a recurrence schedule, and an ending date.

5) Recipients tab

Use this tab to select the users and groups to receive the delivery content of the iBot.

One has the ability to select specific individual users and groups to receive the contents of the iBot.
**You can select multiple users by holding down the SHIFT or CTRL keys and selecting adjacent or nonadjacent users.**

Also, one can select which users or groups can subscribe to the iBot. This allows users the flexibility to receive and view the contents of the iBot AND customize prompted filter values for columns.

Keep in mind that in order to publish iBots, the content must be shared.


6) Delivery Content tab

Use the Delivery Content tab to specify the type of content to deliver with the iBot, such as a
dashboard page or a saved request. The delivery format for the content can also be selected, such
as HTML, PDF, XLS, CSV, or text.

The contents of the Headline field will appear as the subject of the request.

The two field text boxes below allows one to attach a message when the iBot is delivered depending on if the iBot is delivered as an attachment or if there are no records returned by the conditional request.

7) Destinations tab

The User Destinations and Specific Devices portions of this tab are pretty self explanatory.

As for System Services...

Oracle BI Server Cache is used for seeding cache.

(** To have the cache created for individual users, the Personalized option for data visibility in the General tab must be selected.)

Disconnected Application Cache is only available to companies that have licensed Disconnected Analytics.

(** To have the disconnected application cache created for individual users, the Personalized(individual data visibility) option for Data Visibility in the General tab must be selected.)

(**Nonpersonalized data is not used for Disconnected Analytics users.)


8) Advanced tab


Use the Advanced tab to specify one or more actions to execute after the iBot is finished running.

Actions include the execution of other iBots, custom scripts, custom Java programs or Workflows.




For further reading on iBots, I would suggest downloading the Oracle Business Intelligence Answers, Delivers, and Interactive Dashboards User Guide to read further about creating iBots. What I provided is just a brief glimpse of iBots.

(http://download.oracle.com/docs/cd/B40078_02/doc/bi.1013/b31767.pdf)

Tuesday, December 30, 2008

JAVA SDK for Installing OBIEE 10.1.3.4 on AIX 6.1 64 bits

Recently (week of October 19 2008 to be precise), I was installing OBIEE 10.1.3.4 on AIX 6.1 64 bit and found out that the Java SDK Version requirement is very hard to fulfill. The minimum version requirement according to the documentation is 1.5.x, so I installed one and ran the UnixChk.sh and it gave thumbs up. After which I have begun the installation process. The GUI pops up; I do all the steps until where it verifies the components it is going to install. I click next it briefly (<1sec) shows the status bar and then the status bar in the installer window disappears but the installer window will remain open (I wish I made some screenshots). I let it be for more than an hour and had to kill it. I looked at the OracleBI and OracleBIData directories and they are empty –this is the case before and after I killed the process.

I then started the installation in console mode. Ran UnixChk.sh everything is fine as shown in the screenshot bellow

clip_image002

Then I started the installation process…

clip_image004

… and the wait is forever.

I then tried other versions of Java SDK and it did not make any difference.

After a while, found out on the metalink that this is a known bug (Bug 7389678: OBIEE 10.1.3.4 INSTALL HANGS ON AIX USING IBM JDK 1.5 - JAVA COMPATIBILITY ISSUE). Then started other set of problems…. First off there was no link to download the version of SDK that the engineering provided to the other client. Then while trying to raise a SR, neither the AIX 6 nor OBIEE 10.1.3.4 are shown under the respective drop downs. So was forced to choose AIX5.1 and 10.1.3.3, and this confused the support. Therefore, after a few calls and a couple of days later, we finally got THE SDK to successfully complete the installation. You can download the 64 bit Java 5 SDK for OBIEE here (Megaupload) or here (Rapidshare).

Monday, December 29, 2008

OBIEE and Virtual Private Database (VPD)

What is VPD?

Virtual Private Database is Oracle’s fine grain access control (FGAC) feature that was introduced in Oracle 8i. It helps control data level security on the database side by applying policies, thus data level security in the applications that read from the database is not necessary. The advantage is that if there are multiple applications accessing data from a database, it is not necessary to implement data level security in all those applications.

How does VPD work?

Policies are created in the database that would append a predicate (a WHERE clause) to the query in runtime. Consider a simple example – there exists a policy which would return only the rows attached to a particular user id on the table Orders. If a user “Kumar” were to query data from Orders table, Kumar would enter the following command:

Select * from Orders;

The policy that dictates what information a user can see would append a predicate to the query as follows:

Select * from Orders

where user_name = ‘KUMAR’;

This mechanism of appending the predicate is entirely transparent to the user.

Click here to read more about VPD on Oracle’s OTN.

Configuring VPD in OBIEE

To use the VPD feature in the Database and OBIEE along with its caching capabilities it is important to configure VPD in OBIEE. Failing to configure VPD in OBIEE while caching is enabled (in OBIEE), the request would bypass VPD policies by accessing data from cache and data level security will not be effectively handled by the database’s VPD. Thus, the users will see incorrect results.

To configure VPD in OBIEE, first enable the VPD option in the database’s general tab as shown:

clip_image002[7]

Then enable the “Security Sensitive” option in the security variable:

clip_image004[4]

Normal OBIEE Cache Behavior

To be simple and brief, if caching is enabled, a query that is being run for the first time would create a cache. Subsequent requests that is similar to the query or its subset would hit the cache to retrieve the results. This is true even if the users are different.

Example:

Logged on as Kumar Kambam

clip_image006[6]

Running a request…

clip_image008[6]

… generates the following Query log

clip_image010[6]

The cache is created….

clip_image012[6]

Now any user that issues a similar request or a subset of the request will hit the cache.

Logged on as Power User1

clip_image014[6]

Running a similar request, generates the following log. Notice that OBIEE server found a matching query in the cache that is created by Kumar.Kambam for the query issued by Poweruser1.

clip_image016[6]

clip_image018[6]

OBIEE Cache Behavior with VPD configured

When VPD option is configured in OBIEE, cache is created for each user even though a matching query exists in the cache. This ensures that the data retrieved for a user is not retrieved from the cache created by a different user, thereby ensuring the enforcement of VPD policies. In other words, if Kumar.Kambam were to run a query, the cache is created by the data visibility rules enforced by the VPD for Kumar.Kambam. If Poweruser1 runs a similar request it should bypass the query cache and hit the database to retrieve the data along the policies of the VPD for Poweruser1; if it were to hit the cache created by Kumar.Kambam, the results for Kumar Kambam will be presented to Poweruser1.

After configuring VPD, logged on as Kumar Kambam

clip_image019[4]

Running a query for the first time…

clip_image021[4]

…the following log is generated

clip_image023[4]

The cache is created

clip_image025[6]

Running the same query again, the following log is generated…

clip_image027[6]

OBIEE found a matching query in the cache and uses it.

clip_image029[6]

Now log on as Power User1

clip_image030[4]

By running the same request, the following log is generated…

clip_image032[4]

A new cache entry is created even though a similar request has been issued by a different user and a cache has been created for it

clip_image034[4]

The subsequent requests by Poweruser1 that is similar to the query will hit its own cache. This ensures that a user will only see his/her data.

CACHE MANAGEMENT

Databases are periodically refreshed with updated data due to ETL (Extract-Transform-Load) processes being executed. This will cause the information that is stored in the query cache to become outdated, or ‘stale’. OBIEE Administrators must enable a method of purging and refreshing the cache on a continuous basis.


IT TAKES MONEY TO MAKE CACHE

There is a cost associated with utilizing cache. This cost comes in the form of disk space for storage, purge transactions on the server, and administration. However, all the costs itemized above are easily outweighed by the decreased response times and increased performance gained by the application.


SOME CACHE IS WORTH MORE THAN OTHER CACHE

Not all data sets are treated the same. Although caching is set as a default within OBIEE, some queries may not be a suitable use of the cache.

Situations in which queries return an extra-large amount of data may not be ideal because the extra-large data set will also cause the cache file to be extra-large as well. The larger the cache, the less performance enhancement can be derived from the cache. A general rule-of-thumb is that a direct database hit should be sought with queries that will generate more than a 1GB cache entry.

Situations in which particular data elements must be refreshed frequently, or on a near real-time basis, may not be good candidates for a caching solution. Depending on your requirements, there may be a threshold for which the performance advantage of cache can be trumped by the frequency at which the cached data will need to be purged and refreshed.


OPEN YOUR CACHE PLAN

1. Make a decision to either (a) start with cache enabled for the application... or (b) use cache only as a peformance tuning tool following initial development.

2. Then.. develop a cache updating method

3. On an ongoing basis, monitor query requests to identify opportunities for improvement


CACHE METHODS

The important hurdle to overcome when implementing a cache solution is to eliminate the opportunity for data latency, or ‘stale’ data to exist. ‘Stale’ data will exist when the cached data is not purged after the ETL process has updated the data warehouse.

1. No Cache Method
2. Manually Administered Cache Method
3. Table Level Caching Method
4. Polling Table Reference Method


No Cache Method –

A new SQL query will call the database every time a request for data is generated from the users. This will greatly affect system performance and user productivity because of the increased network traffic and demands on the server.

The system will only be as good (or fast) as its weakest link. If the network connection is slow or the database is slow returning results… the users will feel this pain with slow response times.


Manually Administered Cache Method --

When connected in to a repository within online mode, a cache manager utility is available from Manage>Cache. Note that this option may be unavailable if you have disabled caching in the NQSConfig.ini file.

Manual cache management is best served as a useful utility during testing phases. It is generally not a dependable option for daily operations as it requires the ad-hoc purging of the cache by a user.


Polling Table Reference Method –

The system can identify when to refresh cached data with the use of a polling table, better known as an event table. The event table contains timing information about specific events that occur. When an ETL process is executed, an event table in the database is updated with an entry recording the details of a data table when that data table is updated by the ETL process. The BI Server can poll the event table and purge the data from the cache if a data table has been updated.

The frequency with which the BI Server checks the polling table can be set to coincide with that of the ETL so data from the more frequently updated tables is purged from the cache more often to avoid ‘stale’, or out-of-date’, data.

The polling table method can serve to be most useful where incremental ETL processes run during the day. An example of this would be to update transaction or sales data.

The polling table method is not as beneficial when the incremental ETL is run once a day or overnight.

Frequency settings for which the BI Server polls the event table is set in the OBIEE Administration tool, Tools>Utilities>Oracle BI Event Tables. Note that the parameters for the event table contain table names only and cannot contain an alias. This can lead to misleading results and an alternative purging strategy must be found for the alias.


Table Level Method –

The simplest description of this cache method is to manage caching on a table-by-table basis. All tables have caching enabled by default. The default amount of time data is left in cache (called persistence time) is infinite.

The best combination of performance improvement and storage space conservation can be achieved by deselecting tables from this process that are rarely queried. The modification of persistence time to coincide with the incremental ETL processes (similar to the Polling Table Reference Method) can also further increase performance.

Building Calculations in Oracle BI Pivot Tables

Recently, I ran into a situation with designing a report to appear exactly like an Excel spreadsheet that our client's finance department used for years. Using the pivot table view, I was able to mimic 90% of the Excel spreadsheet. The remaining 10% that remained incomplete was creating group subtotals and grand totals. A colleague of mine, Kevin McGinley, an individual known for his dashboard prowess, suggested creating the pivot table report by building out each row calculation using the option "New Calculated Item," which can be found by clicking on the More Options button in the column.



Following the advice of my colleague, I was able to mimic the Excel spreadsheet 100%.

To build a calculation for an item in an Oracle BI pivot table, you need to do the following:

1. In the Pages, Sections, Rows, or Columns area, click the More Options button for the measure on which you want a calculation performed.

2. Select the option New Calculated Item. The Calculated Item window appears.

3. Assign a name for the calculation in the Name field.

4. To build a formula, choose the Formula function.
  • NOTE: A formula creates a dynamic custom grouping within the pivot table. All measures references in a formula must be from the same logical column and must be present in the results. Formulas can be inserted into, or combined with, other calculations.
The mathematical operators become visible. The operators are shown in the following table.

+ Plus sign, for an addition operation in the formula.
- Minus sign, for a multiplication operation in the formula.
/ Divide by sign, for a division operation in the formula.
$ Dollar sign, for acting upon the row position of an item in a formula.
( Open parenthesis, to signify the beginning of a group operation in the formula.
) Close parenthesis, to signify the ending of a group operation in a formula.

  • In the Function field, build the formula by typing or clicking measure names, and clicking operators to insert them into the formula.
  • Use parentheses where appropriate.

5. When the calculation is complete, click Finished.

Keep in mind that this option isn't perfect. In fact, the main drawback is that each calculated row CANNOT be moved up and down the list of calculated items. If you've created 25 new calculated items and forgot to create an item that should be located at the 10th spot, then you'll have to recreate ALL of the calculated items after #10.

Another draw back that I have encountered is that if your column formulas contain calculations, those calculations may or may not work correctly. This example is more specific to the situation that I encountered and may not apply, but take notice.

The alternative to doing this is building union queries using "Combine with Similiar Request" option in Criteria. In my particular situation, union queries was not an option because there would be too many queries and performance would have suffered significantly.

Wednesday, December 10, 2008

Dashboard Navigation Tips

Some common designs for Dashboard navigation include:
  1. Spreading content across Dashboard pages and letting the user navigate to each page
  2. Linking Dashboards to specific reports
  3. Linking Dashboard pages to other Dashboard pages (sometimes to hidden pages in the same Dashboard)
These are just some of the options, but when deciding which is best, there are a few things to consider. First off, I prefer option 2 (linking Dashboards to Reports) as opposed to Option 3 (linking Dashboards to Dashboards), simply because OBIEE will automatically create a "Return" link on the target Report, whereas no such link will appear on a target Dashboard. This link allows the user to navigate back to the original Dashboard with all of the prompts set (i.e. the original source page contains the user's settings, not the defaults). I am not aware of any easy way to achieve this same return navigation for a target Dashboard. Of course the user could press the browser's back button to return to the original Dashboard, but this method seems to have unpredictable and sometimes incomplete results (for example, in some IE browsers users have experienced issues when using the back button whereby charts do not display or other data "freezes", so the back button is definitely not the preferred method).

Either way, when implementing option 2 or 3, I often use Navigation Targets (these are set under Column Heading Interaction or Value Interaction in the Column Properties). Navigation Targets are useful because they allow you to link from a source Dashboard/Report to another target Dashboard/Report and this type of navigation also passes prompt values from the source to the target. Passing prompt values is the default behavior and I am not aware of any way to turn this off; however, there is a nice trick to avoid this behavior if you do not want the source prompt values to override filters in the target - the trick is, for any prompted column that the source and target have in common and that should not have prompt values passed, simply create a duplicate presentation column with the same name (and logical mapping) and place it under a different presentation folder. For example, if my source Dashboard contains a prompted column called Year, and so does my target report, but I want my target report to always contain a hardcoded filter of Year = 2008, then simply create a duplicate Year column in the presentation layer and place it under a different presentation folder (sometimes I create a folder called "FOR DEVELOPER USE ONLY" and set its permissions to hide it from front end users). Set the source report to use the original Year column and the target report to use the duplicate Year column. Even though the column names are the same in the source and target, OBIEE will recognize these columns as distinct since they are in different folders, and therefore, the source prompt value will not override the target filter (and this is seemless to the end user).

One last tip - be careful with Navigation Targets because 1) I find that their behavior is not always intuitive to the end user, especially when linked from charts and 2) be aware that if you use Value Interaction with Navigation Targets, then the actual value that a user clicks on will be passed to the target as part of the filter criteria. If you simply want to provide a text link from a source to a target without passing prompt values, then you can use basic HTML and a hardcoded URL for the target report/Dashboard. However, if you want to provide a text link (as opposed to a Navigation Target embedded in a table/chart), and you DO want to pass prompt values, then you could actually accomplish this with another trick - you can still use Navigation Targets but just create a dummy report, with one real colum and one dummy column. Hide the real column and design the report so that it only returns one row. Also hide the column headings. Then create a Navigation Target under the Value Interaction for the dummy column. Set the dummy column to a constant, such as a description of the target. The final result which displays on the dashboard just resembles a regular text link but it contains the OBIEE built-in Navigation Target functionality which will link to the target and pass the prompt values, so you get the best of both worlds!