
Monday, January 19, 2009

OBIEE Cache is enabled, but why is the query not cached?

Customers repeatedly ask: OBIEE cache is enabled, so why is the query not cached? There are many possible reasons why a query is not added to the cache. Some of them are:

Non-cacheable SQL function: If a request contains certain SQL functions, OBIEE will not cache the query. The functions are CURRENT_TIMESTAMP, CURRENT_DATE, CURRENT_TIME, RAND, POPULATE. OBIEE will also not cache queries that contain parameter markers.
Non-cacheable Table: Physical tables in the OBIEE repository can be marked 'non-cacheable'. If a query makes a reference to a table that has been marked as non-cacheable, then the results are not cached even if all other tables are marked as cacheable.

[Screenshot: the Cacheable property of a physical table in the Administration Tool]
Query got a cache hit: In general, if the query gets a cache hit on a previously cached query, then the results of the current query are not added to the cache. Note: The only exception is aggregate "roll-up" hits, which are added to the cache if the NQSConfig.ini parameter POPULATE_AGGREGATE_ROLLUP_HITS is set to YES.
Caching is not configured: Caching is not enabled in the NQSConfig.ini file.

[Screenshot: the cache settings in NQSConfig.ini]
Result set too big: The query result set may have too many rows, or may consume too many bytes. The row-count limit is controlled by the MAX_ROWS_PER_CACHE_ENTRY parameter in NQSConfig.ini; the default is 100,000 rows. The maximum size in bytes is controlled by the MAX_CACHE_ENTRY_SIZE parameter in NQSConfig.ini; the default is 1 MB. Note: the 1 MB default is fairly small, and data typically becomes "bigger" when it enters OBIEE. This is primarily due to Unicode expansion of strings (a 2x or 4x multiplier). In addition, rows get wider due to (1) column alignment (typically double-word alignment), (2) nullable column representation, and (3) pad bytes.

[Screenshot: MAX_ROWS_PER_CACHE_ENTRY and MAX_CACHE_ENTRY_SIZE in NQSConfig.ini]
Bad cache configuration: This should be rare, but if the MAX_CACHE_ENTRY_SIZE parameter is larger than the capacity specified in DATA_STORAGE_PATHS, then nothing can possibly be added to the cache (a sample [CACHE] section is sketched at the end of this list).

Query execution is cancelled: If the query is cancelled from the presentation server, or if a timeout occurs, no cache entry is created.

OBIEE Server is clustered: Only queries that fall under the "Cache Seeding" family are propagated throughout the cluster; other cache entries are stored locally. If a query is executed on OBIEE Server node 1, the cache entry is created on node 1 and is not propagated to node 2.
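For reference, the cache parameters mentioned above live in the [CACHE] section of NQSConfig.ini. The excerpt below is a minimal sketch for OBIEE 10g; the storage path, capacity and limits are illustrative values rather than recommendations, so check your own NQSConfig.ini for the exact settings.

[ CACHE ]
ENABLE = YES;                                          # master switch for query caching
DATA_STORAGE_PATHS = "C:\OracleBIData\cache" 500 MB;   # cache file location and total capacity
MAX_ROWS_PER_CACHE_ENTRY = 100000;                     # 0 means unlimited rows
MAX_CACHE_ENTRY_SIZE = 1 MB;                           # raise this if larger result sets should be cached
MAX_CACHE_ENTRIES = 1000;
POPULATE_AGGREGATE_ROLLUP_HITS = NO;                   # YES caches aggregate roll-up hits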

Monday, December 29, 2008

OBIEE and Virtual Private Database (VPD)

What is VPD?

Virtual Private Database is Oracle's fine-grained access control (FGAC) feature, introduced in Oracle 8i. It enforces data-level security on the database side by applying policies, so data-level security does not have to be implemented in the applications that read from the database. The advantage is that when multiple applications access data from the same database, none of them needs to implement its own data-level security.

How does VPD work?

Policies are created in the database that append a predicate (a WHERE clause) to the query at runtime. Consider a simple example: a policy exists on the Orders table that returns only the rows attached to a particular user ID. If a user “Kumar” were to query data from the Orders table, Kumar would enter the following command:

Select * from Orders;

The policy that dictates what information a user can see would append a predicate to the query as follows:

Select * from Orders
where user_name = 'KUMAR';

This mechanism of appending the predicate is entirely transparent to the user.
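For readers who have not seen a VPD policy before, the sketch below shows roughly how such a rule could be defined on the Oracle side using the DBMS_RLS package. The schema, table and column names come from the example above and are purely illustrative; the policy function simply returns the predicate that Oracle appends.

-- Policy function: returns the predicate appended to queries on Orders.
CREATE OR REPLACE FUNCTION orders_vpd_policy (
  p_schema IN VARCHAR2,
  p_object IN VARCHAR2
) RETURN VARCHAR2
IS
BEGIN
  -- Restrict rows to the connected database user (KUMAR sees only his rows).
  RETURN 'user_name = SYS_CONTEXT(''USERENV'', ''SESSION_USER'')';
END;
/

-- Attach the policy to the Orders table (schema name is illustrative).
BEGIN
  DBMS_RLS.ADD_POLICY(
    object_schema   => 'SALES',
    object_name     => 'ORDERS',
    policy_name     => 'ORDERS_BY_USER',
    function_schema => 'SALES',
    policy_function => 'ORDERS_VPD_POLICY',
    statement_types => 'SELECT');
END;
/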

More information about VPD is available on Oracle's OTN.

Configuring VPD in OBIEE

To use the database's VPD feature together with OBIEE and its caching capabilities, it is important to configure VPD in OBIEE. If VPD is not configured in OBIEE while caching is enabled, requests can bypass the VPD policies by reading data from the cache, data-level security is no longer effectively handled by the database's VPD, and users will see incorrect results.

To configure VPD in OBIEE, first enable the VPD option in the database’s general tab as shown:

[Screenshot: the Virtual Private Database option on the database object's General tab]

Then enable the “Security Sensitive” option in the security variable:

[Screenshot: the Security Sensitive option on the security variable]

Normal OBIEE Cache Behavior

To keep it simple and brief: if caching is enabled, a query that is run for the first time creates a cache entry. Subsequent requests that match the query, or are a subset of it, hit the cache to retrieve their results. This is true even if the users are different.

Example:

Logged on as Kumar Kambam

[Screenshot: logged in as Kumar Kambam]

Running a request…

[Screenshot: the request]

… generates the following Query log

[Screenshot: the query log]

The cache is created….

[Screenshot: the cache entry in the Cache Manager]

Now any user that issues a similar request or a subset of the request will hit the cache.

Logged on as Power User1

[Screenshot: logged in as Power User1]

Running a similar request generates the following log. Notice that the OBIEE server found a matching entry in the cache, created by Kumar.Kambam, for the query issued by Poweruser1.

[Screenshots: the query log and cache entry showing the cache hit]

OBIEE Cache Behavior with VPD configured

When the VPD option is configured in OBIEE, a separate cache entry is created for each user, even if a matching query already exists in the cache. This ensures that data retrieved for one user is never served from a cache entry created by a different user, thereby enforcing the VPD policies. In other words, if Kumar.Kambam runs a query, the cache entry is created according to the data visibility rules that VPD enforces for Kumar.Kambam. If Poweruser1 runs a similar request, it should bypass that cache entry and hit the database so that data is retrieved according to the VPD policies for Poweruser1; if it were to hit the cache entry created by Kumar.Kambam, Poweruser1 would be shown Kumar Kambam's results.

After configuring VPD, logged on as Kumar Kambam

[Screenshot: logged in as Kumar Kambam]

Running a query for the first time…

[Screenshot: the request]

…the following log is generated

[Screenshot: the query log]

The cache is created

[Screenshot: the cache entry in the Cache Manager]

Running the same query again, the following log is generated…

[Screenshot: the query log]

OBIEE found a matching query in the cache and uses it.

[Screenshot: the query log showing the cache hit]

Now log on as Power User1

[Screenshot: logged in as Power User1]

Running the same request generates the following log…

[Screenshot: the query log]

A new cache entry is created, even though a similar request has already been issued by a different user and cached.

[Screenshot: separate cache entries for Kumar.Kambam and Poweruser1 in the Cache Manager]

Subsequent requests by Poweruser1 that are similar to the query will hit Poweruser1's own cache entry. This ensures that users see only their own data.

CACHE MANAGEMENT

Databases are periodically refreshed with updated data as ETL (Extract-Transform-Load) processes run. This causes the information stored in the query cache to become outdated, or ‘stale’. OBIEE administrators must put a method in place to purge and refresh the cache on a continuous basis.


IT TAKES MONEY TO MAKE CACHE

There is a cost associated with utilizing cache. This cost comes in the form of disk space for storage, purge transactions on the server, and administration. However, all the costs itemized above are easily outweighed by the decreased response times and increased performance gained by the application.


SOME CACHE IS WORTH MORE THAN OTHER CACHE

Not all data sets are treated the same. Although caching is enabled by default within OBIEE, some queries may not be suitable candidates for the cache.

Queries that return a very large amount of data may not be ideal candidates, because a very large result set also produces a very large cache file. The larger the cache entry, the less performance benefit can be derived from it. A general rule of thumb is that queries expected to generate a cache entry of more than 1 GB should go directly to the database.

Data elements that must be refreshed frequently, or on a near real-time basis, may not be good candidates for caching. Depending on your requirements, there may be a threshold at which the performance advantage of the cache is outweighed by the frequency with which the cached data must be purged and refreshed.


OPEN YOUR CACHE PLAN

1. Make a decision to either (a) start with cache enabled for the application... or (b) use cache only as a performance tuning tool following initial development.

2. Then develop a cache-updating method.

3. On an ongoing basis, monitor query requests to identify opportunities for improvement.


CACHE METHODS

The most important hurdle to overcome when implementing a caching solution is to eliminate data latency, or ‘stale’ data. Stale data exists when the cached data is not purged after the ETL process has updated the data warehouse.

1. No Cache Method
2. Manually Administered Cache Method
3. Table Level Caching Method
4. Polling Table Reference Method


No Cache Method –

A new SQL query will call the database every time a request for data is generated by the users. This greatly affects system performance and user productivity because of the increased network traffic and demands on the server.

The system will only be as good (or fast) as its weakest link. If the network connection is slow or the database is slow returning results… the users will feel this pain with slow response times.


Manually Administered Cache Method --

When connected to a repository in online mode, the Cache Manager utility is available from Manage > Cache. Note that this option may be unavailable if you have disabled caching in the NQSConfig.ini file.

Manual cache management is most useful during testing phases. It is generally not a dependable option for daily operations, as it requires a user to purge the cache on an ad-hoc basis.
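The purge operations exposed in the Cache Manager can also be issued programmatically through the BI Server's ODBC procedures, so a purge can be scripted (for example at the end of an ETL run). A minimal sketch using nqcmd follows; the DSN, credentials and table names are placeholders, not values from this environment.

Contents of purge_cache.sql:

-- Purge the entire query cache:
Call SAPurgeAllCache();
-- Or purge only the entries built from one physical table
-- (arguments: database, catalog, schema, table -- all placeholders):
-- Call SAPurgeCacheByTable('OracleDW', '', 'DWH', 'W_SALES_F');

Run the script against the BI Server's ODBC DSN:

nqcmd -d AnalyticsWeb -u Administrator -p password -s purge_cache.sql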


Polling Table Reference Method –

The system can identify when to refresh cached data with the use of a polling table, better known as an event table. The event table contains timing information about specific events. When an ETL process updates a data table, it also records an entry in the event table with the details of the table that was updated. The BI Server can poll the event table and purge data from the cache when a data table has been updated.

The frequency with which the BI Server checks the polling table can be set to coincide with that of the ETL, so that data from the more frequently updated tables is purged from the cache more often to avoid ‘stale’, or out-of-date, data.

The polling table method is most useful where incremental ETL processes run during the day, for example to update transaction or sales data.

The polling table method is not as beneficial when the incremental ETL is run once a day or overnight.

The frequency with which the BI Server polls the event table is set in the OBIEE Administration Tool under Tools > Utilities > Oracle BI Event Tables. Note that the event table parameters accept physical table names only and cannot contain an alias. This can lead to misleading results, and an alternative purging strategy must be found for aliases.
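To make the mechanics concrete, below is a minimal sketch of an event polling table and the row an ETL job might insert after refreshing a physical table. The column names follow the event table layout described in the OBIEE documentation, but the sizes and the fact table name are illustrative assumptions; use the event table script shipped with your OBIEE installation for the real definition.

-- Illustrative event polling table (column sizes are assumptions):
CREATE TABLE S_NQ_EPT (
  UPDATE_TYPE    INTEGER       NOT NULL,   -- reserved, populate with 1
  UPDATE_TS      DATE          NOT NULL,   -- when the data table was refreshed
  DATABASE_NAME  VARCHAR2(120),            -- physical database name in the repository
  CATALOG_NAME   VARCHAR2(120),
  SCHEMA_NAME    VARCHAR2(120),
  TABLE_NAME     VARCHAR2(120) NOT NULL,   -- physical table that was refreshed
  OTHER_RESERVED VARCHAR2(120)
);

-- Row an ETL job might add after reloading a fact table (names are illustrative):
INSERT INTO S_NQ_EPT (UPDATE_TYPE, UPDATE_TS, DATABASE_NAME, SCHEMA_NAME, TABLE_NAME)
VALUES (1, SYSDATE, 'OracleDW', 'DWH', 'W_SALES_F');
COMMIT;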


Table Level Method –

The simplest description of this method is that caching is managed on a table-by-table basis. All tables have caching enabled by default, and the default amount of time data is left in the cache (called the persistence time) is infinite.

The best combination of performance improvement and storage space conservation can be achieved by deselecting rarely queried tables from this process. Modifying the persistence time to coincide with the incremental ETL processes (similar to the Polling Table Reference Method) can further increase performance.

Tuesday, October 7, 2008

Cache Basics

Part of any OBIEE implementation should be spent defining the caching strategy appropriate to the project. This post is a basic look at what cache means to OBIEE.

WHAT IS CACHE?
Pronounced "cash", it is generally an easily accessible place to store frequently used information. The most recognizable use of cache relates to the way a personal computer uses this information. As it relates to a PC, there are two basic types of cache: memory caching and disk caching.

Both types follow the same principle of storing recent or frequently used information in a memory buffer. Memory cache uses a portion of static RAM memory. Disk caching uses conventional main memory. When a piece of information is required, the system (or application) first checks the cache to see if the data is there.

The main purpose of cache is to improve performance. Accessing information stored in a memory buffer can be thousands of times faster than accessing a byte on a hard disk.

OBIEE CACHE
Caching as it relates to OBIEE is a bit different, but the same general purpose applies. Rather than using a memory buffer to quickly access recent information, OBIEE reads a file containing a stored record set instead of spending processing time querying the database on each request.

When a request, or query, is made for a set of records, OBIEE can store this as cached information in a file to be used later. When a similar request is made for this same information, OBIEE will retrieve this information from the cache rather than spending processing time negotiating with the database. This can greatly improve the performance of the application.

In the OBIEE/Siebel Analytics world, this is referred to as ‘query cache’. By utilizing the query cache, the cost of database processing only needs to be paid once, when the query is first run against the database. Subsequent identical queries will run against the cache.