Why... Why... Why?
This blog is dedicated to documenting error resolution and other tidbits that I discover while working as a Consultant in the Oracle EPM (Hyperion) field. As much of my job revolves around issue resolution, I see an opportunity to supplement the typical troubleshooting avenues such as the Oracle Knowledgebase and Oracle Forums with more pinpointed information about specific errors as they are encountered. Beware, the information found in this blog is for informational purposes only and comes without any warranty or guarantee of accuracy.


Sunday, March 29, 2015

Handling WebLogic Failures


By default, WebLogic does nothing when a critical error occurs in a running application. As a result the web server, OHS, can keep routing traffic to the failed service, even in a highly available (HA) setup where more than one server is available. This leaves many HA setups vulnerable to a single point of failure.

The basic problem is that when a WebLogic service such as HFM hits a critical condition such as an out of memory (OOM) error, the service comes to a halt and cannot process user requests. End users are left with their requests spinning in the web browser until they eventually time out. It is important to configure the WebLogic Managed Server to acknowledge the failure and shut itself down quickly so that the other member of the cluster can take over.

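Two WebLogic settings, grouped under Overload Protection, help with this: Failure Action and Panic Action. Panic Action controls what the server does on a panic condition such as an unhandled out of memory error, and Failure Action controls what it does once it reaches the FAILED state.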

  

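This can be scripted for every server in the domain using WLST (note that the loop also includes the Admin Server):
connect(....)
edit()
startEdit()
servers = cmo.getServers()
for server in servers:
    name = server.getName()
    # Each server has an OverloadProtection MBean named after the server
    cd('/Servers/' + name + '/OverloadProtection/' + name)
    # Exit the JVM on a panic condition such as an unhandled OOM error
    cmo.setPanicAction('system-exit')
    # Force a shutdown when the server reaches the FAILED state
    cmo.setFailureAction('force-shutdown')
save()
activate()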



More details about overload protection can be found here:
http://www.dba-oracle.com/t_weblogic_overload_protection.htm

Taking this scenario one step further, rather than just failing, Node Manager can be used to automatically restart the failed service. However, Node Manager is usually treated as an optional component, so you would need to decide whether to implement Node Manager in order to use these features.




Wednesday, March 25, 2015

Useful Administrative Tools in Windows

Do you work with Hyperion in Windows and feel like your hands are tied? Like a fish out of water? OK, maybe this is an exaggeration for most admins, but as a former Linux sysadmin it is how I feel most days. I wanted to share some tools that can be used to help bridge the gap and become a Windows power user as it relates to Hyperion. Please feel free to share your own tools and ideas in the comments.

Note: Each of these tools has its own licencing agreement. Please carefully read and adhere to each licence. 

AstroGrep

Unix equivalent: grep
Find strings in files. AstroGrep is much more powerful than the Windows find utility, which only searches for whole words. Take the strings "this is an exception" and "java.lang.exception": searching for "exception" with the Windows find utility matches only the first, whereas AstroGrep finds both.

If Hyperion throws a weird message or error, you can use AstroGrep to search the file system for that text to pinpoint which file it comes from and get more context around the error. It is also useful for finding specific server names, passwords, usernames, etc. in files such as MaxL scripts.

In the example below I am searching the *.log files in the EPM Diagnostics folder for the string "Exception".
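The same kind of search can also be scripted. The following is a minimal Python sketch, not a feature of AstroGrep: the function name search_files and its parameters are hypothetical, and it simply does a case-insensitive substring match over files whose names match a pattern such as *.log.

```python
import fnmatch
import os


def search_files(root, pattern, needle, ignore_case=True):
    """Yield (path, line_number, line) for every line containing needle."""
    target = needle.lower() if ignore_case else needle
    for folder, _dirs, files in os.walk(root):
        for name in fnmatch.filter(files, pattern):
            path = os.path.join(folder, name)
            # errors="replace" keeps the scan going on odd log encodings
            with open(path, "r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    haystack = line.lower() if ignore_case else line
                    if target in haystack:
                        yield path, number, line.rstrip("\n")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python search_logs.py <folder> <text>
    if len(sys.argv) == 3:
        for path, number, line in search_files(sys.argv[1], "*.log", sys.argv[2]):
            print(f"{path}:{number}: {line}")
```

Because it matches substrings rather than whole words, a search for "Exception" picks up both "this is an exception" and "java.lang.Exception".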

WinDiff

Unix equivalent: diff

Compare files, or recursively compare entire directories, to find changes. For instance, compare the EPMA accounts from LCM exports taken at two different times to identify the differences in EPMA metadata.
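For a scripted version of the same idea, here is a minimal Python sketch built on the standard library's filecmp.dircmp. The function name compare_trees is hypothetical, and the two folders are assumed to be something like two unpacked LCM exports. Note that dircmp uses a shallow comparison first (file type, size and modification time), so treat the result as a quick overview rather than a byte-for-byte guarantee.

```python
import filecmp
import os


def compare_trees(left, right):
    """Return sorted relative paths: (only_left, only_right, changed)."""
    only_left, only_right, changed = [], [], []

    def walk(node, prefix):
        # node is a dircmp object describing one pair of directories
        only_left.extend(os.path.join(prefix, n) for n in node.left_only)
        only_right.extend(os.path.join(prefix, n) for n in node.right_only)
        changed.extend(os.path.join(prefix, n) for n in node.diff_files)
        # Recurse into directories that exist on both sides
        for name, sub in node.subdirs.items():
            walk(sub, os.path.join(prefix, name))

    walk(filecmp.dircmp(left, right), "")
    return sorted(only_left), sorted(only_right), sorted(changed)
```

The files listed as changed can then be opened in WinDiff to see the actual line differences.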

mTAIL

Unix equivalent: tail -f
Windows error log tailer, used to monitor logs in real time. Simply drag and drop a file from Windows Explorer into mTAIL, click the start button, and it will start displaying updates to the log file in real time. This is handy when you need to check a file over and over for updates, such as when looking for new errors. It is also useful if you are constantly restarting WebLogic and want to check the log for errors during startup.
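To illustrate what a log tailer does under the hood, here is a minimal Python sketch. It is not how mTAIL is implemented; the function names read_new_lines and follow are hypothetical, and the approach (polling the file and remembering a byte offset) is just one simple way to do it.

```python
import time


def read_new_lines(path, offset):
    """Return (lines, new_offset) for text appended since offset."""
    with open(path, "rb") as handle:
        handle.seek(0, 2)            # jump to the end to learn the current size
        if handle.tell() < offset:   # file was truncated or rotated
            offset = 0
        handle.seek(offset)
        data = handle.read()
    # Only consume complete lines; a partial last line is picked up next time
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(path, interval=1.0):
    """Print existing lines, then new ones as they are appended."""
    offset = 0
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            print(line)
        time.sleep(interval)
```

Calling follow() on a managed server log during startup prints the existing content once and then each new line as it is written, until interrupted with Ctrl+C.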



Process Explorer and Handle

Unix equivalent: lsof
Ever try to rename or delete a certain folder and it keeps saying it is in use, but you do not see anything open? Ever have trouble with Hyperion patches failing to apply because some files are in use? Obviously, the first step is to search Task Manager for Hyperion related processes. However, there are some pesky situations where this alone cannot identify the culprit. Process Explorer has an option to search file handles and is the easier of the two to use. The Handle tool is a command line tool and kind of archaic, but it can also find what is using the files you want to get access to.

WinDirStat

Unix equivalent: du -k | sort -n
Out of disk space? Need a quick tool to discover where the most space is being consumed? WinDirStat gives you an ordered list of disk space consumption and lets you drill through it to find what is using all the disk space on your system.
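When a GUI is not available, a rough equivalent of du -k | sort -n can be sketched in Python. The function name directory_sizes is hypothetical, and the sketch only totals the immediate subfolders of one root rather than giving the full drill-down that WinDirStat offers.

```python
import os


def directory_sizes(root):
    """Return [(bytes, path)] for each immediate subfolder of root, largest first."""
    totals = []
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        size = 0
        for folder, _dirs, files in os.walk(entry.path):
            for name in files:
                try:
                    size += os.path.getsize(os.path.join(folder, name))
                except OSError:
                    pass  # skip files that vanish or cannot be read
        totals.append((size, entry.path))
    return sorted(totals, reverse=True)
```

Running it repeatedly on the largest folder found gives a manual version of the drill-through.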


mRemoteNG

Sadly, the open source form of this tool is deprecated. It is a very useful remote desktop/SSH session manager for Windows, and it integrates nicely with PuTTY.

Process Monitor

Unix equivalent: strace
A very low-level tool for seeing the system calls a process makes while executing. Typically this tool comes out when all of the normal avenues for solving the problem have been tried. Getting down to the level of system calls can sometimes pinpoint things like missing files, bad security, and bugs. It helps figure out why a process is crashing by showing what unexpected conditions the process encountered while running.

Internet Explorer Developer tools

Similar tools are also available in Chrome and Firefox.

Network profiling can help find latency in your requests. For instance, you can measure the time it takes to get resources from different geographic regions and pinpoint certain latency issues. In the example below, a ton of images and background activity is loaded after each click in Hyperion. Slow performance can sometimes be traced to the front end using this method. It is also useful for inspecting certain processes in detail when debugging.

Simply go to Tools -> F12 Developer Tools. Select the Network tab, and "start capturing".
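Once a capture has been saved, the slowest requests can be pulled out with a short script. This is a minimal Python sketch that assumes the capture was exported as a HAR file in JSON form (Chrome and Firefox can do this; check whether your version of the IE tools offers the same export). The function name slowest_requests is hypothetical; the log, entries, time and request.url fields come from the HAR format, where time is the total elapsed time of a request in milliseconds.

```python
import json


def slowest_requests(har_path, top=10):
    """Return [(milliseconds, url)] for the slowest requests in a HAR file."""
    # utf-8-sig tolerates a byte order mark at the start of the file
    with open(har_path, encoding="utf-8-sig") as handle:
        har = json.load(handle)
    entries = har["log"]["entries"]
    ranked = sorted(entries, key=lambda entry: entry["time"], reverse=True)
    return [(entry["time"], entry["request"]["url"]) for entry in ranked[:top]]
```

Comparing the output of two captures, for example one taken on site and one taken from a remote office, is a quick way to see which resources account for the difference.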