Why... Why... Why?
This blog is dedicated to documenting error resolution and other tidbits that I discover while working as a Consultant in the Oracle EPM (Hyperion) field. As much of my job revolves around issue resolution, I see an opportunity to supplement the typical troubleshooting avenues such as the Oracle Knowledgebase and Oracle Forums with more pinpointed information about specific errors as they are encountered. Beware, the information found in this blog is for informational purposes only and comes without any warranty or guarantee of accuracy.

Sunday, June 15, 2014

Fun with the Hyperion Log Analysis Utility


 I was recently playing around with the Log Analysis tool on 11.1.2.2.

John Goodwin has a good write up of the basic operation of the log analysis utility here:
http://john-goodwin.blogspot.com/2013/09/epm-standalone-log-analysis-utility.html

After reviewing the tool, I was curious how to make it a bit more useful.

First, John seems to be pinpointing specific product log directories with the "-d" option. I am curious if it is possible to get more generic and just have the utility tell us what is wrong without specifying the product. This would be beneficial if an error is occurring in the system and you just want a quick report of what error messages might be coming out to help troubleshoot.

I tried very generically:
loganalysis.bat -system -tmin 60  -d d:\Oracle\Middleware\user_projects

However, this seemed to only get the logs in user_projects\epmsystem1\diagnostics. It was missing the Weblogic logs under user_projects\domains. These logs are important because they provide insight into what is happening at the web application server layer. To get more of a complete picture of the system, it is possible to generate a report for both the domains folder and the diagnostics folder. 
loganalysis.bat -system -tmin 60  -d d:\Oracle\Middleware\user_projects\domains -o domains_report
loganalysis.bat -system -tmin 60  -d d:\Oracle\Middleware\\user_projects\epmsystem1\diagnostics -o diagnostics_report


Now this is on the right track. The next issue is that the utility is only scanning a single node. In a distributed environment, this doesn't help the problem of having to log into multiple nodes to gather the logs. Digging a little deeper, it looks like UNC paths work well in the Log Analysis utility.
For instance,
loganalysis.bat -system -tmin 60 -d \\node1\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o node1_diagnostics

Using this information, it is possible to orchestrate reports for all nodes in the environment at the same time from a single node. When each report finishes it opens a html report as a tab in Internet Explorer for review. When completed the results are easily viewed by scanning through the multiple tabs in IE.

The final script to run reports for all nodes at once might look something like this:

set SEARCH_IN_MINS=60

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS% -d \\node1\d$\Oracle\Middleware\user_projects\domains -o web1_domains"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS% -d \\node1\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o web1_diagnostics"

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node2\d$\Oracle\Middleware\user_projects\domains -o web2_domains"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node2\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o web2_diagnostics"

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node3\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o planning1_diagnostics"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node3\d$\Oracle\Middleware\user_projects\domains -o planning1_domains"

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node4\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o planning2_diagnostics"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node4\d$\Oracle\Middleware\user_projects\domains -o planning2_domains"

....

I think this is infinity more useful. Much more useful than the single node, single product example. Next time Hyperion is down and everyone is scrambling looking for answers, this script might just be a quick way to discover the issue.

Friday, May 30, 2014

Testing the FDM 32-Bit Database Connection

I recently had some issues with FDM and suspected an issue with the database client setup. However, both the 32 bit and 64 bit Oracle clients were installed on the machine to support both FDM (32 bit) and HFM (64 bit). 

Specifically, I wanted to test the Windows OLE DB Oracle Driver connection, rather than relying on sqlplus to ensure the OLE DB connection was working with the Oracle client. The 11.2.0.3 Oracle client installer has a known bug where it fails to correctly register OLE DB. However, if I was to just create a simple test .UDL file using the Oracle OLE DB provider it would default to using the 64 bit Oracle client. How could I test the 32 bit client using a UDL file when both the 32 and 64 bit Oracle client is installed?

This blog article was very helpful:
http://blogs.msdn.com/b/chaitanya_medikonduri/archive/2008/04/09/how-to-run-32-bit-udl-file-on-a-64-bit-operating-system.aspx

Ultimately, this command was able to successfully test the 32 bit Oracle Client using Oracle's OLE DB provider: 
C:\Windows\system32>C:\WINDOWS\SysWOW64\Rundll32.exe "C:\Program Files (x86)\Common Files\System\Ole DB\oledb32.dll",OpenDSLFile c:\Windows\Temp\test.udl32

This post is a tad bit "in the weeds". However, FDM provides very little indication of what is actually going on when it fails to open. This might just be useful to verify all aspects of connectivity when troubleshooting.

Saturday, April 26, 2014

Automating the Daunting Task of Hyperion Database Password Changes


Disclaimer: This is information is for advanced level administrators. Please use at your own risk. 

This post is about changing the underlying database passwords that form the foundation of the Hyperion EPM environment. In Hyperion nearly every product has a corresponding database schema/user, depending on how the environment is setup. In my experience clients infrequently change these database passwords, if ever. Perhaps this is just lax security or maybe it is due to the daunting efforts involved in password changes. I'd like to show a different approach which might help make the process easier.

First, let's look at the typical process for changing database passwords.
  1. Change the HSS/Registry password:
    On EACH node in your environment:
    Open config tool on each node and given that the connection to HSS now fails, it will prompt you to re-enter the database credential to HSS registry from scratch. You will then need to run the "Foundation -> Configure database" configure task.
  2. For each EPM module with a database connection
    For instance: CalcMgr, Planning, HFM..etc
    Run the config tool and run the database configuration task for each module. Note: when updating it will prompt you to also redeploy the application server to update the WebLogic passwords. You would need to enter "perform first time configuration of database", make certain the schema in the drop down list is pointing at the correct DB for this module, and then it will prompt "would you like to reuse or drop". You need to correctly select "reuse" or risk erasing all your data.
  3. Update product specific passwords. There are some passwords which do not fit the mold described in 2. These are things like .UDL files for HFM, FDM Applications, Planning Applications. Each of these components will need evaluated and handled differently. 
The tasks in 1 and 2 can be readily automated.  In my opinion the steps in 1 and 2 are very time consuming and potentially risk prone if you make a mistake. I really like to avoid using config tool and redeploying as much as possible due to its buggy behavior. Let's re-evaluate what happens in step 1 and 2 and try to automate. The third step is harder since it depends based on the application, and perhaps not worth trying to automate.
  1. Changing the HSS Registry password consists of two parts. There is a reg.properties file which simply contains the JDBC connection details and is used when opening config tool. If config tool cannot connect to the HSS database, it will prompt to re-enter the DB info for the purpose of recreating the reg.properties file.
    The second part is updating the HSS registry with the correct DB HSS/Registry password.

    Consequently, the password is stored in the file system and in the database.

    Automation:
    The encrypted value for reg.properties is not possible to create outside of using config tool. So the first node you will need to open config tool and re-enter the database credentials, and run the "Foundation -> configure database" task. This creates a new reg.properties on this node and also updates the HSS password in the database (HSS registry). For automation, do not re-run this on each node, just copy the updated reg.properties to each node:
    copy d:\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties \\server2\d$\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties copy d:\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties \\server3\d$\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties ...
  2. Step two is updating the passwords for each EPM module. Behind the scenes this step has two parts. a) Updates EPM module password in the EPM System Registry. b) Redeploying the Application Server updates the JDBC connections in WebLogic Essentially, Oracle\Middleware\user_projects\domains\EPMSystem\config\jdbc Automation for a) The epmsys_registry tool can be used to change the HSS Registry connections. You can find a list of all the database connections stored in the registry by running epmsys_registry without any arguments to generate a registry report. Then search for DATABASE_CONN to identify the database connections. To change from the command line you can use this syntax:
    set SYSREG=D:\Oracle\Middleware\user_projects\epmsystem1\bin\epmsys_registry

    call %SYSREG% updateencryptedproperty SYSTEM9/FOUNDATION_SERVICES_PRODUCT/EPMA_PRODUCT/DATABASE_CONN/@dbPassword NewPassword
    ... Continue for each database connection. Remember, you can skip the EPMRegistry connection since you already updated this in step 1. Automation for b) WebLogic has a scripting language, WLST, which can be used to script the password change in WebLogic. To find the list of datasources which need modfied login to the WebLogic Admin console and look under the JDBC connections. Note: the list of connections does not necessarily match those in part a, above.
    WLST Script

    connect('epm_admin','','t3://wl_domain:7001')
    edit()
    startEdit()
    pass_aif = encrypt('NewPassword')

    dsName = 'aif_datasource'
    cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCDriverParams/'+dsName)
    set('PasswordEncrypted', pass_aif)

    save()
    activate()
Conclusion: 
The steps shown here can be prepared and scripted before the maintenance. This dramatically eliminates the time it takes to perform the maintenance since the time consuming steps are prepared up front and not being performed on demand when the environment is already down.

Wednesday, October 23, 2013

RA Through Firewall

Another quick tip for implenting RA through a firewall.

The RA service by default uses a dynamic RMI port. To pin this port down for allowing through a firewall you can find the following documented:

http://docs.oracle.com/cd/E17236_01/epm.1112/epm_install_start_here_11121/frameset.htm?ch07s06s02.html

Remote ADM ServerDynamicEPM_ORACLE_HOME/common/ADM/11.1.2.0/lib/ADM.properties

Uncomment ADM_RMI_SERVER_PORT and specify value for the port opened in firewall.

Workspace (RA) Hangs when Implementing Firewall


Recently, I was implementing RA through a firewall. I discovered that Workspace would hang during login.

Looking through the logs I noticed some stuck threads in the RAFrameworkWeb logs,
which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:Thread-204 "[STUCK] ExecuteThread: '12' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {    java.net.PlainSocketImpl.socketConnect(PlainSocketImpl.java:???)....java.lang.Class@10a184ba[thin lock]    com.sqribe.WS.WebSQRIBE.loginToRepository(Unknown Source)
This means something is unable to connect during RA initialization. 

Checking the Windows firewall for dropped packets (after configuring logging of dropped packets)

2013-10-10 23:15:43 DROP TCP xx.xx.xx.xx xx.xx.xx.xx 5613 32631 48 S 3865585991 0 8192 - - - RECEIVE


Apparently, some higher level ports are being blocked... What is process is being blocked?
I looked further at the netstat output to see what ports were LISTENING on the node, 

Netstat -a -n -o 
  TCP    0.0.0.0:6801           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6802           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6803           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6804           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6805           0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30298   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30332   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30365   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30400   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30437   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30472   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30507   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30540   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30575   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30611   0.0.0.0:0              LISTENING       9256

  TCP    xx.xx.xx.xx:30646   0.0.0.0:0              LISTENING       9256

There are a ton of 30k+ ports that that RA is listening on. Obviously, these are not being allowed through the firewall. The 9256 column is the PID, but also the common 6800 , 6801 ports identify this as the RA Framework Services. What is causing these higher number ports? 

Finally, after some digging in the KB,
How to Change BI+ EPM Workspace to No Longer Use Dynamic Port Allocation for Logins to Properly Configure the Firewall to Allow Connections. (Doc ID 1483777.1)

Hyperion BI+ - Version 11.1.2.0.00 to 11.1.2.2.000 [Release 11.1]

How to change BI+ EPM Workspace to no longer use dynamic port allocation for logins so you can properly configure the firewall to allow connections.

FIX

 1. Login to Workspace click  Navigate -> Administer -> R&A -> Web Applications
      Right click on Reporting and Analysis Framework Web-Application -> Select Properties -> Internal
        Change Pass Data Using Streams Instead Of Files to No -> click OK

 2. Restart RaFramework Web


Also, there is a small blurb in the EPM doc about this if you look carefully,

Sunday, July 7, 2013

Bug in 11.2.0.3 Oracle Client Installation Prevents HFM Configuration


The 11.2.0.3 version of the Oracle Client on Windows seems to have a unique bug where it has trouble registering the OLE DB provider in windows. This effects both HFM and FDM database configuration tasks. First, HFM will not pass configuring the db in the config tool. Second, FDM will not be able to create an application - throwing a very generic message saying check your database connection.

This essentially means that if you create a new UDL file manually, and look at the providers list, the Oracle Client provider is not shown.

The easiest way to see this issue, as mentioned above is to create a new text file, with zero bytes. Then rename it to test.udl. When you double click it should let you test the OLE DB connection. This is a good way to test any HFM DB related connection issue. However, if this bug exists, you will quickly discover that the Oracle provider is missing.

Please refer to this KB for more info:

Installing the 11.2.0.3 Oracle Provider for OLE DB from the Client Install Media Does Not Properly Register the Provider [ID 1380742.1]

For reference, the 64 bit (HFM) command to register the OLE DB would be

D:\>regsvr32 d:\oracle\product\11.2.0\client_2\bin\OraOLEDB11.dll

remember to refer to the KB article for the 32 bit command for FDM to work too.


Here are some more symptoms of the issue:
UDL: "Test connection failed because of an error in setting the window handle property. Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done."



From Hyperion Configtool.log (Error configuring HFM database connection):

[SRC_CLASS: com.hyperion.hfm.config.hfmregistry.HFMRegistry] Error while configuring HFM Database: {0}[[
java.lang.RuntimeException: Oracle DB registry entry is not available
at com.hyperion.hfm.config.hfmregistry.HFMRegistry.configureHFMDataBase(HFMRegistry.java:754)
at com.hyperion.hfm.config.HfmDbConfigurator.configure(HfmDbConfigurator.java:150)
at com.hyperion.config.wizard.impl.RunAllTasks.executeDbConfigTask(RunAllTasks.java:929)
at com.hyperion.config.wizard.impl.RunAllTasks.execute(RunAllTasks.java:482)
at com.hyperion.config.wizard.impl.RunnAllTasksState.run(RunnAllTasksState.java:91)
at java.lang.Thread.run(Thread.java:662)

]]


Additionally, this post is also talking about the same issue, though perhaps not the best solution:

Friday, April 26, 2013

Error Deploying Applications in Config Tool

When trying to deploy to application server, config tool crashes with "An unexpected error occurred and the application will have to shut down. Review the error log, correct the error condition and re-launch the application."

Inline image 1



The first thing to check is the config tool log. Digging into the log reveals,
[SRC_CLASS: com.hyperion.hit.wizard.Wizard] trace: Unexpected error:[[
java.lang.NullPointerException
at com.hyperion.hit.tool.deploy.utils.UIPortUtil.resetData(UIPortUtil.java:183)
at com.hyperion.hit.tool.deploy.utils.UIPortUtil.<init>(UIPortUtil.java:65)
at com.hyperion.hit.tool.deploy.utils.UIPortUtil.getInstance(UIPortUtil.java:48)
at com.hyperion.config.wizard.DeploymentField.getAppServerDeploymentData(DeploymentField.java:309)
at com.hyperion.config.wizard.DeploymentField.<init>(DeploymentField.java:70)
at com.hyperion.config.wizard.impl.DeploymentForm.<init>(DeploymentForm.java:57)
at com.hyperion.config.wizard.impl.DeploymentState.initializeBeforeShowOnNext(DeploymentState.java:58)
at com.hyperion.hit.wizard.Wizard$NextClass.run(Wizard.java:551)
at java.lang.Thread.run(Thread.java:662)
]]

Upon further investigation - this logic is basically trying to enumerate the logical web applications for each host. While iterating through the logical web applications, the config tool encounters a null value and crashes. The logical web applications are found in the Shared Services Registry. So I ran the registry-cleanup utility,

NOTE: The registry-cleanup utility has a high probability of corrupting your environment. Take a database backup of your Shared Services/Foundation schema before starting and be prepared to take downtime to restore the schema if required.
Essentially, the cleanup utility found some invalid entries to delete - so I removed them one at a time until the config tool worked again.

registry-cleanup.bat
Launching the Hyperion Registry Cleanup Utility, please wait...
Enter the password for Shared Services Database :
Reading rules XML file: \Oracle\Middleware\EPMSystem11R1\common\config\11.1.2.
0\resources\registry\cleanup\default-rules.xml
Found 6 rules

Looking for components matching rule: Remove components without parent HOST node
Processing results for rule: Remove components without parent HOST node

Do you really want to delete WORKSPACE_WEBAPP(WORKSPACE_WEBAPP):808d62cd746781a0
S201b96e113e13916fd3S6ae7 component?
Y - Yes
A - Yes, for all components matching this rule
N - No, skip this component
S - No, skip this and all the next components matching this rule
I - Show full component info
y
Deleting: WORKSPACE_WEBAPP(WORKSPACE_WEBAPP):808d62cd746781a0S201b96e113e13916fd
3S6ae7


This fixed the error in config tool.