Why... Why... Why?
This blog is dedicated to documenting error resolution and other tidbits that I discover while working as a Consultant in the Oracle EPM (Hyperion) field. As much of my job revolves around issue resolution, I see an opportunity to supplement the typical troubleshooting avenues such as the Oracle Knowledgebase and Oracle Forums with more pinpointed information about specific errors as they are encountered. Beware, the information found in this blog is for informational purposes only and comes without any warranty or guarantee of accuracy.

Sunday, August 3, 2014

Windows WebLogic Servies Fail to Start - Trouble Encrypting boot.properties

I have run across this error a few times and have finally gotten around to documenting it. The issue occurs typically after a multi node Hyperion installation and all the WebLogic managed servers not residing on the WebLogic domain fail to start up. Essentially, you start the WebLogic service, and after a minute it shuts down. Looking into the WebLogic managed server log files: 


java.lang.AssertionError: java.lang.reflect.InvocationTargetException
at weblogic.descriptor.DescriptorManager$SecurityServiceImpl$SecurityProxy._invokeServiceMethod(DescriptorManager.java:175)
at weblogic.descriptor.DescriptorManager$SecurityServiceImpl$SecurityProxy.decrypt(DescriptorManager.java:192)
at weblogic.descriptor.DescriptorManager$SecurityServiceImpl.decrypt(DescriptorManager.java:114)
at weblogic.descriptor.internal.AbstractDescriptorBean._decrypt(AbstractDescriptorBean.java:1092)
at weblogic.management.configuration.SecurityConfigurationMBeanImpl.getCredential(SecurityConfigurationMBeanImpl.java:737)
at weblogic.security.internal.ServerPrincipalValidatorImpl.getSecret(ServerPrincipalValidatorImpl.java:88)
at weblogic.security.internal.ServerPrincipalValidatorImpl.sign(ServerPrincipalValidatorImpl.java:67)
at weblogic.security.service.PrivilegedActions$SignPrincipalAction.run(PrivilegedActions.java:62)
at weblogic.security.service.SecurityServiceManager.createServerID(SecurityServiceManager.java:1098)
at weblogic.security.service.SecurityServiceManager.getServerID(SecurityServiceManager.java:1109)
at weblogic.security.service.SecurityServiceManager.sendASToWire(SecurityServiceManager.java:600)
at weblogic.server.channels.ChannelService.resetQOS(ChannelService.java:296)
at weblogic.server.channels.ChannelService.start(ChannelService.java:258)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)

Caused By: weblogic.security.internal.encryption.EncryptionServiceException
at weblogic.security.internal.encryption.JSafeEncryptionServiceImpl.decryptBytes(JSafeEncryptionServiceImpl.java:139)
at weblogic.security.internal.encryption.JSafeEncryptionServiceImpl.decryptString(JSafeEncryptionServiceImpl.java:187)
at weblogic.security.internal.encryption.ClearOrEncryptedService.decrypt(ClearOrEncryptedService.java:96)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at weblogic.descriptor.DescriptorManager$SecurityServiceImpl$SecurityProxy._invokeServiceMethod(DescriptorManager.java:173)
at weblogic.descriptor.DescriptorManager$SecurityServiceImpl$SecurityProxy.decrypt(DescriptorManager.java:192)
at weblogic.descriptor.DescriptorManager$SecurityServiceImpl.decrypt(DescriptorManager.java:114)
at weblogic.descriptor.internal.AbstractDescriptorBean._decrypt(AbstractDescriptorBean.java:1092)
at weblogic.management.configuration.SecurityConfigurationMBeanImpl.getCredential(SecurityConfigurationMBeanImpl.java:737)
at weblogic.security.internal.ServerPrincipalValidatorImpl.getSecret(ServerPrincipalValidatorImpl.java:88)
at weblogic.security.internal.ServerPrincipalValidatorImpl.sign(ServerPrincipalValidatorImpl.java:67)
at weblogic.security.service.PrivilegedActions$SignPrincipalAction.run(PrivilegedActions.java:63)
at weblogic.security.service.SecurityServiceManager.createServerID(SecurityServiceManager.java:1098)
at weblogic.security.service.SecurityServiceManager.getServerID(SecurityServiceManager.java:1109)
at weblogic.security.service.SecurityServiceManager.sendASToWire(SecurityServiceManager.java:600)
at weblogic.server.channels.ChannelService.resetQOS(ChannelService.java:296)
at weblogic.server.channels.ChannelService.start(ChannelService.java:258)
at weblogic.t3.srvr.SubsystemRequest.run(SubsystemRequest.java:64)
at weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)
at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)


WebLogic is trying to read the boot.properties file in order to get the credentials to startup the managed server. However, it appears it needs to contact the WebLogic Domain in order to properly encrypt the boot.properties. You would also see the failed connection to the WebLogic Domain in the logs:

<Aug 2, 2014 6:52:04 AM UTC> <Emergency> <Management> <BEA-141151> <The admin server could not be reached at http://wldomain:7001.>
<Aug 2, 2014 6:52:04 AM UTC> <Info> <Configuration Management> <BEA-150018> <This server is being started in managed server independence mode in the absence of the admin server.>

Having a look at the boot.properties file it looks like the password has not been encrypted, hence the EncryptionServiceException. It is trying to read what it thinks is the encrypted password but the password is plain text, causing the exception. 

C:\Oracle\Middleware\user_projects\domains\EPMSystem\servers\RaFramework0\security\boot.properties
#
#Fri Aug 01 05:06:31 UTC 2014
password=epm_admin
username=mypassword


So the question is how to get the Managed Server to properly contact the WebLogic domain to encrypt the boot.properties file. 

I usually do this by starting it from the command line, 

cd C:\Oracle\Middleware\user_projects\domains\EPMSystem\bin
startManagedWeblogic.cmd RaFramework1 http://wldomain:7001

Note, the second argument is the WebLogic Domain URL. 

While starting up, assuming the Managed Server is able to connect to the domain you will see:

<Aug 2, 2014 6:57:37 AM UTC> <Notice> <Security> <BEA-090083> <Storing boot identity in the file: C:\Oracle\Middleware\user_projects\domains\EPMSystem\servers\RaFramework1\security\boot.properties>

This is a good indication that the problem has been resolved. Looking in the boot.properties you should see the encrypted password.

This approach is what I find best for debugging. Additionally, the startup parameters for starting the WebLogic managed server as a Windows service are in the Windows Registry. Specifically for Hyperion there is a JVMOption for setting the WL Domain server. Make sure that this is set correctly when starting as a Windows service. "-Dweblogic.management.server=http://wldomain:7001"

Monday, July 28, 2014

Some Bad Patches?


I recently came across a few bad patches.

Specifically, HSS 17537889, 11.1.2.2.303

In Shared Services, Application Groups, when trying to expand LCM for HFM applications (also with EPMA applications) SSL URLs will be appended with a :80, invalidating the URL.

EPMLCM-30034:Failed in communicating to Hyperion Financial Management remote service at URL https://servername:80/hfmlcmservice/LCMWS.asmx

Note the https protocol with the :80 appended to the end. The patch cannot form SSL URL's properly. This was confirmed by rolling back the patch and seeing the URLs form properly.


Additionally, HFM 11.1.2.2.306 & 307 cause certain Financial Reports to Fail against HFM. The error in HFM is:
File: CDQISource.cpp Version: 11.1.2.2.306.4365 Line: 6408 Error: (-3)(0xFFFFFFFD)(An unknown error occurred. (-3))

Specifically, looking at the .307 Readme, under known issues:
· 18528346 – Financial reports using ‘Base of Current POV for entity’ is not working with conditional suppression.
I think there are broader issues with FR and HFM in this patch that only occur with certain report options in FR.
I suspect these two issues also reside in the 11.1.2.2.500 patch set because it was released before Oracle fixed these issues.

Sunday, June 15, 2014

Fun with the Hyperion Log Analysis Utility


 I was recently playing around with the Log Analysis tool on 11.1.2.2.

John Goodwin has a good write up of the basic operation of the log analysis utility here:
http://john-goodwin.blogspot.com/2013/09/epm-standalone-log-analysis-utility.html

After reviewing the tool, I was curious how to make it a bit more useful.

First, John seems to be pinpointing specific product log directories with the "-d" option. I am curious if it is possible to get more generic and just have the utility tell us what is wrong without specifying the product. This would be beneficial if an error is occurring in the system and you just want a quick report of what error messages might be coming out to help troubleshoot.

I tried very generically:
loganalysis.bat -system -tmin 60  -d d:\Oracle\Middleware\user_projects

However, this seemed to only get the logs in user_projects\epmsystem1\diagnostics. It was missing the Weblogic logs under user_projects\domains. These logs are important because they provide insight into what is happening at the web application server layer. To get more of a complete picture of the system, it is possible to generate a report for both the domains folder and the diagnostics folder. 
loganalysis.bat -system -tmin 60  -d d:\Oracle\Middleware\user_projects\domains -o domains_report
loganalysis.bat -system -tmin 60  -d d:\Oracle\Middleware\\user_projects\epmsystem1\diagnostics -o diagnostics_report


Now this is on the right track. The next issue is that the utility is only scanning a single node. In a distributed environment, this doesn't help the problem of having to log into multiple nodes to gather the logs. Digging a little deeper, it looks like UNC paths work well in the Log Analysis utility.
For instance,
loganalysis.bat -system -tmin 60 -d \\node1\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o node1_diagnostics

Using this information, it is possible to orchestrate reports for all nodes in the environment at the same time from a single node. When each report finishes it opens a html report as a tab in Internet Explorer for review. When completed the results are easily viewed by scanning through the multiple tabs in IE.

The final script to run reports for all nodes at once might look something like this:

set SEARCH_IN_MINS=60

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS% -d \\node1\d$\Oracle\Middleware\user_projects\domains -o web1_domains"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS% -d \\node1\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o web1_diagnostics"

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node2\d$\Oracle\Middleware\user_projects\domains -o web2_domains"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node2\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o web2_diagnostics"

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node3\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o planning1_diagnostics"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node3\d$\Oracle\Middleware\user_projects\domains -o planning1_domains"

start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node4\d$\Oracle\Middleware\user_projects\epmsystem1\diagnostics -o planning2_diagnostics"
start cmd /C "loganalysis.bat -system -tmin %SEARCH_IN_MINS%  -d \\node4\d$\Oracle\Middleware\user_projects\domains -o planning2_domains"

....

I think this is infinity more useful. Much more useful than the single node, single product example. Next time Hyperion is down and everyone is scrambling looking for answers, this script might just be a quick way to discover the issue.

Friday, May 30, 2014

Testing the FDM 32-Bit Database Connection

I recently had some issues with FDM and suspected an issue with the database client setup. However, both the 32 bit and 64 bit Oracle clients were installed on the machine to support both FDM (32 bit) and HFM (64 bit). 

Specifically, I wanted to test the Windows OLE DB Oracle Driver connection, rather than relying on sqlplus to ensure the OLE DB connection was working with the Oracle client. The 11.2.0.3 Oracle client installer has a known bug where it fails to correctly register OLE DB. However, if I was to just create a simple test .UDL file using the Oracle OLE DB provider it would default to using the 64 bit Oracle client. How could I test the 32 bit client using a UDL file when both the 32 and 64 bit Oracle client is installed?

This blog article was very helpful:
http://blogs.msdn.com/b/chaitanya_medikonduri/archive/2008/04/09/how-to-run-32-bit-udl-file-on-a-64-bit-operating-system.aspx

Ultimately, this command was able to successfully test the 32 bit Oracle Client using Oracle's OLE DB provider: 
C:\Windows\system32>C:\WINDOWS\SysWOW64\Rundll32.exe "C:\Program Files (x86)\Common Files\System\Ole DB\oledb32.dll",OpenDSLFile c:\Windows\Temp\test.udl32

This post is a tad bit "in the weeds". However, FDM provides very little indication of what is actually going on when it fails to open. This might just be useful to verify all aspects of connectivity when troubleshooting.

Saturday, April 26, 2014

Automating the Daunting Task of Hyperion Database Password Changes


Disclaimer: This is information is for advanced level administrators. Please use at your own risk. 

This post is about changing the underlying database passwords that form the foundation of the Hyperion EPM environment. In Hyperion nearly every product has a corresponding database schema/user, depending on how the environment is setup. In my experience clients infrequently change these database passwords, if ever. Perhaps this is just lax security or maybe it is due to the daunting efforts involved in password changes. I'd like to show a different approach which might help make the process easier.

First, let's look at the typical process for changing database passwords.
  1. Change the HSS/Registry password:
    On EACH node in your environment:
    Open config tool on each node and given that the connection to HSS now fails, it will prompt you to re-enter the database credential to HSS registry from scratch. You will then need to run the "Foundation -> Configure database" configure task.
  2. For each EPM module with a database connection
    For instance: CalcMgr, Planning, HFM..etc
    Run the config tool and run the database configuration task for each module. Note: when updating it will prompt you to also redeploy the application server to update the WebLogic passwords. You would need to enter "perform first time configuration of database", make certain the schema in the drop down list is pointing at the correct DB for this module, and then it will prompt "would you like to reuse or drop". You need to correctly select "reuse" or risk erasing all your data.
  3. Update product specific passwords. There are some passwords which do not fit the mold described in 2. These are things like .UDL files for HFM, FDM Applications, Planning Applications. Each of these components will need evaluated and handled differently. 
The tasks in 1 and 2 can be readily automated.  In my opinion the steps in 1 and 2 are very time consuming and potentially risk prone if you make a mistake. I really like to avoid using config tool and redeploying as much as possible due to its buggy behavior. Let's re-evaluate what happens in step 1 and 2 and try to automate. The third step is harder since it depends based on the application, and perhaps not worth trying to automate.
  1. Changing the HSS Registry password consists of two parts. There is a reg.properties file which simply contains the JDBC connection details and is used when opening config tool. If config tool cannot connect to the HSS database, it will prompt to re-enter the DB info for the purpose of recreating the reg.properties file.
    The second part is updating the HSS registry with the correct DB HSS/Registry password.

    Consequently, the password is stored in the file system and in the database.

    Automation:
    The encrypted value for reg.properties is not possible to create outside of using config tool. So the first node you will need to open config tool and re-enter the database credentials, and run the "Foundation -> configure database" task. This creates a new reg.properties on this node and also updates the HSS password in the database (HSS registry). For automation, do not re-run this on each node, just copy the updated reg.properties to each node:
    copy d:\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties \\server2\d$\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties copy d:\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties \\server3\d$\Oracle\Middleware\user_projects\epmsystem1\config\foundation\11.1.2.0\reg.properties ...
  2. Step two is updating the passwords for each EPM module. Behind the scenes this step has two parts. a) Updates EPM module password in the EPM System Registry. b) Redeploying the Application Server updates the JDBC connections in WebLogic Essentially, Oracle\Middleware\user_projects\domains\EPMSystem\config\jdbc Automation for a) The epmsys_registry tool can be used to change the HSS Registry connections. You can find a list of all the database connections stored in the registry by running epmsys_registry without any arguments to generate a registry report. Then search for DATABASE_CONN to identify the database connections. To change from the command line you can use this syntax:
    set SYSREG=D:\Oracle\Middleware\user_projects\epmsystem1\bin\epmsys_registry

    call %SYSREG% updateencryptedproperty SYSTEM9/FOUNDATION_SERVICES_PRODUCT/EPMA_PRODUCT/DATABASE_CONN/@dbPassword NewPassword
    ... Continue for each database connection. Remember, you can skip the EPMRegistry connection since you already updated this in step 1. Automation for b) WebLogic has a scripting language, WLST, which can be used to script the password change in WebLogic. To find the list of datasources which need modfied login to the WebLogic Admin console and look under the JDBC connections. Note: the list of connections does not necessarily match those in part a, above.
    WLST Script

    connect('epm_admin','','t3://wl_domain:7001')
    edit()
    startEdit()
    pass_aif = encrypt('NewPassword')

    dsName = 'aif_datasource'
    cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCDriverParams/'+dsName)
    set('PasswordEncrypted', pass_aif)

    save()
    activate()
Conclusion: 
The steps shown here can be prepared and scripted before the maintenance. This dramatically eliminates the time it takes to perform the maintenance since the time consuming steps are prepared up front and not being performed on demand when the environment is already down.

Wednesday, October 23, 2013

RA Through Firewall

Another quick tip for implenting RA through a firewall.

The RA service by default uses a dynamic RMI port. To pin this port down for allowing through a firewall you can find the following documented:

http://docs.oracle.com/cd/E17236_01/epm.1112/epm_install_start_here_11121/frameset.htm?ch07s06s02.html

Remote ADM ServerDynamicEPM_ORACLE_HOME/common/ADM/11.1.2.0/lib/ADM.properties

Uncomment ADM_RMI_SERVER_PORT and specify value for the port opened in firewall.

Workspace (RA) Hangs when Implementing Firewall


Recently, I was implementing RA through a firewall. I discovered that Workspace would hang during login.

Looking through the logs I noticed some stuck threads in the RAFrameworkWeb logs,
which is more than the configured time (StuckThreadMaxTime) of "600" seconds. Stack trace:Thread-204 "[STUCK] ExecuteThread: '12' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, in native, suspended, priority=1, DAEMON> {    java.net.PlainSocketImpl.socketConnect(PlainSocketImpl.java:???)....java.lang.Class@10a184ba[thin lock]    com.sqribe.WS.WebSQRIBE.loginToRepository(Unknown Source)
This means something is unable to connect during RA initialization. 

Checking the Windows firewall for dropped packets (after configuring logging of dropped packets)

2013-10-10 23:15:43 DROP TCP xx.xx.xx.xx xx.xx.xx.xx 5613 32631 48 S 3865585991 0 8192 - - - RECEIVE


Apparently, some higher level ports are being blocked... What is process is being blocked?
I looked further at the netstat output to see what ports were LISTENING on the node, 

Netstat -a -n -o 
  TCP    0.0.0.0:6801           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6802           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6803           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6804           0.0.0.0:0              LISTENING       9256
  TCP    0.0.0.0:6805           0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30298   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30332   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30365   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30400   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30437   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30472   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30507   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30540   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30575   0.0.0.0:0              LISTENING       9256
  TCP    xx.xx.xx.xx:30611   0.0.0.0:0              LISTENING       9256

  TCP    xx.xx.xx.xx:30646   0.0.0.0:0              LISTENING       9256

There are a ton of 30k+ ports that that RA is listening on. Obviously, these are not being allowed through the firewall. The 9256 column is the PID, but also the common 6800 , 6801 ports identify this as the RA Framework Services. What is causing these higher number ports? 

Finally, after some digging in the KB,
How to Change BI+ EPM Workspace to No Longer Use Dynamic Port Allocation for Logins to Properly Configure the Firewall to Allow Connections. (Doc ID 1483777.1)

Hyperion BI+ - Version 11.1.2.0.00 to 11.1.2.2.000 [Release 11.1]

How to change BI+ EPM Workspace to no longer use dynamic port allocation for logins so you can properly configure the firewall to allow connections.

FIX

 1. Login to Workspace click  Navigate -> Administer -> R&A -> Web Applications
      Right click on Reporting and Analysis Framework Web-Application -> Select Properties -> Internal
        Change Pass Data Using Streams Instead Of Files to No -> click OK

 2. Restart RaFramework Web


Also, there is a small blurb in the EPM doc about this if you look carefully,