Why... Why... Why?
This blog is dedicated to documenting error resolution and other tidbits that I discover while working as a Consultant in the Oracle EPM (Hyperion) field. As much of my job revolves around issue resolution, I see an opportunity to supplement the typical troubleshooting avenues such as the Oracle Knowledgebase and Oracle Forums with more pinpointed information about specific errors as they are encountered. Beware, the information found in this blog is for informational purposes only and comes without any warranty or guarantee of accuracy.

Tuesday, May 12, 2015

Basic WebLogic Tuning in EPM

Part of the EPM Infrastructure Tuning Guide is a section on WebLogic tuning. The guide can be found here: https://blogs.oracle.com/pa/resource/Oracle_EPM_11_1_2_3_Tuning_Guide_v2.pdf

These steps are recommended, and in practice they are required: any time you create an SR with Oracle, they will ask, "Did you apply the recommended tuning guidelines?" Answering no usually results in an impasse and the SR going unresolved.

The basic WebLogic Tuning steps are:
1. Increase number of threads in connection pool
2. Tune connection backlog buffering
3. Stuck thread detection behavior tuning
4. Enable Native IO Performance pack.

Analyzing this a bit, steps 2 and 4 are already configured by default in WebLogic 10.3.x. Consequently, they can be ignored.

The tuning guide walks through setting the database connection pool sizes and stuck thread values by hand, somewhat tediously, with screenshots. A scripted approach is a better way to apply the settings across multiple environments and ensure uniformity over time. A quick WLST script to do this is below. Keep in mind the tuning settings are just suggested values. For instance, stuck thread time is more an indicator of how long your longest acceptable web transaction might be, which is customer specific. In the script below, EAS is configured with a different thread value than the rest of the managed servers, as an example.

Important Note: Once the connection pool is increased, you may also need to increase the number of available connections the database allows. For example, you may want to increase the number of processes in the Oracle Database.

File: TuneWL.py
connect(.....) 
edit() 
startEdit()

servers = cmo.getServers()
for server in servers:
  name=server.getName()
  print 'Tuning Cluster: '+ name
  cd('/Servers/' + name)   
  t_time = 1200
  if "Essbase" in name:
    t_time = 2400
    print "Setting EAS thread time to " + str(t_time)
  cmo.setStuckThreadMaxTime(t_time)
  cmo.setStuckThreadTimerInterval(t_time)


# JDBC connection pool

dsName = 'calc_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(30)

dsName = 'eas_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(30)

dsName = 'EPMSystemRegistry'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(150)

dsName = 'planning_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(150)

dsName = 'raframework_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(150)

save() 
activate() 
Output:

C:\Oracle\Middleware\user_projects\domains\EPMSystem\bin>setDomainEnv.cmd
C:\Oracle\Middleware\user_projects\domains\EPMSystem>java weblogic.WLST TuneWL.py


Starting an edit session ...
Started edit session, please be sure to save and activate your changes once you are done.

Tuning Cluster: FoundationServices0
Tuning Cluster: EssbaseAdminServices0
Setting EAS thread time to 2400
Tuning Cluster: EpmaDataSync0

...
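
To put a number on the note about database connections, here is a rough Python sketch that adds up the pool maximums set in TuneWL.py. The servers_per_pool parameter is an assumption for illustration only: each managed server a data source is targeted to keeps its own pool, so the real multiplier depends on your deployment.

```python
# Max capacities copied from the TuneWL.py script in this post.
POOL_MAX = {
    "calc_datasource": 30,
    "eas_datasource": 30,
    "EPMSystemRegistry": 150,
    "planning_datasource": 150,
    "raframework_datasource": 150,
}


def worst_case_connections(pool_max, servers_per_pool=1):
    """Upper bound on simultaneous JDBC connections if every pool fills up.

    servers_per_pool is a hypothetical simplification: it assumes every data
    source is targeted to the same number of managed servers.
    """
    return sum(pool_max.values()) * servers_per_pool


if __name__ == "__main__":
    print(worst_case_connections(POOL_MAX))
```

Compare the result against the processes setting on the database, leaving headroom for everything else that connects to it.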

Wednesday, May 6, 2015

Windows service restart in action...

A while back I wrote a post on automatic service restarts using Windows:
http://epm-errors.blogspot.com/2015/04/auto-restarting-windows-services.html

I recently had the opportunity to watch one of the Windows automatic service restarts happen in a real production scenario, complete with pagers firing, alerts going off, and, at the end of it, a completely automatic service restart. This type of event is somewhat hard to trigger for testing, so it was nice to see it work as expected.

The event looked like this in Windows Event Viewer:

By the time I had logged in and was looking at the log file, I noticed WebLogic starting back up. Ultimately, no action was required. I never did find any error in the logs. However, there were some odd access log entries relating to a security scan, and I think this is ultimately what caused EAS to crash.

Thursday, April 16, 2015

Linux Tips - Beyond the Basics (Part 2)

Find log files in / last modified within 1 day
This is great when you are on an unfamiliar system or don’t know which log file you need to look at.
find / -name "*.log" -mtime -1
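
If you would rather do the same search from a script, the following is a minimal Python sketch that roughly mirrors the find command above. The function name recent_logs and its parameters are made up for this example, and the age check is a plain 24-hour cutoff rather than find's exact rounding rules.

```python
import os
import time


def recent_logs(root, max_age_days=1, suffix=".log"):
    """Return paths under root that end in suffix and were modified recently."""
    cutoff = time.time() - max_age_days * 86400
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            try:
                if os.path.getmtime(path) >= cutoff:
                    matches.append(path)
            except OSError:
                # The file vanished or cannot be read; skip it and keep going.
                continue
    return sorted(matches)
```

Calling recent_logs("/") walks the whole file system just like the find command, so expect it to take a while.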


Interactively look at recent logs
Each log file modified within 1 day is constantly streamed. Simply retry the operation and watch the new log messages appear on the screen.
find / -name "*.log" -mtime -1 | xargs tail -f


Parsing Command Output
It is good to be able to quickly parse output from the command line. Fortunately with awk and grep, many operations are possible with basic knowledge.


Hypothetically, let’s say you want to find the number of network packets on the eth1 interface.
There are easier commands to get this information but, for the sake of learning, this example just uses basic parsing.


First the ifconfig command tells us this info on each interface, including the number of received packets, RX packets:


eth1      Link encap:Ethernet  HWaddr 08:00:27:B0:6A:94
         inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
         inet6 addr: fe80::a00:27ff:feb0:6a94/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:1522 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1461 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:133938 (130.7 KiB)  TX bytes:339721 (331.7 KiB)


lo        Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:65536  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

But we want to only focus on the RX packets. Let's first eliminate the loopback interface. You could easily run ifconfig eth1, but it is also possible to grep for the data.


The following will focus on the "eth1" line and also display 5 lines after the eth1 match to get the RX packets line:


ifconfig | grep eth1 -A5


eth1      Link encap:Ethernet  HWaddr 08:00:27:B0:6A:94
         inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
         inet6 addr: fe80::a00:27ff:feb0:6a94/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:1556 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1502 errors:0 dropped:0 overruns:0 carrier:0


Next, let’s keep trying to focus on the RX packets:
ifconfig | grep eth1 -A5 | grep RX  
[root@localhost tmp]# ifconfig | grep eth1 -A5 | grep RX
         RX packets:1572 errors:0 dropped:0 overruns:0 frame:0
Now it is down to parsing this line for the number of packets, say 1572...


The awk command can be used to split a line into columns where spaces separate the fields.
Therefore, if each space in the line above represents a column we can focus on column two:


ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } '
packets:1572


Great, now how can we strip out the “packets:” from the raw number we are looking for?
We already know that awk is good for selecting columns from text data. As the string is now, it can also be interpreted in column format by switching the delimiter from a space to a colon. Therefore, selecting column 2 using a ":" delimiter with awk will finally give us the number:
[root@localhost tmp]# ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } ' | awk -F ":" ' { print $2 } '
1616


That's it!


What can be done with the number of packets? If you take a sample before and after an interval, say 60 seconds, you can derive a rough estimate of packets per second on the interface.


LENGTH=60
START=`ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } ' | awk -F ":" ' { print $2 } '`;
sleep $LENGTH
END=`ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } ' | awk -F ":" ' { print $2 } '`;
echo $START $END $LENGTH | awk ' { print ($2-$1)/$3 } '


Again the awk command in the last line is used to parse columns but additionally it can be used for doing simple arithmetic with the columns. For instance, packets/sec = ($END-$START)/$LENGTH


The thing to remember about this example is that this is not difficult. The solution only uses simple features of awk and grep.


This exercise can also be done without using ifconfig. As an exercise on your own try using
only the file /proc/net/dev and grep/awk commands. The same metric, RX packets, can be found here:
Inter-|   Receive                                                |  Transmit
face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
 eth1:  359967    4110    0    0    0     0          0         1   786631    3985    0    0    0     0       0          0
 eth2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
   lo:       0    
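
Once you have tried the exercise with grep and awk, here is one possible Python take on it for comparison. This is only a sketch: the function names are invented for this post, and it assumes the usual /proc/net/dev layout shown above, where the second number after the interface name is the receive packet count.

```python
import time


def rx_packets(proc_net_dev_text, interface):
    """Return the receive-packet counter for interface from /proc/net/dev text."""
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines have no "name:" prefix
        name, counters = line.split(":", 1)
        if name.strip() == interface:
            fields = counters.split()
            return int(fields[1])  # receive side: bytes first, then packets
    raise ValueError("interface not found: " + interface)


def rx_rate(interface, seconds=60, path="/proc/net/dev"):
    """Sample the counter twice and return received packets per second."""
    with open(path) as handle:
        start = rx_packets(handle.read(), interface)
    time.sleep(seconds)
    with open(path) as handle:
        end = rx_packets(handle.read(), interface)
    return (end - start) / float(seconds)
```

Calling rx_rate("eth1") gives the same kind of estimate as the shell version, using the same (END - START) / LENGTH arithmetic.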



Thursday, April 9, 2015

Linux Tips - Beyond the Basics (Part 1)

As a lot of recent posts have focused on Windows tips, a coworker of mine suggested writing up some Unix tips. Many day-to-day operations in Unix can be performed using the same old techniques, so a lot of Unix users become stagnant and never improve their skills beyond the basics. Here are everyday tips I still find useful from my days as a Unix admin.

There is a lot that can be covered, so at this point it looks like this will be a multi-part series of articles. Feel free to comment with any suggested topics.

Simple Bash Navigation
There is no excuse for fumbling around on the command line. This is your primary interface to Unix and Hyperion. However, this all breaks down when your boss is looking over your shoulder and you start fumbling around like a novice.

(emacs key bindings)
CTRL+a - beginning of line
CTRL+e - end of line
CTRL+k - cut from current position to end of the line
CTRL+y - yank back the cut contents at the current cursor position
Learn other cool tips like swapping the position of command line arguments:
http://ss64.com/bash/syntax-keyboard.html

Ctrl+r at the Bash Shell
This is the recall command; the most useful feature in the bash shell. Maneuvering in the Unix shell often requires complex commands and long paths. Most users are familiar with the history command and the up and down arrows to access recent history. However, this is inefficient. At the command prompt press CTRL+r and type any unique part of the string containing the command that you want to recall. This powerful feature lets you pull up anything from the history by typing in a few unique characters. You may have used this before. Web browsers have copied this feature in the URL prompt. For instance, typing /em in Chrome can pull back "https://server:7002/em". After some time using this feature your brain starts to rewire and it becomes a snap to use "recall" mode to rapidly pull up commands to run or edit.

If there is more than one match, pressing CTRL+r again will go to a prior match in the command history. CTRL+p and CTRL+n then step backward and forward through the history from the matched command.

Advanced users will find that the more commands you type in with full paths, the more the recall feature helps. For instance, if you run a command by changing into 6 directories with individual cd commands and then run the command with a relative path, the recall feature won't help piece this together. It is encouraged to enter the full paths to the command so it is easily run from any location at a different time.

Become Very Familiar with your Editor of Choice
The text editor is the most powerful tool on the Linux box. Vi, Emacs, or something else? Either way, learn more than the basics; these are powerful editors that can save tons of time.

Emacs:
https://www.gnu.org/software/emacs/refcards/pdf/refcard.pdf

Vi:
http://www.albany.edu/faculty/hy973732/ist535/vi_editor_commands.pdf

Screen
The Unix command prompt is a fragile thing when running over a network connection via SSH. Ever run a command, have it take longer than expected, and now it's time to go home? Logging out will surely kill the long-running process. The screen utility is very useful here, for example when running database backups that will take a while, yet the default timeout for your bash shell is only 20 minutes. It prevents the backup from terminating when your shell is killed.


Screen basics:
Before running a long-running command, type "screen" to enter a screen session.

screen -S ScreenTest

sleep 1000000

Now detach your session
CTRL+a d

[detached from 11504.ScreenTest]

Log out and go home for the day.

When you get home, check what's running:
:~$ screen -ls
There are screens on:
        11504.ScreenTest        (04/08/2015 08:44:26 PM)        (Detached)
1 Socket in /var/run/screen

Reattach to the running screen session:
screen -r ScreenTest
sleep 1000000 (sleep command is still sitting there executing)

Killing
kill all processes matching the word "java"

killall java
OR
ps -wwef | grep java | awk ' { print $2 } ' | xargs kill -9

Swiftly kill everything. If you have a service account such as "hyperion" and you need to quickly make sure all processes are down, assuming you have tried all sane ways to do this, you can kill everything running under the current user id by:
kill -9 -1
Tip: Don't do this as root.

Network Testing
The old "telnet" command for testing whether a service is listening on a port is being deprecated. The nc ("netcat") command can be used for basic connectivity checks.

:~$ nc -v localhost 80
nc: connect to localhost port 80 (tcp) failed: Connection refused

nc can also be used to start up a mock network server on a port for testing connectivity between two machines. This allows you to test firewall rules long before you have set up the environment, rather than discovering afterwards that the connectivity is not working.

On server1:
nc -l 1234    (start a server on port 1234)

from another machine, try to connect:
nc -v server1 1234
typing text on the client side should appear on the server side.
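
When the same check needs to run from a script, for instance against a list of ports, a small Python function can stand in for the nc client. This is a minimal sketch; port_open is a name made up for this post, and it only proves that a TCP connection can be opened, nothing more.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        conn = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        # Refused, unreachable, unresolvable, or timed out (e.g. dropped by a firewall).
        return False
    conn.close()
    return True
```

With nc -l 1234 running on server1, port_open("server1", 1234) from the other machine should return True once the firewall rules are right.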

Copying Files Between Servers

rsync -a ~/tmp/dir1/ oracle@epmvirt:/tmp/destination

Bash For Loops
for i in a b c; do  echo $i ; done

for server in server1 server2 server3 server4 ; do scp file $server:/tmp/; done

Disk Space Consumption
du -m /u0 | sort -n

36643   /u0/app/oracle/product
68393   /u0/app/oracle/backups
320977  /u0/app/oracle/arch
576131  /u0/app/oracle
576230  /u0/app
605054  /u0/

This shows where to find the largest disk space consumers to quickly mitigate disk space issues. In this case we could check the archive logs directory and reduce disk space.
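
For reference, here is a rough Python equivalent of du -m piped to sort -n. It is a sketch under one loud assumption: it sums apparent file sizes rather than allocated disk blocks, so the numbers will differ a little from what du reports. The function name dir_sizes_mb is made up for this post.

```python
import os


def dir_sizes_mb(root):
    """Return (megabytes, path) pairs for root and every directory below it, smallest first."""
    totals = {}
    # Walk bottom-up so each child's total is ready before its parent is summed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        size = 0
        for filename in filenames:
            try:
                size += os.lstat(os.path.join(dirpath, filename)).st_size
            except OSError:
                pass  # unreadable or vanished file; ignore it
        for dirname in dirnames:
            # Symlinked directories are not walked, so they contribute 0 here.
            size += totals.get(os.path.join(dirpath, dirname), 0)
        totals[dirpath] = size
    return sorted((size // (1024 * 1024), path) for path, size in totals.items())
```

As with the shell version, the biggest consumers end up at the bottom of the list.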

Learning a Programming Language
Often users get sucked into writing complex shell scripts. They start off innocently enough, but shell scripting quickly shows its weaknesses. Shell scripts become unwieldy with syntax and flow control: loops all over the place, difficult command line argument handling, poor error handling, etc. I encourage having a more elegant language to write scripts in. Any will do, but Python can be a quick way to get started and is proven in the real world. Even if you are completely new, spending a few hours on a tutorial site and then tackling a simple program in Python will pay off in the long run over using a bash script.
A tutorial can be found here:
https://docs.python.org/2/tutorial/

Monday, April 6, 2015

Auto Restarting Windows Services

I wanted to discuss a very simple task to help with keeping services running. In most cases if you have some service running and it unexpectedly crashes, you want it to automatically come back up. This holds true whether you are an EPM pro or just starting out.

In an earlier post, I talked about a more complex event like running an external script upon failure.
http://epm-errors.blogspot.com/2014/12/how-to-triage-hyperion-windows-service.html
The external script solution can be good for very specific cases, but it takes a bit of work to set up. As a more general measure, it is a good idea to have every Hyperion service auto restart upon failure. This is much simpler than setting up custom scripts. It is a simple property in the Windows service settings.

The settings are located in Windows Services: under the service properties there is a Recovery tab. Here you can set the action to "Restart the Service".



An easy way to script this is to get the Service names for all Hyperion components and write up something like this:
sc \\localhost failure HyS9FRReports actions= restart/60000/restart/60000/""/60000 reset= 0
Localhost can be replaced by the server name.
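
To avoid typing that line once per service, a few lines of Python can generate the commands for you. This is only a sketch: failure_command is a name made up for this post, HyS9FRReports is the only real service name here, and the second entry in the list is a placeholder to replace with the service names from your own environment.

```python
def failure_command(service, server="localhost", delay_ms=60000):
    """Build the sc command that restarts service after its first two failures."""
    actions = 'restart/{0}/restart/{0}/""/{0}'.format(delay_ms)
    return r"sc \\{0} failure {1} actions= {2} reset= 0".format(server, service, actions)


if __name__ == "__main__":
    # Placeholder list: only HyS9FRReports comes from this post.
    services = ["HyS9FRReports", "YourOtherHyperionService"]
    for service in services:
        print(failure_command(service))
```

Redirect the output to a .cmd file, review it, and run it from an elevated prompt.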

Some additional tips:
  • Don't restart the service indefinitely by also setting the "subsequent failures" box. At some point you'll want to have it crash and stay down rather than flap up and down constantly.
  • This restart action is not intended to solve the root cause of problems. You should be detecting when the services crash to take additional action on the root cause.
  • This customization is overwritten whenever you run the Hyperion Config Tool. Config Tool resets the service properties, and the custom settings would need to be reapplied.

Sunday, March 29, 2015

Handling WebLogic Failures


The default behavior when a critical error occurs in a running WebLogic application is to do nothing about the failure condition. This can cause the web server, OHS, to keep routing traffic to the service, even in a highly available situation where there is more than one server available. This situation leaves many with HA setups vulnerable to a single point of failure.

The basic problem is that when you have a WebLogic service running, such as HFM, and it encounters a critical situation such as an out of memory (OOM) error, the service comes to a halt and cannot process user requests. End users are left with their requests spinning in the web browser until they eventually time out. It is important to configure the WebLogic Managed Server to acknowledge the failure and shut itself down quickly so that the other member of the cluster can take over.

Two settings in WebLogic can help with this, under the theme Overload protection. The settings are Failure Action and Panic Action.

  

This can be scripted for all managed servers using WLST:
connect(....)
edit()
startEdit()
servers = cmo.getServers()
for server in servers:
  name = server.getName()
  # OverloadProtection is a child MBean of each server, so use the full path
  cd('/Servers/' + name + '/OverloadProtection/' + name)
  cmo.setPanicAction('system-exit')
  cmo.setFailureAction('force-shutdown')
save()
activate()



More Details about overload protection can be found here:
http://www.dba-oracle.com/t_weblogic_overload_protection.htm

Taking this scenario one step further, rather than just failing, Node Manager can be used to automatically restart the failed service. However, Node Manager is most often treated as an optional component, so you would need to decide whether to implement Node Manager to use these features.




Wednesday, March 25, 2015

Useful Administrative Tools in Windows

Do you work with Hyperion in Windows and feel like your hands are tied? Like a fish out of water? OK, maybe this is an exaggeration for most admins, but as a former Linux sysadmin it is how I feel most days. I wanted to share some tools that can be used to help bridge the gap and become a Windows power user as it relates to Hyperion. Please feel free to share your own tools and ideas in the comments.

Note: Each of these tools has its own licensing agreement. Please carefully read and adhere to each license.

AstroGrep

Unix equivalent: grep
Find strings in files. AstroGrep is much more powerful than the Windows find utility. The Windows find utility only searches for whole words. For example, take "this is an exception" vs. "java.lang.exception": the Windows find utility only finds the first example, yet AstroGrep finds both strings.

If Hyperion throws a weird message or error, you can use AstroGrep to search the file system for the file to help pinpoint where it is coming from and get more context around the error. It can be useful to find specific server names, passwords, usernames, etc in files such as MAXL scripts.

In the example below I am searching the *.log files in the EPM Diagnostics folder for the string "Exception".
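
On a server where installing a tool is not an option, the same kind of search can be approximated with a short Python script. This is a sketch and not how AstroGrep itself works; grep_files is a name made up for this post, and it does a simple case-insensitive substring match on each line.

```python
import fnmatch
import os


def grep_files(root, needle, pattern="*.log"):
    """Yield (path, line_number, line) for each line containing needle, ignoring case."""
    needle = needle.lower()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in fnmatch.filter(filenames, pattern):
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as handle:
                    for number, raw in enumerate(handle, 1):
                        # Decode leniently so odd bytes in a log do not stop the search.
                        line = raw.decode("utf-8", "replace")
                        if needle in line.lower():
                            yield path, number, line.rstrip()
            except (IOError, OSError):
                continue  # locked or unreadable file; move on
```

Looping over grep_files(folder, "Exception") for your diagnostics folder prints nothing by itself; print each tuple, or collect them in a list, to see the hits with their line numbers.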

WinDiff

Unix equivalent: diff

Diff files, or recursively entire directories for changes. For instance, compare the EPMA accounts from an LCM export taken at two different times to identify the differences in EPMA metadata.

mTAIL

Unix equivalent: tail -f
Windows error log tailer. It is used to monitor logs in real time. Simply drag and drop a file from Windows Explorer into mTAIL, click the start button, and it will start displaying updates to the log file in real time. This is handy when you need to check a file over and over for updates, such as looking for new errors. It is also useful if you are constantly restarting WebLogic and want to check the log for errors during startup.



Process Explorer and Handle

Unix equivalent: lsof
Ever try to rename or delete that certain folder and it keeps saying it is in use, but you do not see anything open? Ever have trouble with Hyperion patches failing to apply because some files are in use? Obviously, the first step is to search Task Manager for Hyperion related processes. However, there are some pesky situations where this alone cannot identify the culprit. Process Explorer has an option to search file handles and is the easier of the two to use. The Handle tool is a command line tool and kind of archaic, but it can also find what is using the files you want to get access to.

WinDirStat

Unix equivalent: du -k | sort -n
Out of disk space? Need a quick tool to discover where the most space is being consumed? WinDirStat gives you an ordered list of disk space consumption and lets you drill through, finding what is using all the disk space on your system.


mRemoteNG

  Sadly, the open source form of this tool is deprecated. It is a very useful remote desktop/SSH session manager for Windows. It integrates nicely with PuTTY.

Process Monitor

Unix equivalent: strace
A very low level tool to see the system calls a process is making while executing. Typically this tool comes out when all of the normal avenues have been tried to solve the problem. Sometimes getting down to the level of system calls can pinpoint things like missing files, bad security, and bugs. This helps figure out why a process is crashing by seeing what unexpected conditions the process encountered while running.

Internet Explorer Developer tools

Similar tools also available in Chrome and Firefox

Network profiling can help find latency in your requests. For instance, you can measure the time it takes to get resources from different geographic regions and pinpoint certain latency issues. From the example below, there are a ton of images and background activity after each click in Hyperion. Sometimes slow performance can be found on the front-end using this method. It is also useful to inspect in detail certain processes for debugging purposes.

Simply go to Tools -> F12 Developer Tools. Select the Network tab, and "start capturing".