Why... Why... Why?
This blog is dedicated to documenting error resolution and other tidbits that I discover while working as a Consultant in the Oracle EPM (Hyperion) field. As much of my job revolves around issue resolution, I see an opportunity to supplement the typical troubleshooting avenues such as the Oracle Knowledgebase and Oracle Forums with more pinpointed information about specific errors as they are encountered. Beware, the information found in this blog is for informational purposes only and comes without any warranty or guarantee of accuracy.

Sunday, June 21, 2015

Hyperion Auditing - In Practice vs. Theory

At first glance the Shared Services audit features in Hyperion seem complete. However, in practice, the user interface seems to be clunky and hard to use.

The audit features are found in Shared Services. Auditing must first be enabled before any audit information is captured. To do this, open Shared Services and select Administration -> Configure Auditing, then select "Enable Auditing". For most products, the only auditing available is for LCM; in other words, you will see only LCM operations being audited. However, under Shared Services there is a little more detail:

The option that is most interesting is User Provisioning. Many organizations require fine grained auditing of users as they are provisioned and deprovisioned from applications. This can aid in meeting specific SOX requirements. Let's take a closer look at this in practice.

First, let's look at the Shared Services Auditing user interface. This is found in Administration -> Audit Reports -> Security Reports.

The default report shows 30 days of history. It can be confined to a specific date range too. You can search by "performed by". However, if you are unsure who performed the action, this field may not be very useful. Finally, you can narrow down by product name. In this case the product is Shared Services for checking provisioning information.



One of the first challenges is sifting through the interface. It is impossible to sort or filter by Task, which is the operation being audited. Consequently, the display is overwhelmed with "authenticate" requests from users using the system. Secondly, only 50 items can be displayed on a page at a time. Say you want to review audit information over a range of 10 days. Now the pagination ("1 of x") comes into play: you have to sort through multiple pages of entries, most of which are irrelevant authentication records. What about finding when a user was provisioned over the last year? Forget about it.

Suppose you do find a particular item of interest. In this case, the admin user provisioned "testuser1" to an application. Notice anything missing? Which application? For this information you have to tick the "Detailed View" checkbox in the options.


The detailed view shows the full detail. This now tells us that testuser1 was provisioned by admin to the PLANDEMO application as role Administrator.


This display is perfectly fine, but it is tedious to click through repeatedly when looking for specific information.

One way to get at the data is to tear off the clunky user interface and head to the database, targeting exactly the info you want. A rough, basic query can be put together quickly. From there it is possible to do much more powerful querying. Additionally, audit information accumulates very quickly, and retaining it can produce a very large volume of data. Querying the database directly, rather than going through the user interface, makes it possible to sift through millions of rows quickly.

select STARTTIME, USER_NAME, ARTIFACT_NAME, ATTRIBUTE_NAME, ATTRIBUTE_CURR_VALUE
  from SMA_AUDIT_FACT
  NATURAL JOIN SMA_TASK_DIM
  NATURAL JOIN SMA_AUDIT_ATTRIBUTE_FACT
 where ARTIFACT_NAME like '%testuser1%'
   and TASK_NAME like '%Provision User%'
 ORDER BY STARTTIME, AUDIT_FACT_ID, ARTIFACT_NAME;

Results of query:

Again we find that testuser1 was provisioned by admin to the PLANDEMO application as role Administrator.
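
If the same lookup is needed for many users, the query can be wrapped in a small helper that uses bind variables instead of hard-coded literals. The following is only a sketch: the table and column names come from the query above, while the function name and the bind names (:artifact, :task) are made up for illustration.

```python
# Sketch only: builds the audit query with bind variables.
# Table and column names are taken from the query in this post;
# the function name and bind names are illustrative.

def build_provisioning_query(user_pattern, task_pattern="%Provision User%"):
    """Return (sql, binds) for a provisioning-audit lookup."""
    sql = (
        "select STARTTIME, USER_NAME, ARTIFACT_NAME, "
        "ATTRIBUTE_NAME, ATTRIBUTE_CURR_VALUE "
        "from SMA_AUDIT_FACT "
        "NATURAL JOIN SMA_TASK_DIM "
        "NATURAL JOIN SMA_AUDIT_ATTRIBUTE_FACT "
        "where ARTIFACT_NAME like :artifact "
        "and TASK_NAME like :task "
        "order by STARTTIME, AUDIT_FACT_ID, ARTIFACT_NAME"
    )
    # Wrap the user name in wildcards, as the original query does
    binds = {"artifact": "%" + user_pattern + "%", "task": task_pattern}
    return sql, binds


sql, binds = build_provisioning_query("testuser1")
print(sql)
print(binds)
```

The returned pair should be usable with any Oracle DB-API driver that accepts named binds, for example cursor.execute(sql, binds). Connection handling is left out on purpose.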

Friday, June 5, 2015

Silent Install and Configuration for EPM

One of the keys to reproducible installation and configuration is using a response file to store the configuration. The response file allows for rapid, scripted deployment across multiple environments. It also helps ensure others can easily repeat the installation.

EPMVirt is built on response files for install and configuration. This makes it possible to build a completely scripted Hyperion environment.

The install was recorded like this:
/u0/install/epm/installTool.sh -record /u0/automation/epm/silentInstall.xml

The install is invoked like this:
/u0/install/epm/installTool.sh -silent /u0/automation/epm/silentInstall.xml

The config tool was recorded like this:
/u0/Oracle/Middleware/EPMSystem11R1/common/config/11.1.2.0/configtool.sh -record /u0/automation/epm/EPMconfig_Foundation.xml

The config tool is invoked like this:
/u0/Oracle/Middleware/EPMSystem11R1/common/config/11.1.2.0/configtool.sh -silent /u0/automation/epm/EPMconfig_Foundation.xml
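
The two silent invocations above can be chained in a small driver script. This is a minimal sketch using the paths from this post. The function names are illustrative, and it assumes the tools exit with a non-zero status on failure, which is worth verifying before relying on it.

```python
import subprocess

# Paths as used in this post
INSTALL_TOOL = "/u0/install/epm/installTool.sh"
CONFIG_TOOL = ("/u0/Oracle/Middleware/EPMSystem11R1/common/config/"
               "11.1.2.0/configtool.sh")
INSTALL_RESPONSE = "/u0/automation/epm/silentInstall.xml"
CONFIG_RESPONSE = "/u0/automation/epm/EPMconfig_Foundation.xml"


def silent_commands():
    """Return the install and config commands, in the order they must run."""
    return [
        [INSTALL_TOOL, "-silent", INSTALL_RESPONSE],
        [CONFIG_TOOL, "-silent", CONFIG_RESPONSE],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit status
            # (assumption: the EPM tools report failure that way)
            subprocess.run(argv, check=True)


run_all(silent_commands())  # dry run: only prints the commands
```

Calling run_all(silent_commands(), dry_run=False) would actually launch the installer followed by the config tool.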


For a closer look, let's dive into the files:
silentInstall.xml

<?xml version="1.0" encoding="UTF-8"?>
<HyperionInstall>
  <HyperionHome>/u0/Oracle/Middleware</HyperionHome>
  <UserLocale>en_US</UserLocale>
  <ActionType>0</ActionType>
  <SelectedProducts>
        <Product name="foundation">
            <ProductComponent name="foundationServices">
                <Component>hssWebApp</Component>
                <Component>staticContent</Component>
                <Component>weblogic</Component>
            </ProductComponent>
            <ProductComponent name="Calc">
                <Component>CalcWebApp</Component>
            </ProductComponent>
        </Product>
        <Product name="essbase">
            <Component>essbaseWebApp</Component>
            <Component>essbaseApsWebApp</Component>
            <Component>essbaseApsWebAppSamples</Component>
            <Component>essbaseStudioService</Component>
            <Component>essbaseStudioServiceSamples</Component>
            <Component>essbaseService</Component>
            <Component>essbaseServiceSamples</Component>
        </Product>
        <Product name="reportingAndAnalysis">
            <ProductComponent name="raFramework">
                <Component>raFrameworkWebApp</Component>
                <Component>raFrameworkService</Component>
            </ProductComponent>
            <ProductComponent name="fr">
                <Component>frWebApp</Component>
            </ProductComponent>
        </Product>
        <Product name="planning">
            <Component>planningWebApp</Component>
        </Product>
        <Product name="disclosure">
            <Component>disclosureWebApp</Component>
        </Product>
        <Product name="hfm">
            <Component>hfmAdmClient</Component>
            <Component>hfmWebApps</Component>
            <Component>hfmService</Component>
        </Product>
        <Product name="erpi">
            <Component>erpiWebApp</Component>
        </Product>
        <Product name="profitability">
            <Component>osloWebApp</Component>
            <Component>osloWebAppSamples</Component>
        </Product>
    </SelectedProducts>
  <ProductHomes/>
  <UpgradeCleanUp/>
  <UninstallCleanUp>false</UninstallCleanUp>
</HyperionInstall>
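
Before kicking off an install, it can be handy to check which products a response file will actually select. The sketch below parses a shortened copy of the file above with Python's standard library. The function name is illustrative, and the embedded sample is trimmed from the real file.

```python
import xml.etree.ElementTree as ET

# Shortened sample with the same structure as silentInstall.xml above
SAMPLE = """
<HyperionInstall>
  <SelectedProducts>
    <Product name="foundation">
      <ProductComponent name="foundationServices">
        <Component>hssWebApp</Component>
        <Component>weblogic</Component>
      </ProductComponent>
    </Product>
    <Product name="planning">
      <Component>planningWebApp</Component>
    </Product>
  </SelectedProducts>
</HyperionInstall>
"""


def selected_components(root):
    """Map each product name to the list of its component names."""
    result = {}
    for product in root.iter("Product"):
        # iter() also descends into nested ProductComponent elements
        result[product.get("name")] = [c.text for c in product.iter("Component")]
    return result


root = ET.fromstring(SAMPLE)
for name, components in selected_components(root).items():
    print(name, components)
```

For a real file, replace ET.fromstring(SAMPLE) with ET.parse("/u0/automation/epm/silentInstall.xml").getroot().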


Silent Config - EPMconfig_Foundation.xml
It is pretty straightforward. Each product has its own section, and each section lists the tasks that you would find in the config tool. After the tasks comes a series of bean objects, which define the configuration values for each component.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<products>
  <instance>/u0/Oracle/Middleware/user_projects/epmsystem1</instance>
  <enable_compact_deployment_mode>true</enable_compact_deployment_mode>
  <auto_port_tick>true</auto_port_tick>
  <product productXML="Foundation">
    <tasks>
      <task>applicationServerDeployment</task>
      <task>FndCommonSetting</task>
      <task>preConfiguration</task>
      <task>relationalStorageConfiguration</task>
      <task>WebServerConfiguration</task>
    </tasks>
    <bean name="main">
      <bean name="applicationServerDeployment">
        <bean name="WebLogic 10">
          <property name="adminHost">localhost</property>
          <property name="adminPassword">AgzKbSiBZt2xNcQYXYjZ7qMeHz0qv6U7PosgZx76RSdPJqnOCohak8JSWBpC8ngw</property>
          <property name="adminPort">7001</property>
          <property name="adminUser">epm_admin</property>
          <beanList name="applications">
            <listItem>
              <bean>
                <property name="compactPort">9000</property>
                <property name="compactServerName">EPMServer</property>
                <property name="compactSslPort">9443</property>
                <property name="component">Shared Services</property>
                <beanList name="contexts">
                  <listItem>
                    <property>interop</property>
                  </listItem>
                </beanList>
                <property name="enable">true</property>
                <property name="port">28080</property>
                <property name="serverName">FoundationServices</property>
                <property name="sslPort">28443</property>
                <property name="validationContext">interop</property>
              </bean>
            </listItem>
          </beanList>
          <property name="BEA_HOME">/u0/Oracle/Middleware</property>
          <property name="domainName">EPMSystem</property>
          <property name="manualProcessing">false</property>
          <property name="remoteDeployment">false</property>
          <property name="serverLocation">/u0/Oracle/Middleware/wlserver_10.3</property>
        </bean>
      </bean>
      <bean name="customConfiguration">
        <property name="AdminEmail"/>
        <property name="adminPassword">2CvVUAlFeGfG1/SW1TS3u6b8wcJouqEEKp6s0KfyD806sQuDkm2LbJLNkUt4iY0S</property>
        <property name="adminUserName">admin</property>
        <property name="common_lwa_set">false</property>
        <property name="enable_SMTPServer_Authentication">false</property>
        <property name="enable_ssl">false</property>
        <property name="enableSslOffloading">false</property>
        <property name="externalUrlHost"/>
        <property name="externalUrlPort"/>
        <property name="filesystem.artifact.path">import_export</property>
        <property name="isSSLForSMTP">false</property>
        <property name="relativePaths"/>
        <property name="relativePathsInstance">filesystem.artifact.path</property>
        <property name="SMTPHostName"/>
        <property name="SMTPMailServer"/>
        <property name="SMTPPort">25</property>
        <property name="SMTPPortIncoming">143</property>
        <property name="SMTPServerPassword"/>
        <property name="SMTPServerUserID"/>
      </bean>
      <bean name="httpServerConfiguration">
        <property name="displayVersion">10.3.6</property>
        <property name="port">9000</property>
        <property name="protocol">http</property>
        <bean name="Proxy">
          <property name="path"/>
          <property name="port">9000</property>
          <property name="useSSL">false</property>
        </bean>
        <property name="sharedLocation">use_local_instance</property>
      </bean>
      <bean name="lwaConfiguration">
        <beanList name="batchUpdateLWAComponents"/>
        <beanList name="deploymentLWAComponents"/>
      </bean>
      <bean name="relationalStorageConfiguration">
        <bean name="ORACLE">
          <property name="createOrReuse">create</property>
          <property name="customURL">false</property>
          <property name="dbIndexTbsp"/>
          <property name="dbName">HYPDB</property>
          <property name="dbTableTbsp"/>
          <property name="dropRegistry">true</property>
          <property name="encrypted">true</property>
          <property name="host">epmvirt</property>
          <property name="jdbcUrl">jdbc:oracle:thin:@EPMVirt:1521:HYPDB</property>
          <property name="password">u/3u8zGjUgl6ekXFWdmCw8Ep992dW5WySl5q22W5Ty6kvzPM8FFJegduUsHaVXah</property>
          <property name="port">1521</property>
          <property name="SSL_ENABLED">false</property>
          <property name="userName">EPM_HSS</property>
          <property name="VALIDATESERVERCERTIFICATE">true</property>
        </bean>
      </bean>
      <property name="shortcutFolderName">Foundation Services</property>
    </bean>
  </product>
  <product productXML="workspace">
    <tasks>
      <task>applicationServerDeployment</task>
    </tasks>
    <bean name="main">
      <bean name="applicationServerDeployment">
        <bean name="WebLogic 10">
          <property name="adminHost">localhost</property>
          <property name="adminPassword">VH+syQvfsdYnKKP6VHA7OVvVTOa5kHSulb6MOJuJVQAJxGGVM12fO+fo0QDTp4//</property>
          <property name="adminPort">7001</property>
          <property name="adminUser">epm_admin</property>
          <beanList name="applications">
            <listItem>
              <bean>
                <property name="compactPort">9000</property>
                <property name="compactServerName">EPMServer</property>
                <property name="compactSslPort">9443</property>
                <property name="component">Workspace</property>
                <beanList name="contexts">
                  <listItem>
                    <property>workspace</property>
                  </listItem>
                </beanList>
                <property name="enable">true</property>
                <property name="port">28080</property>
                <property name="serverName">FoundationServices</property>
                <property name="sslPort">28443</property>
                <property name="validationContext">workspace/status</property>
              </bean>
            </listItem>
          </beanList>
          <property name="BEA_HOME">/u0/Oracle/Middleware</property>
          <property name="domainName">EPMSystem</property>
          <property name="manualProcessing">false</property>
          <property name="remoteDeployment">false</property>
          <property name="serverLocation">/u0/Oracle/Middleware/wlserver_10.3</property>
        </bean>
      </bean>
      <bean name="httpServerConfiguration">
        <property name="contextRoot">workspace</property>
        <property name="host">null</property>
        <property name="port">19000</property>
        <property name="protocol">http</property>
      </bean>
      <bean name="lwaConfiguration">
        <beanList name="batchUpdateLWAComponents"/>
        <beanList name="deploymentLWAComponents"/>
      </bean>
      <property name="shortcutFolderName">Workspace</property>
    </bean>
  </product>
</products>
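
When comparing config response files across environments, it helps to pull out just the tasks and a few key properties. The following sketch does that for a trimmed sample with the same layout as the file above. The function names are illustrative, and the code assumes the product/tasks/bean nesting shown in this post.

```python
import xml.etree.ElementTree as ET

# Trimmed sample following the layout of EPMconfig_Foundation.xml above
SAMPLE = """
<products>
  <product productXML="Foundation">
    <tasks>
      <task>applicationServerDeployment</task>
      <task>relationalStorageConfiguration</task>
    </tasks>
    <bean name="main">
      <bean name="relationalStorageConfiguration">
        <bean name="ORACLE">
          <property name="dbName">HYPDB</property>
          <property name="port">1521</property>
        </bean>
      </bean>
    </bean>
  </product>
  <product productXML="workspace">
    <tasks>
      <task>applicationServerDeployment</task>
    </tasks>
  </product>
</products>
"""


def tasks_by_product(root):
    """Map each product name to the config tasks listed for it."""
    return {p.get("productXML"): [t.text for t in p.findall("tasks/task")]
            for p in root.findall("product")}


def bean_properties(root, product, bean_path):
    """Follow nested bean names and return that bean's direct properties."""
    node = root.find("product[@productXML='%s']" % product)
    for name in bean_path:
        node = node.find("bean[@name='%s']" % name)
    return {p.get("name"): p.text for p in node.findall("property")}


root = ET.fromstring(SAMPLE)
print(tasks_by_product(root))
print(bean_properties(root, "Foundation",
                      ["main", "relationalStorageConfiguration", "ORACLE"]))
```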

Tuesday, May 12, 2015

Basic WebLogic Tuning in EPM

Part of the EPM Infrastructure Tuning Guide is a section on WebLogic tuning. The guide can be found here: https://blogs.oracle.com/pa/resource/Oracle_EPM_11_1_2_3_Tuning_Guide_v2.pdf

These steps are recommended, and will be required because any time you create an SR with Oracle they will ask, "Did you apply the recommended tuning guidelines?" Responding no to this question usually results in an impasse and failure to complete the SR.

The basic WebLogic Tuning steps are:
1. Increase number of threads in connection pool
2. Tune connection backlog buffering
3. Stuck thread detection behavior tuning
4. Enable Native IO Performance Pack

Analyzing this a bit, steps 2 and 4 are already configured by default in 10.3.x. Consequently, they can be ignored.

The tuning guide walks through setting the database connection pool and stuck thread settings by hand, using screenshots, which is somewhat tedious. A scripted approach might be better for applying the settings in multiple environments and ensuring uniformity over time. A quick WLST script to do this is below. Keep in mind the tuning settings are just suggested values. For instance, stuck thread time is more an indicator of how long your longest acceptable web transaction might be, which is customer specific. In the script below, as an example, EAS is configured with a different thread value than the rest of the managed servers.

Important Note: Once the connection pool is increased, you may also need to increase the number of available connections the database allows. For example, you may want to increase the number of processes in the Oracle Database.

File: TuneWL.py
connect(.....) 
edit() 
startEdit()

servers = cmo.getServers()
for server in servers:
  name=server.getName()
  print 'Tuning Cluster: '+ name
  cd('/Servers/' + name)   
  t_time = 1200
  if "Essbase" in name:
    t_time = 2400
    print "Setting EAS thread time to " + str(t_time)
  cmo.setStuckThreadMaxTime(t_time)
  cmo.setStuckThreadTimerInterval(t_time)


# JDBC connection pool

dsName = 'calc_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(30)

dsName = 'eas_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(30)

dsName = 'EPMSystemRegistry'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(150)

dsName = 'planning_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(150)

dsName = 'raframework_datasource'
cd('/JDBCSystemResources/'+dsName+'/JDBCResource/'+dsName+'/JDBCConnectionPoolParams/'+dsName)
cmo.setMaxCapacity(150)

save() 
activate() 
Output:

C:\Oracle\Middleware\user_projects\domains\EPMSystem\bin>setDomainEnv.cmd
C:\Oracle\Middleware\user_projects\domains\EPMSystem>java weblogic.WLST TuneWL.py


Starting an edit session ...
Started edit session, please be sure to save and activate changes once you are done.

Tuning Cluster: FoundationServices0
Tuning Cluster: EssbaseAdminServices0
Setting EAS thread time to 2400
Tuning Cluster: EpmaDataSync0

...
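
To put the note about database connections into rough numbers, the sketch below adds up the pool maximums set in TuneWL.py and compares the total with a database processes limit. It assumes each pool is opened by a single managed server unless told otherwise. The headroom value and the example limits are hypothetical, not figures from the tuning guide.

```python
# Maximum capacities as set in TuneWL.py above
POOL_MAX_CAPACITY = {
    "calc_datasource": 30,
    "eas_datasource": 30,
    "EPMSystemRegistry": 150,
    "planning_datasource": 150,
    "raframework_datasource": 150,
}


def worst_case_connections(pools, servers_per_pool=1):
    """Total connections if every pool on every server is fully used."""
    return sum(pools.values()) * servers_per_pool


def fits(pools, db_processes, headroom=50, servers_per_pool=1):
    """True if the worst case plus headroom stays within the DB limit.

    headroom is a hypothetical allowance for background processes and
    other clients; pick a value that suits your own database.
    """
    needed = worst_case_connections(pools, servers_per_pool) + headroom
    return needed <= db_processes


print(worst_case_connections(POOL_MAX_CAPACITY))
print(fits(POOL_MAX_CAPACITY, db_processes=600))
```

Not every pool will be full at the same time, so this is an upper bound rather than a prediction.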

Wednesday, May 6, 2015

Windows service restart in action...

A while back I wrote a post on automatic service restarts using Windows:
http://epm-errors.blogspot.com/2015/04/auto-restarting-windows-services.html

I recently had the opportunity to watch one of the Windows automatic service restarts happen in a real production scenario, complete with pagers firing, alerts going off, and, at the end of it, a completely automatic service restart. This type of event is somewhat hard to trigger for testing, so it was nice to see it work as expected.

The event looked like this in Windows Event Viewer:

By the time I got logged in and was looking into the log file I noticed WebLogic starting back up. Ultimately, no action was required. I never did find any error in the logs. However, there were some odd access log entries relating to a security scan and I think this ultimately caused EAS to crash.

Thursday, April 16, 2015

Linux Tips - Beyond the Basics (Part 2)

Find log files in / last modified within 1 day
This is great when you are on an unfamiliar system or don’t know which log file you need to look at.
find / -name "*.log" -mtime -1


Interactively look at recent logs
Each log file modified within 1 day is constantly streamed. Simply retry the operation and watch the new log messages appear on the screen.
find / -name "*.log" -mtime -1 | xargs tail -f
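
For systems where the result needs to feed into a larger script, the first find command above can be approximated in Python. This is a rough sketch rather than an exact equivalent of find: the function name is illustrative, and it treats "within 1 day" as "modified in the last 24 hours".

```python
import os
import time


def recent_logs(top, max_age_seconds=24 * 60 * 60, suffix=".log"):
    """Return files under top ending in suffix and modified recently."""
    cutoff = time.time() - max_age_seconds
    matches = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for filename in filenames:
            if not filename.endswith(suffix):
                continue
            path = os.path.join(dirpath, filename)
            try:
                if os.path.getmtime(path) > cutoff:
                    matches.append(path)
            except OSError:
                # File disappeared or cannot be read; skip it
                continue
    return sorted(matches)


if __name__ == "__main__":
    for path in recent_logs("/var/log"):
        print(path)
```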


Parsing Command Output
It is good to be able to quickly parse output from the command line. Fortunately with awk and grep, many operations are possible with basic knowledge.


Hypothetically, let’s say you want to find the number of network packets on the eth1 interface.
There are easier commands to get this information, but, for the sake of learning, this example uses only basic parsing.


First the ifconfig command tells us this info on each interface, including the number of received packets, RX packets:


eth1      Link encap:Ethernet  HWaddr 08:00:27:B0:6A:94
         inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
         inet6 addr: fe80::a00:27ff:feb0:6a94/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:1522 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1461 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:133938 (130.7 KiB)  TX bytes:339721 (331.7 KiB)


lo        Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:65536  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

But we want to only focus on the RX packets. Let's first eliminate the loopback interface. You could easily run ifconfig eth1, but it is also possible to grep for the data.


The following will focus on the "eth1" line and also display 5 lines after the eth1 match to get the RX packets line:


ifconfig | grep eth1 -A5


eth1      Link encap:Ethernet  HWaddr 08:00:27:B0:6A:94
         inet addr:192.168.56.101  Bcast:192.168.56.255  Mask:255.255.255.0
         inet6 addr: fe80::a00:27ff:feb0:6a94/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:1556 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1502 errors:0 dropped:0 overruns:0 carrier:0


Next, let’s keep trying to focus on the RX packets:
ifconfig | grep eth1 -A5 | grep RX  
[root@localhost tmp]# ifconfig | grep eth1 -A5 | grep RX
         RX packets:1572 errors:0 dropped:0 overruns:0 frame:0
Now it is down to parsing this line for the number of packets, say 1572...


The awk command can be used to split a line into columns where spaces separate the fields.
Therefore, if each space in the line above represents a column we can focus on column two:


ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } '
packets:1572


Great, now how can we strip out the “packets:” from the raw number we are looking for?
We already know that awk is good for selecting columns from text data. The string as it stands can also be read as columns by switching the delimiter from a space to a colon. Therefore, selecting column 2 with awk using ":" as the delimiter finally gives us the number:
[root@localhost tmp]# ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } ' | awk -F ":" ' { print $2 } '
1616


That's it!


What can be done with the number of packets? If you take one sample, wait a fixed interval (say 60 seconds), and take a second sample, you can derive a rough estimate of packets per second on the interface.


LENGTH=60
START=`ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } ' | awk -F ":" ' { print $2 } '`;
sleep $LENGTH
END=`ifconfig | grep eth1 -A5 | grep RX | awk ' {  print $2 } ' | awk -F ":" ' { print $2 } '`;
echo $START $END $LENGTH | awk ' { print ($2-$1)/$3 } '


Again the awk command in the last line is used to parse columns, but it can also do simple arithmetic with them. For instance, packets/sec = ($END-$START)/$LENGTH.


The thing to remember about this example is that this is not difficult. The solution only uses simple features of awk and grep.


This exercise can also be done without using ifconfig. As an exercise on your own try using
only the file /proc/net/dev and grep/awk commands. The same metric, RX packets, can be found here:
Inter-|   Receive                                                |  Transmit
face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
 eth1:  359967    4110    0    0    0     0          0         1   786631    3985    0    0    0     0       0          0
 eth2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
   lo:       0    
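
For comparison, here is one way the same measurement could look in Python, reading /proc/net/dev directly. It is only a sketch: the function names are illustrative, and it assumes the column layout shown above, where the second number after the interface name is the received packet count.

```python
import time


def rx_packets(proc_net_dev_text, interface):
    """Return the receive packet counter for one interface."""
    for line in proc_net_dev_text.splitlines():
        name, sep, counters = line.partition(":")
        if sep and name.strip() == interface:
            # Receive columns start with: bytes, packets, errs, drop, ...
            return int(counters.split()[1])
    raise KeyError(interface)


def packets_per_second(start, end, length):
    """Same arithmetic as the awk one-liner: (END - START) / LENGTH."""
    return (end - start) / float(length)


def sample_rate(interface="eth1", length=60, path="/proc/net/dev"):
    """Take two samples length seconds apart and return packets/sec."""
    with open(path) as handle:
        start = rx_packets(handle.read(), interface)
    time.sleep(length)
    with open(path) as handle:
        end = rx_packets(handle.read(), interface)
    return packets_per_second(start, end, length)
```

Calling sample_rate("eth1", 60) on a Linux host blocks for a minute and then returns the estimate.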



Thursday, April 9, 2015

Linux Tips - Beyond the Basics (Part 1)

As a lot of recent posts have focused on Windows tips, a coworker of mine suggested writing up some Unix tips. Many day-to-day operations in Unix can be performed using the same old techniques, so a lot of Unix users stagnate and never improve their skills beyond the basics. Here are everyday tips I still find useful from my days as a Unix admin.

There is a lot that can be covered, so at this point it looks like this will be a multi-part series of articles. Feel free to comment with any suggested topics.

Simple Bash Navigation
There is no excuse for fumbling around on the command line. This is your primary interface to Unix and Hyperion, and the moment your boss is looking over your shoulder is exactly when you least want to fumble around like a novice.

(emacs key bindings)
CTRL+a - beginning of line
CTRL+e - end of line
CTRL+k - cut from current position to end of the line
CTRL+y - yank back the cut contents at the current cursor position
Learn other cool tips like swapping the position of command line arguments:
http://ss64.com/bash/syntax-keyboard.html

Ctrl+r at the Bash Shell
This is the recall command; the most useful feature in the bash shell. Maneuvering in the Unix shell often requires complex commands and long paths. Most users are familiar with the history command and the up and down arrows to access recent history. However, this is inefficient. At the command prompt press CTRL+r and type any unique part of the string containing the command that you want to recall. This powerful feature lets you pull up anything from the history by typing in a few unique characters. You may have used this before. Web browsers have copied this feature in the URL prompt. For instance, typing /em in Chrome can pull back "https://server:7002/em". After some time using this feature your brain starts to rewire and it becomes a snap to use "recall" mode to rapidly pull up commands to run or edit.

If there is more than one match, pressing CTRL+r again will go to the prior match in the command history. CTRL+s searches forward again, provided terminal flow control (XON/XOFF) is not intercepting that key.

Advanced users will find that the more commands you type in with full paths, the more the recall feature helps. For instance, if you run a command by changing into 6 directories with individual cd commands and then run the command with a relative path, the recall feature won't help piece this together. It is encouraged to enter the full paths to the command so it is easily run from any location at a different time.

Become Very Familiar with your Editor of Choice
The text editor is the most powerful tool on the Linux box. Vi, Emacs, or something else? Either way, learn more than the basics; these are powerful editors that can save tons of time.

Emacs:
https://www.gnu.org/software/emacs/refcards/pdf/refcard.pdf

Vi:
http://www.albany.edu/faculty/hy973732/ist535/vi_editor_commands.pdf

Screen
The Unix command prompt is a fragile thing when running over a network connection via SSH. Ever run a command, have it take longer than expected, and now it's time to go home? Logging out will surely kill the long running process. The screen command solves this by keeping your session running on the server after you disconnect. This is very useful when running database backups that will take a while, yet the default timeout for your bash shell is only 20 minutes; screen prevents the backup from terminating when your shell is killed.


Screen basics:
Before running a long-running command, type "screen" to enter a screen session:

screen -S ScreenTest

sleep 1000000

Now detach your session
CTRL+a d

[detached from 11504.ScreenTest]

Log out and go home for the day.

When you get home, check what's running:
:~$ screen -ls
There are screens on:
        11504.ScreenTest        (04/08/2015 08:44:26 PM)        (Detached)
1 Socket in /var/run/screen

reattach to the running screen session:
screen -r ScreenTest
sleep 1000000 (sleep command is still sitting there executing)

Killing
Kill all processes matching the word "java":

killall java
OR
ps -wwef | grep java | awk ' { print $2 } ' | xargs kill -9

Swiftly kill everything. If you have a service account such as "hyperion" and you need to quickly make sure all processes are down, assuming you have tried all sane ways to do this, you can kill everything running under the current user id by:
kill -9 -1
Tip: Don't do this as root.

Network Testing
The old "telnet" command to test if a service is running on a port is being deprecated. The nc ("netcat") command can be used for basic connectivity tests.

:~$ nc -v localhost 80
nc: connect to localhost port 80 (tcp) failed: Connection refused

nc can also be used to start up a mock network server on a port for testing connectivity between two machines. This allows you to test firewall rules long before you have set up the environment, rather than discovering afterwards that the connectivity is not working.

On server1:
nc -l 1234    (start a server on port 1234)

From another machine, try to connect:
nc -v server1 1234
Typing text on the client side should appear on the server side.
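
When many host and port pairs need checking, for example ahead of an install, the client side of this test can be scripted. Below is a minimal sketch using Python's standard socket module. The function name is illustrative, and it only checks that a TCP connection can be opened, much like nc -v.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name lookup failures
        return False


if __name__ == "__main__":
    print(port_open("localhost", 80))
```

A firewall that silently drops packets shows up as a timeout, so the timeout value decides how long each failed check takes.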

Copying Files Between Servers

rsync -a ~/tmp/dir1/ oracle@epmvirt:/tmp/destination

Bash For Loops
for i in a b c; do  echo $i ; done

for server in server1 server2 server3 server4 ; do scp file $server:/tmp/; done

Disk Space Consumption
du -m /u0 | sort -n

36643   /u0/app/oracle/product
68393   /u0/app/oracle/backups
320977  /u0/app/oracle/arch
576131  /u0/app/oracle
576230  /u0/app
605054  /u0/

This shows where to find the largest disk space consumers to quickly mitigate disk space issues. In this case we could check the archive logs directory and reduce disk space.
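
If the same report is needed inside a monitoring script, a rough Python approximation is sketched below. The function name is illustrative. Note that it sums file sizes, whereas du reports allocated disk blocks, so the numbers will not match du exactly.

```python
import os


def directory_sizes_mb(top):
    """Return {directory: cumulative size in MB} for top and below."""
    sizes = {}
    # Walk bottom-up so each child's total exists before its parent is summed
    for dirpath, dirnames, filenames in os.walk(top, topdown=False):
        total = 0
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # do not count symlink targets
            try:
                total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable
        for dirname in dirnames:
            total += sizes.get(os.path.join(dirpath, dirname), 0)
        sizes[dirpath] = total
    return {path: size / (1024.0 * 1024.0) for path, size in sizes.items()}


if __name__ == "__main__":
    report = directory_sizes_mb("/u0")
    # Smallest first, largest last, like "du -m /u0 | sort -n"
    for path, size in sorted(report.items(), key=lambda item: item[1]):
        print("%d\t%s" % (size, path))
```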

Learning a Programming Language
Often users get sucked into writing complex shell scripts. They start off innocently enough, but shell scripting quickly shows its weaknesses. Shell scripts become unwieldy in syntax and flow control: loops all over the place, difficult command line argument handling, poor error handling, and so on. I encourage learning a more elegant language to write scripts in. Any will do, but Python can be a quick way to get started and is proven in the real world. Even if you are completely new to programming, spending a few hours on a tutorial site and then tackling a simple program in Python will pay off in the long run over using a bash script.
A tutorial can be found here:
https://docs.python.org/2/tutorial/

Monday, April 6, 2015

Auto Restarting Windows Services

I wanted to discuss a very simple task to help with keeping services running. In most cases if you have some service running and it unexpectedly crashes, you want it to automatically come back up. This holds true whether you are an EPM pro or just starting out.

In an earlier post, I talked about a more complex event like running an external script upon failure.
http://epm-errors.blogspot.com/2014/12/how-to-triage-hyperion-windows-service.html
The external script solution can be good for very specific cases, but it takes a bit of work to set up. More generally, it is a good idea to have every Hyperion service restart automatically upon failure. This is much simpler than setting up custom scripts: it is a simple property in the Windows service settings.

The settings are located in Windows Services: under the service properties there is a Recovery tab. Here you can set the action to "Restart the Service".



An easy way to script this is to get the Service names for all Hyperion components and write up something like this:
sc \\localhost failure HyS9FRReports actions= restart/60000/restart/60000/""/60000 reset= 0
Localhost can be replaced by the server name.
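
To apply this to several services, the command line can be generated rather than typed. The sketch below just builds the same sc command for each service name. Only HyS9FRReports comes from this post; the function name is illustrative, and any other service names would need to be looked up on your own server.

```python
def failure_command(service, host="localhost", delay_ms=60000):
    """Build the sc failure command used in this post for one service."""
    # First and second failure: restart after delay_ms; third: no action
    actions = 'restart/%d/restart/%d/""/%d' % (delay_ms, delay_ms, delay_ms)
    return "sc \\\\%s failure %s actions= %s reset= 0" % (host, service, actions)


# Add the other Hyperion service names from your environment here
SERVICES = ["HyS9FRReports"]

for service in SERVICES:
    print(failure_command(service))
```

The printed lines can be pasted into a batch file or run from an elevated command prompt.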

Some additional tips:
  • Don't restart the service indefinitely by also setting the "Subsequent failures" box to restart. At some point you'll want it to crash and stay down rather than flap up and down constantly.
  • This restart action is not intended to solve the root cause of problems. You should still detect when the services crash so that you can take additional action on the root cause.
  • This customization is overwritten whenever you run the Hyperion Config Tool. Config Tool resets the service properties, and the custom settings would need to be reapplied.