Why... Why... Why?
This blog is dedicated to documenting error resolution and other tidbits that I discover while working as a Consultant in the Oracle EPM (Hyperion) field. As much of my job revolves around issue resolution, I see an opportunity to supplement the typical troubleshooting avenues such as the Oracle Knowledgebase and Oracle Forums with more pinpointed information about specific errors as they are encountered. Beware, the information found in this blog is for informational purposes only and comes without any warranty or guarantee of accuracy.


Sunday, March 29, 2015

Handling WebLogic Failures


By default, WebLogic does nothing when a critical error occurs in a running application. As a result, the web server (OHS) can keep routing traffic to the failed service, even in a highly available configuration where more than one server is available. This leaves many HA setups exposed to a single point of failure.

The basic problem is that when a WebLogic service such as HFM hits a critical condition, for example an out-of-memory (OOM) error, the service comes to a halt and cannot process user requests. End users are left with their requests spinning in the web browser until they eventually time out. It is important to configure the WebLogic Managed Server to acknowledge the failure and shut itself down quickly so that the other members of the cluster can take over.

Two settings in WebLogic help with this, both part of its overload protection configuration: Failure Action and Panic Action.

  

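This can be scripted for all managed servers using WLST:
connect(....)
edit()
startEdit()
# cmo is the domain MBean here; getServers() also returns the Admin Server
servers = cmo.getServers()
for server in servers:
    name = server.getName()
    # the OverloadProtection MBean sits under its server and shares the server's name
    cd('/Servers/' + name + '/OverloadProtection/' + name)
    cmo.setPanicAction('system-exit')
    cmo.setFailureAction('force-shutdown')
# save and activate once, after every server has been updated
save()
activate()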
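
The loop above touches every server that cmo.getServers() returns, and that list includes the Admin Server. Below is a minimal sketch of one way to limit the change to managed servers and preview it before applying anything. The helper names overload_plan and apply_plan and the server names are made up for illustration. The only WLST functions assumed are cd() and set(), and they are passed in as arguments so the planning part runs in plain Python as well.

```python
def overload_plan(server_names, admin_name,
                  failure_action='force-shutdown', panic_action='system-exit'):
    """Return (mbean_path, attribute, value) triples for every managed server."""
    plan = []
    for name in server_names:
        if name == admin_name:
            continue  # leave the Admin Server on its current settings
        path = '/Servers/%s/OverloadProtection/%s' % (name, name)
        plan.append((path, 'FailureAction', failure_action))
        plan.append((path, 'PanicAction', panic_action))
    return plan


def apply_plan(plan, cd, set_attr):
    """Apply each step with the supplied functions (WLST's cd and set in an edit session)."""
    for path, attribute, value in plan:
        cd(path)
        set_attr(attribute, value)


# Preview outside WLST; these server names are hypothetical.
for step in overload_plan(['AdminServer', 'HFMWeb0', 'HFMWeb1'], 'AdminServer'):
    print('%s: %s = %s' % step)
```

Inside WLST, after edit() and startEdit(), the same plan could be built from [s.getName() for s in cmo.getServers()] and cmo.getAdminServerName(), applied with apply_plan(plan, cd, set), and then committed with save() and activate().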



More details about overload protection can be found here:
http://www.dba-oracle.com/t_weblogic_overload_protection.htm

Taking this scenario one step further, rather than just letting the service shut down, Node Manager can be used to automatically restart the failed service. However, Node Manager is often treated as an optional component, so you would need to decide whether to implement it in order to use these features.
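
If you do go the Node Manager route, the restart behavior is controlled by attributes on each server's MBean. The sketch below only builds the list of settings so they can be reviewed first. It assumes the ServerMBean attributes AutoRestart, RestartMax, RestartIntervalSeconds and RestartDelaySeconds. The function name, server names and numeric values are illustrative, not recommendations.

```python
def restart_settings(server_names, restart_max=2,
                     interval_seconds=3600, delay_seconds=0):
    """Return (mbean_path, [(attribute, value), ...]) pairs, one per server."""
    settings = []
    for name in server_names:
        attributes = [
            ('AutoRestart', 'true'),                       # let Node Manager restart the server
            ('RestartMax', restart_max),                   # restart attempts allowed per interval
            ('RestartIntervalSeconds', interval_seconds),  # window that RestartMax applies to
            ('RestartDelaySeconds', delay_seconds),        # wait before each restart attempt
        ]
        settings.append(('/Servers/' + name, attributes))
    return settings


# Preview with hypothetical managed server names.
for path, attributes in restart_settings(['HFMWeb0', 'HFMWeb1']):
    for attribute, value in attributes:
        print('%s: %s = %s' % (path, attribute, value))
```

In a WLST edit session, each path would be visited with cd(path) and each pair applied with set(attribute, value), followed by save() and activate(). These attributes only matter for servers that are assigned to a machine and started through Node Manager, so it is worth testing a forced failure to confirm the restart actually happens in your environment.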



