[GLASSFISH-15637] IIOP Loadbalancing not happening Created: 20/Jan/11  Updated: 13/Feb/13  Due: 01/Feb/11

Status: Reopened
Project: glassfish
Component/s: rmi_iiop_load_balancer
Affects Version/s: 3.1_b38
Fix Version/s: future release

Type: Bug
Priority: Major
Reporter: gopaljorapur
Assignee: Harshad Vilekar
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File patch.tgz    
Tags: 3_1-exclude, 3_1_1-exclude, 3_1_1-scrubbed, 3_1_2-exclude

 Description   

IIOP Loadbalancing not happening with certain apps

The Initial context is created as follows

public void createContextForACC()
{
    try
    {
        InitialContext initial = new InitialContext();
        myEnv = (InitialContext) initial.lookup("java:comp/env/ejb");
    }
    catch (NamingException ne)
    {
        System.err.println("Caught an unexpected exception!");
        ne.printStackTrace();
    }
}

All new InitialContext (ic) creations go to the same instance.



 Comments   
Comment by Ken Cavanaugh [ 21/Jan/11 ]

I don't understand what is failing here. Is this an issue about some of the failover
tests? Which ones?

Or is it an issue about how the InitialContext is created?

I also have no idea what a lookup of java:comp/env/ejb is supposed to do.
This is not part of IIOP or FOLB.

Please clarify or close this issue.

Comment by gopaljorapur [ 21/Jan/11 ]

Load balancing is not happening: all new ic creations happen on one random instance in the cluster.

Comment by Ken Cavanaugh [ 25/Jan/11 ]

This issue does not currently correctly describe the observed problem.
I THINK (from email and conversation with Gopal) that the problem is
that loadbalancing does not happen when the endpoints property is specified
in the app client command, but DOES happen when the endpoints property
is passed to the (first) new InitialContext call.

I will investigate that question in my testing shortly.
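
For reference, the two ways of supplying the endpoint list discussed in this comment can be sketched as follows. This is a minimal illustration, not the test client itself: the class and method names are hypothetical, the host names are placeholders, and only the property name com.sun.appserv.iiop.endpoints comes from this issue. No lookup is performed, so the sketch runs without a server.

```java
import java.util.Properties;
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical helper class, for illustration only.
class EndpointConfigSketch {
    // Property name as used on the appclient command line in this issue.
    static final String ENDPOINTS_KEY = "com.sun.appserv.iiop.endpoints";

    // Variant A: endpoints come from a JVM system property (as with -D on the
    // command line); the context is created with no arguments.
    static Context viaSystemProperty(String endpoints) throws NamingException {
        System.setProperty(ENDPOINTS_KEY, endpoints);
        return new InitialContext();
    }

    // Builds the environment used by variant B.
    static Properties endpointEnv(String endpoints) {
        Properties env = new Properties();
        env.setProperty(ENDPOINTS_KEY, endpoints);
        return env;
    }

    // Variant B: endpoints are passed in the environment given to the
    // InitialContext constructor.
    static Context viaEnvironment(String endpoints) throws NamingException {
        return new InitialContext(endpointEnv(endpoints));
    }

    public static void main(String[] args) {
        // Placeholder hosts and ports, not taken from a real cluster.
        System.out.println(endpointEnv("host1:23700,host2:23700"));
    }
}
```

Whether the two variants behave differently for load balancing is exactly the open question here; the sketch only shows where the property enters in each case.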

Comment by Ken Cavanaugh [ 26/Jan/11 ]

My test (same as the 14867 existing instance case) works with -Dcom.sun.appserv.iiop.endpoints specified,
but fails with appclient -targetserver. I've sent email to Tim to see if this is a configuration error in
my code, or an error in the ACC argument parsing.

Comment by Ken Cavanaugh [ 30/Jan/11 ]

This patch contains two fixes:

1. The sticky context reference count has been fixed to avoid extra increments, which
created a "stuck" condition in which the same SerialContext underlying the InitialContext
call was used all the time. This fix is in glassfish-naming.

2. I added code to the ORB to detect and specially label name service implementations.
The FOLB server group manager then arranges to always send a membership label
update on the reply to any name server invocation. This means that the
lookup call on the new InitialContext will get any updates to the cluster shape.
This in turn prevents the situation where the client is always obtaining up-to-date
references from naming, but the new InitialContext call is not informed of the changes
(which happens only when the ClientGroupManager receives a response that contains an
updated IOR for a membership label change).

Just unpack the patch into the modules directory as usual.
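
The "stuck" condition from fix 1 can be illustrated with a small model. This is a hypothetical sketch, not the SerialContext code: the class, its fields, and the endpoint names are invented for illustration, and it only assumes that an endpoint stays pinned while a reference count is above zero.

```java
import java.util.List;

// Hypothetical model of a reference-counted "sticky" endpoint choice.
class StickyContextModel {
    private final List<String> endpoints;
    private int next = 0;          // round-robin cursor
    private int refCount = 0;      // outstanding users of the pinned endpoint
    private String sticky = null;  // endpoint pinned while refCount > 0

    StickyContextModel(List<String> endpoints) {
        this.endpoints = endpoints;
    }

    // Reuse the pinned endpoint if one is in use, otherwise pick the next one.
    String acquire() {
        if (refCount == 0) {
            sticky = endpoints.get(next);
            next = (next + 1) % endpoints.size();
        }
        refCount++;
        return sticky;
    }

    // Drop one reference; the pin is cleared when the count reaches zero.
    void release() {
        if (refCount > 0 && --refCount == 0) {
            sticky = null;
        }
    }

    // Simulates the defect: one spurious increment per acquire, so a single
    // release never brings the count back to zero.
    String acquireWithExtraIncrement() {
        String endpoint = acquire();
        refCount++;
        return endpoint;
    }

    public static void main(String[] args) {
        StickyContextModel ok = new StickyContextModel(List.of("inst1", "inst2", "inst3"));
        StickyContextModel stuck = new StickyContextModel(List.of("inst1", "inst2", "inst3"));
        for (int i = 0; i < 3; i++) {
            System.out.println("balanced: " + ok.acquire()
                    + "  extra increment: " + stuck.acquireWithExtraIncrement());
            ok.release();
            stuck.release();
        }
    }
}
```

With balanced acquire and release calls the model rotates through the endpoints; with the extra increment every caller keeps getting the first endpoint.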

Comment by Ken Cavanaugh [ 31/Jan/11 ]

How bad is its impact? (Severity)
Regression in IIOP FOLB from GF 2.1.

How often does it happen? (Frequency)
Happens all the time in the following test scenario (which should have
been added earlier to this issue):

0. Start with com.sun.appserv.iiop.endpoints referring to two elements, the one to start in step 2
and another in the cluster.
1. Stop the cluster.
2. Start 1 instance (inst) in the cluster.
3. LB to one instance
4. Start cluster
5. Kill inst
6. LB test

This fails because LB happens to only one instance.

How much effort is required to fix it?
I have the fixes made, and the IIOP FOLB dev test passes.
Fixes are in the ORB (new support for labelling object implementations as being in a name service)
and in glassfish-naming (Fixes in how endpoint lists are merged in RoundRobinPolicy,
and fixes in sticky context reference counting in SerialContext)
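
As a rough illustration of the endpoint-list handling mentioned above, the sketch below merges a configured list with a reported list and rotates over the current members. It is an assumption-laden model: the actual merge semantics of RoundRobinPolicy are not described in this issue, and the class, methods, and endpoint names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical round-robin selector, for illustration only.
class RoundRobinSketch {
    private List<String> endpoints;
    private int cursor = 0;

    RoundRobinSketch(List<String> initial) {
        this.endpoints = new ArrayList<>(initial);
    }

    // Order-preserving union of two endpoint lists, without duplicates.
    static List<String> merge(List<String> configured, List<String> reported) {
        Set<String> merged = new LinkedHashSet<>(configured);
        merged.addAll(reported);
        return new ArrayList<>(merged);
    }

    // Replace the list when the cluster reports a new membership.
    void update(List<String> liveMembers) {
        endpoints = new ArrayList<>(new LinkedHashSet<>(liveMembers));
        cursor = 0;
    }

    // Return the next endpoint in rotation.
    String next() {
        String endpoint = endpoints.get(cursor);
        cursor = (cursor + 1) % endpoints.size();
        return endpoint;
    }

    public static void main(String[] args) {
        List<String> merged = merge(List.of("inst1:23700", "inst2:23700"),
                                    List.of("inst2:23700", "inst3:23700"));
        RoundRobinSketch rr = new RoundRobinSketch(merged);
        // Model step 5 of the scenario: inst1 is killed, the rest stay up.
        rr.update(List.of("inst2:23700", "inst3:23700"));
        for (int i = 0; i < 4; i++) {
            System.out.println(rr.next());
        }
    }
}
```

In this model, a selector that never receives the membership update keeps rotating over a stale list, which is the kind of symptom the scenario above is meant to expose.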

What is the risk of fixing it? (Risk)
Low. We have good test coverage for all areas. The only likely impact is on IIOP FOLB.
Other functions of the ORB are unaffected by these changes.

Does a work around for the issue exist? Can the workaround be reasonably employed by the end user?
I don't think a reasonable workaround exists.

If the issue is not fixed should the issue and its workaround (if applicable) be described in the Release Notes?
N/A

How long has the bug existed in the product?
Since the beginning of GF 3.1 FOLB development (roughly 6 months)

Do regression tests exist for this issue?
Yes: IIOP FOLB dev test test15736.

Which tests should QA (re)run to verify the fix did not destabilize GlassFish?
The various tests that were failing that caused this issue to be filed (Gopal has the tests).

When will a tested fix be ready for integration?
Probably 1/31/11-2/1/11.

Comment by gopaljorapur [ 31/Jan/11 ]

The issue in 15637 is about IIOP load balancing not happening. The scenario is as follows (I will update the issue with this scenario):

1. Start Cluster with 5 instances
2. Create 12 InitialContext instances in a loop, and create an SFSB reference by looking up the EJB
3. grep for "SFSB Bean!" in the server.log of all instances; you will see all 12 in only one instance (incorrect behavior: the load should be balanced across the cluster)
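
The check in step 3 can also be automated. The sketch below is a hypothetical helper, not part of the test suite: it counts lines containing the "SFSB Bean!" marker per instance and reports whether more than one instance received any of them. The class, method, and instance names are invented for illustration.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for summarizing where the SFSB creations landed.
class LogDistributionSketch {
    // Marker string passed to create() in the test client.
    static final String MARKER = "SFSB Bean!";

    // Count marker lines for each instance's log contents.
    static Map<String, Long> countMarker(Map<String, List<String>> logsByInstance) {
        Map<String, Long> counts = new LinkedHashMap<>();
        logsByInstance.forEach((instance, lines) ->
                counts.put(instance, lines.stream().filter(l -> l.contains(MARKER)).count()));
        return counts;
    }

    // Load is considered distributed if more than one instance saw the marker.
    static boolean isDistributed(Map<String, Long> counts) {
        return counts.values().stream().filter(c -> c > 0).count() > 1;
    }

    public static void main(String[] args) {
        // Made-up log contents that mimic the failing case: all 12 on one instance.
        Map<String, List<String>> logs = new LinkedHashMap<>();
        logs.put("inst1", Collections.nCopies(12, "INFO: SFSB Bean!"));
        logs.put("inst2", List.of("INFO: server started"));
        Map<String, Long> counts = countMarker(logs);
        System.out.println(counts + " distributed=" + isDistributed(counts));
    }
}
```

In a real run the line lists would be read from each instance's server.log, for example with java.nio.file.Files.readAllLines.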

The test code is as follows

// When we run appclient, we provide the test id RMIIIOPFOTC4; this creates 12 ic and 12 SFSB remote refs

if (testid.equals("RMIIIOPFOTC4"))
{
    for (int i = 0; i < 12; i++)
    {
        client.createContextForACC();
        client.createSFSBRemoteRef();
    }
    System.out.println("Test Passed");
}

// Here is how the ic is created

public void createContextForACC()
{
    try
    {
        Context initial = new InitialContext();
        myEnv = (Context) initial.lookup("java:comp/env/ejb");
    }
    catch (NamingException ne)
    {
        System.err.println("Caught an unexpected exception!");
        ne.printStackTrace();
    }
}

// Here is how the SFSB remote ref is created

public void createSFSBRemoteRef()
{
    try
    {
        Object sfsbobjref = myEnv.lookup("TestSFSB");
        sfsbhomeref = (SFSBRemoteHomeRef) PortableRemoteObject.narrow(sfsbobjref, SFSBRemoteHomeRef.class);
        sfsbref = sfsbhomeref.create("SFSB Bean!");
        System.out.println(sfsbref.validate());
    }
    catch (Exception exc)
    {
        exc.printStackTrace();
    }
}




*******************************************************************************************************************************************************

The scenario that works:

The variation of the test, RMIIIOPFOTC4A

1. Start Cluster with 5 instances
2. Create 12 InitialContext instances in a loop, passing the endpoint properties as the constructor argument, and create an SFSB reference by looking up the EJB
3. grep for "SFSB Bean!" in the server.log of all instances; you will see the load distributed across all instances in the cluster (correct behavior)

Test code is as follows


// When we run appclient, we provide the test id RMIIIOPFOTC4A; this creates 12 ic and 12 SFSB remote refs

if (testid.equals("RMIIIOPFOTC4A"))
{
    for (int i = 0; i < 12; i++)
    {
        client.createContextForStandalone();
        client.createSFSBRemoteRef();
    }
    System.out.println("Test Passed");
}

// This is how createContextForStandalone is implemented

public void createContextForStandalone()
{
    try
    {
        myEnv = new InitialContext(properties);
    }
    catch (Exception exc)
    {
        System.err.println("Caught an unexpected exception!");
        exc.printStackTrace();
    }
}

// This is how createSFSBRemoteRef is done (it is the same as in the scenario where the test fails)

public void createSFSBRemoteRef()
{
    try
    {
        Object sfsbobjref = myEnv.lookup("TestSFSB");
        sfsbhomeref = (SFSBRemoteHomeRef) PortableRemoteObject.narrow(sfsbobjref, SFSBRemoteHomeRef.class);
        sfsbref = sfsbhomeref.create("SFSB Bean!");
        System.out.println(sfsbref.validate());
    }
    catch (Exception exc)
    {
        exc.printStackTrace();
    }
}

Here is how to deploy
asadmin --user admin deploy --retrieve /export/DecCVS/agentrepo//appclient --availabilityenabled=true --target st-cluster --force=true RMIIIOPFailover.ear

appclient execution
/export/DecCVS/glassfish3/glassfish/bin/appclient -Dcom.sun.appserv.iiop.endpoints=hat2k1.us.oracle.com:23700,hat2k1.us.oracle.com:23701,hat2k2.us.oracle.com:23700,hat2k2.us.oracle.com:23701,hat2k2.us.oracle.com:23702 -client /export/DecCVS/agentrepo/appclient/RMIIIOPFailoverClient/rmi-iiop-client2Client.jar -mainclass samples.rmiiiopclient.client.ACC_Standalone_Client

Comment by Chris Kasso [ 31/Jan/11 ]

Approved for RC2.

Comment by Ken Cavanaugh [ 31/Jan/11 ]

At this point, for tracking purposes, 15637 is for the scenario I outlined above:

0. Start with com.sun.appserv.iiop.endpoints referring to two elements, the one to start in step 2
and another in the cluster.
1. Stop the cluster.
2. Start 1 instance (inst) in the cluster.
3. LB to one instance
4. Start cluster
5. Kill inst
6. LB test (which should show LB across running instances)

It's TOO LATE to change the test scenario for 15637, especially since you did not include a sufficient
description initially. I have clearly identified some defects here that need to be fixed,
so please file a NEW issue for the scenarios you have identified above. Please also indicate in any
new issues whether or not stateful vs. stateless EJBs make any difference. This does not matter
to the ORB at all, but some of your tests seem to indicate failures on the SFSB side only.
If this matters, I'll also need to add SFSB support to the dev tests.

Comment by gopaljorapur [ 31/Jan/11 ]

I have opened an issue 15768 for the Loadbalancing issue mentioned in my earlier comment

Comment by Ken Cavanaugh [ 31/Jan/11 ]

Fixed in GF rev 44807.
This includes integration of ORB version 3.1.0-b025.

Comment by gopaljorapur [ 17/Feb/11 ]

With old-style apps, this issue is not fixed.

Comment by Ken Cavanaugh [ 17/Feb/11 ]

I was certain I fixed this, but I'll test it again, and target it for
3.2.

Generated at Mon May 25 18:59:29 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.