
[SAILFIN-1869] Subscribe-refresh: after an instance was killed, a large number of errors were logged and communication stopped. Created: 21/Jul/09  Updated: 25/Nov/10  Resolved: 24/Aug/09

Status: Resolved
Project: sailfin
Component/s: session_replication
Affects Version/s: 2.0
Fix Version/s: milestone 1

Type: Bug Priority: Major
Reporter: easarina Assignee: Joe Fialli
Resolution: Fixed Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Operating System: All
Platform: All


Issue Links:
Dependency
depends on SAILFIN-1877 "Cant find matching transactions" msg... Resolved
Issuezilla Id: 1,869
Tags: system-test
Participants: easarina, Joe Fialli and Scott Oaks

 Description   

******************************************************

  • Template v0.1 ( 05/01/08 )
  • Sailfin Stress test issue

******************************************************
Sailfin Build : 23
Cluster size : 10
Happens in a single instance (y/n) ? : NA
Test id : st2_4_presence_subscribe-refresh
Location of the test : as-telco-sqe/stress-ws/presence
JDK version : 1.6.0_14, 64 bits
CLB used : Yes
HW LB used : No
SSR : Enabled
=========================================================

SuSE machines (asqe-oblade-{1-10}.sfbay.sun.com), one instance per machine.
Nine sipp clients were running.

The load was -m 333333 -r 305 per sipp (9 sipp sessions in total).

The run was fine for about 4 hours, until one instance (instance4) was
killed.

Then a very large number of error messages were logged in the server.log files:

==============================================================
SEVERE|sun-glassfish-comms-server2.0|javax.enterprise.system.container.sip|_ThreadID=22;_ThreadName=SipContainer-serversWorkerThread-5060-6;_RequestID=d13298fe-d6ae-483b-9e29-37caa9caf17c;|"Cant find matching transaction - Terminating"|#]

WARNING|sun-glassfish-comms-server2.0|javax.enterprise.system.container.sip|_ThreadID=34;_ThreadName=Thread-39;_RequestID=d1046b7e-5ae4-48f6-a16f-938f2aa8a336;|Transaction was null: z9hG4bKd57df8d052efe87b4fd69d3553337d242ea5|#]
================================================================
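
To compare runs, it may help to count how often each of these messages appears in a server.log. The following is a minimal sketch, not part of the test harness: the function name tally_messages is hypothetical, and it assumes that each record ends with "|#]" and that the level and the message are pipe-separated fields, as in the excerpt above. The branch id after "Transaction was null:" is dropped so that all such records are counted together.

```python
import re
from collections import Counter

RECORD_END = "|#]"
LEVELS = ("SEVERE", "WARNING", "INFO")

def tally_messages(log_text):
    """Count (level, message) pairs in GlassFish-style log text (hypothetical helper)."""
    counts = Counter()
    for record in log_text.split(RECORD_END):
        # Rejoin records that were wrapped across several lines.
        fields = [f.strip() for f in record.replace("\n", "").split("|")]
        level = next((f for f in fields if f in LEVELS), None)
        if level is None:
            continue  # empty tail or a record without a known level
        message = fields[-1].strip('"')
        # Drop a trailing SIP branch id (branch ids start with "z9hG4bK").
        message = re.sub(r":\s*z9hG4bK\S*$", "", message)
        counts[(level, message)] += 1
    return counts

if __name__ == "__main__":
    import sys
    # Usage: python tally.py server.log
    with open(sys.argv[1], errors="replace") as fh:
        for (level, message), n in tally_messages(fh.read()).most_common():
            print(n, level, message)
```

Running it against the server.log of each instance would give one count per message, which makes it easier to see whether the error volume changes between builds.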

Finally, after several hours of such error messages in the server.log files, the
sipp communication stopped. On the sipp screens, after the instance was killed,
I saw around 300000 errors per screen.

Please see the logs from this run at
/net/asqe-logs/export1/SailFin/Results/sfbuild23/sbrf.



 Comments   
Comment by easarina [ 22/Jul/09 08:43 AM ]

Added a keyword: system-test

Comment by easarina [ 23/Jul/09 03:44 PM ]

I re-ran this test on x86 machines. When an instance was killed, a huge number
of error messages was again created:
"Cant find matching transaction - Terminating"
"Transaction was null"

But then the heap became full, and the communication stopped.

See logs from this run at:
/net/asqe-logs/export1/SailFin/Results/sfbuild23/sbrf_run2

Comment by Scott Oaks [ 27/Jul/09 10:55 AM ]

The "Cant find matching transaction" message comes from a container bug; that
needs to be addressed before we can investigate whether there are additional
ill effects.

Comment by easarina [ 30/Jul/09 01:30 PM ]

Build 25. I executed this test on SuSE machines. Before an instance was
killed, the run was OK. After an instance was killed, I saw error messages in
the server.log files, including many "Cant find matching transaction - Terminating",
and an OOM soon followed. See all logs under:

http://agni-1.sfbay.sun.com/net/asqe-logs/export1/SailFin/Results/sfbuild25/sbr_ssr/

Comment by Scott Oaks [ 05/Aug/09 11:05 AM ]

Build 25 did not contain the SSR-OOM fixes targeted for build 27 (in particular,
issues 1862 and 1888).

Errors in the build 25 log prior to the first failure indicate that something
else is likely wrong in the configuration – there were network issues before
any failure was induced.

Need to re-examine for build 27.

Comment by easarina [ 05/Aug/09 11:24 AM ]

In the server.log files of one server (inst1) I can see a few "Can not find
matching transaction - Terminating" messages and really nothing else. From what
I can see across the different tests, the number of terminated transactions
depends on the load. I agree that the run has to be executed again with the new
fixes. But could you clarify what was wrong in the configuration?

Comment by Joe Fialli [ 21/Aug/09 07:19 AM ]

reassign

Comment by Joe Fialli [ 21/Aug/09 08:41 AM ]

The patch from Tuesday appeared to fix this in Steve DiMilla's
subscribe-refresh run. No errors after running for 3-4 days.
This is an umbrella issue, and this status applies to all
child issues.

Testing the latest version of the patch today to verify that they all remain fixed.

Comment by Joe Fialli [ 24/Aug/09 01:45 PM ]

No longer seeing this issue in Steve DiMilla's subscribe-refresh run,
which has been running for 4 days now.

It was fixed by the check-in for issues 1607 and 1613 on 8/12.

Generated at Wed Apr 16 16:05:18 UTC 2014 using JIRA 4.0.2#472.