Issue Details

Key: SAILFIN-1762
Type: Bug
Status: Resolved
Resolution: Fixed
Priority: Blocker
Assignee: rampsarathy
Reporter: amitagarwal
Votes: 0
Watchers: 1


poor performance for publish scenario with udp transport mode

Created: 07/May/09 08:33 AM   Updated: 16/Jun/09 01:02 AM   Resolved: 16/Jun/09 01:02 AM
Component/s: sip_container
Affects Version/s: 2.0
Fix Version/s: milestone 1

Time Tracking:
Not Specified


Operating System: Linux
Platform: All

Issuezilla Id: 1762
Participants: amitagarwal, ehsroha, rampsarathy and Scott Oaks

Description

While comparing tcp and udp transport modes for the publish scenario, we found
that udp cannot sustain as high a call rate as tcp. The publish scenario with
tcp handles 10000 calls/sec under the stipulated conditions (50% cpu
utilization, 95% of calls meeting the response-time criterion, etc.), while the
same publish scenario with udp handles only ~2000 calls/sec under the same
conditions and environment. Response time grows inordinately at higher call
rates.

Analysing this revealed that GC pause time is significantly higher than in the
tcp case. A heap histogram indicated that many objects stay alive, so the
copying collector spends a long time traversing the heap of live objects.

Discussion with the development team revealed an important fact: publish over
udp has an extra Timer J per request to deal with retransmissions. This timer
times out after 64 * T1 = 64 * 500 ms = 32 secs. Taking this cue and
experimenting with the value of T1, we discovered that this extra timer
erroneously keeps a whole slew of objects alive for a long time. This in turn
causes the high GC pause times and the poor performance with udp.
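The Timer J arithmetic above can be sketched in a few lines (a minimal
illustration of the RFC 3261 rule that Timer J = 64 * T1 for non-INVITE server
transactions over unreliable transports; the class and constant names are
illustrative, not SailFin's actual identifiers):

```java
public class TimerJSketch {
    // T1 is the RFC 3261 round-trip-time estimate, 500 ms by default.
    static final long T1_MILLIS = 500;

    // Timer J fires after 64 * T1; with the default T1 that is 32000 ms.
    static long timerJMillis(long t1Millis) {
        return 64 * t1Millis;
    }

    public static void main(String[] args) {
        System.out.println(timerJMillis(T1_MILLIS)); // prints 32000
    }
}
```

Lowering T1 (as done in the experiment above) shrinks this window proportionally, which is why the object-retention effect tracked the T1 value.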

Here is an object reference chain of interest, captured with a heap dump for a
particular request: $SipContainerFutureTask (has a field "future" to hold) -->
(field "listener" to hold) --> (this is TimerJ, which times out after 32 secs)
(has a field "_response" to hold) --> (has a field "_currentRequest" and lots
of other fields to hold) --> (holds lots of objects) --> -->

As shown, in order to serve the request, virtually every heavy object created
along the path is kept alive until TimerJ times out after 32 seconds. I
discussed this with Ramesh; he agrees that it is incorrect to hold all these
objects, and that we should hold just the transaction id to deal with
retransmissions. Instead of using the NonInviteServerTransaction object as the
listener for the timeout, it would be better to create some other lightweight
object as the listener, holding a reference to just the transaction id.
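A minimal sketch of the lightweight-listener idea proposed above (the registry
and class names are hypothetical, not SailFin's actual types): the timer task
references only the transaction id, so the transaction, response, and request
graph can be collected as soon as the response is sent, while any per-transaction
retransmission state lives in a small map keyed by id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LightweightListenerSketch {
    // Hypothetical registry: per-transaction state kept solely for
    // retransmission handling (e.g. the serialized response bytes).
    static final Map<String, byte[]> retransmitState = new ConcurrentHashMap<>();

    // Lightweight Timer J listener: holds only the transaction id,
    // not the heavy NonInviteServerTransaction object graph.
    static final class TimerJListener implements Runnable {
        private final String transactionId;

        TimerJListener(String transactionId) {
            this.transactionId = transactionId;
        }

        @Override
        public void run() {
            // Timer J fired: the transaction is complete, drop its state.
            retransmitState.remove(transactionId);
        }
    }

    public static void main(String[] args) {
        retransmitState.put("tx-1", "SIP/2.0 200 OK\r\n".getBytes());
        new TimerJListener("tx-1").run(); // simulate Timer J expiry
        System.out.println(retransmitState.containsKey("tx-1")); // prints false
    }
}
```

The design point is that the only strong reference the timer retains is a short string, so nothing else in the request path is pinned for the 32-second window.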

Scott Oaks added a comment - 07/May/09 08:52 AM

Add cc

ehsroha added a comment - 11/May/09 06:25 AM

Reassign to Ramesh.

Historical background and proposed quick fix

Back in the days of EAS we made basically the same observation: the big
performance difference between UDP and TCP was due to the transaction keeping a
reference to the Response for the duration of Timer J (32 sec). The fix back
then was to serialize the response to a byte buffer and use the serialized
version for retransmissions.

Apparently, during sailfin development this optimization clashed with the
CLB's need to modify certain headers after the response has left the
transaction layer, so the serialize-to-buffer part was removed. For the EAS
version of the serializeForRetransmission() method in SipServletResponseImpl,
see version 0 in CVS.

A quick fix to make the former solution work with the CLB is to ensure that the
serialize-to-buffer step is deferred until the CLB is done with its header
manipulations. Note that the transaction store/restore still has to be made
directly (as currently implemented).

Given this, we think the best place to call the serialize-to-buffer method of
SipServletResponseImpl is in NetworkLayer, as the serialization should be done
both with and without the CLB layer (and after the CLB layer). Note that thread
safety must be considered when implementing this (a retransmission might happen
while serialize-to-buffer is in progress, etc.).

/Robert & Peter
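The deferred serialize-to-buffer idea, including the thread-safety concern
raised above, could be sketched as follows (the class and method names are
stand-ins, not SailFin's actual API): the response is serialized lazily at send
time, after any CLB header manipulation, and retransmissions reuse the same
buffer; synchronization prevents a retransmission from racing the first
serialization.

```java
import java.nio.charset.StandardCharsets;

public class DeferredSerializationSketch {
    private byte[] serialized; // null until the first send

    // Called from the network layer, after the CLB has finished any header
    // manipulation. Synchronized so that a concurrent retransmission cannot
    // observe a half-initialized buffer or serialize twice.
    public synchronized byte[] serializeForRetransmission(String rawMessage) {
        if (serialized == null) {
            serialized = rawMessage.getBytes(StandardCharsets.UTF_8);
        }
        return serialized; // retransmissions reuse the cached buffer
    }
}
```

Once the buffer exists, the transaction no longer needs the live Response object for retransmissions, which is what breaks the 32-second retention chain.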

rampsarathy added a comment - 16/Jun/09 01:02 AM

The serialization happens in the network manager just before the message is
sent out. All responses going through the NonInviteServerTransaction
(toCompleted) method have the needserialization flag set on the response,
based on which the NM serializes the message (if it is not already serialized).
A check has been added in the CLB backend to ensure that serialized responses
are just passed through.
I have run the QLs, FTs on both UDP and TCP, and the sip container dev tests;
no regressions so far.
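The flow described in this comment might look roughly like the following sketch
(field and method names are illustrative guesses, not the actual SailFin
sources): the transaction flags the response, the CLB backend passes
already-serialized responses through untouched, and the network manager
serializes once, just before the message goes on the wire.

```java
public class NetworkManagerSketch {
    static class Response {
        boolean needsSerialization; // set in NonInviteServerTransaction.toCompleted()
        byte[] buffer;              // cached serialized form, if any
        final String message;

        Response(String message) {
            this.message = message;
        }
    }

    // CLB backend: already-serialized responses are passed through untouched.
    static Response clbBackend(Response r) {
        if (r.buffer != null) {
            return r; // serialized: no further header manipulation
        }
        // ...header manipulation would happen here...
        return r;
    }

    // Network manager: serialize just before sending, if not already done.
    static byte[] send(Response r) {
        if (r.needsSerialization && r.buffer == null) {
            r.buffer = r.message.getBytes();
        }
        return r.buffer != null ? r.buffer : r.message.getBytes();
    }
}
```

In this shape, a retransmission simply calls send() again and reuses the cached buffer instead of re-walking the live response object.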