While comparing the performance of the tcp and udp transport modes for the
publish scenario, we found that udp cannot sustain as high a call rate as tcp.
With tcp, the publish scenario handles 10000 calls/sec under the stipulated
conditions (50% cpu utilization, 95% of calls meeting the response time
criterion, etc.). With udp, the same scenario handles only ~2000 calls/sec
under the same conditions and environment, and response time grows
inordinately at higher call rates.
Analysis showed that GC pause time is significantly higher than in the tcp
case. The heap histogram indicated that a large number of objects stay alive,
and the copying collector spends a long time traversing the live objects.
Discussion with the development team revealed an important fact: publish over
udp has an extra Timer J per request to deal with re-transmissions. This timer
times out after 64 * T1 = 64 * 500 ms = 32 secs. Taking this cue and
experimenting with the value of T1, we found that this extra timer needlessly
keeps a whole slew of objects alive for that period. This in turn causes high
GC pause times and poor performance with udp.
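To get a feel for the scale, the Timer J arithmetic above can be turned into a
rough steady-state estimate of how many requests' object graphs are pinned at
once. This is only a sketch: the class and method names are made up for
illustration, and it assumes every request stays reachable for the full
Timer J duration.

```java
import java.time.Duration;

// Hypothetical helper, not part of the container code base.
class TimerJRetention {
    // T1 value taken from the analysis above (500 ms).
    static final Duration T1 = Duration.ofMillis(500);

    // Timer J duration: 64 * T1.
    static Duration timerJ(Duration t1) {
        return t1.multipliedBy(64);
    }

    // Steady-state estimate: requests still pinned by a pending Timer J,
    // assuming each one stays reachable for the whole timer duration.
    static long retainedRequests(long callsPerSecond, Duration t1) {
        return callsPerSecond * timerJ(t1).toMillis() / 1000;
    }

    public static void main(String[] args) {
        System.out.println(timerJ(T1).getSeconds());                         // 32
        System.out.println(retainedRequests(2000, T1));                      // 64000
        System.out.println(retainedRequests(2000, Duration.ofMillis(100)));  // 12800
    }
}
```

Under these assumptions, at ~2000 calls/sec roughly 64000 request graphs are
live at any moment, which is consistent with lowering T1 reducing the amount
of live data the collector has to copy.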
Here is an object reference chain of interest, captured from a heap dump:
com.ericsson.ssa.container.SipContainerThreadPool$SipContainerFutureTask
  (has a field "future" to hold) -->
com.ericsson.ssa.sip.timer.GeneralTimerImpl
  (has a field "listener" to hold) -->
  (this is TimerJ that times out after 32 secs)
com.ericsson.ssa.sip.transaction.NonInviteServerTransaction
  (has a field "_response" to hold) -->
com.ericsson.ssa.sip.SipServletResponseImpl
  (has a field "_currentRequest" & lots of other fields to hold) -->
com.ericsson.ssa.sip.SipServletRequestImpl
  (holds lots of objects) --> -->
As shown, virtually every heavy object created along the path to serve the
request is kept alive until TimerJ times out after 32 seconds. I discussed
this with Ramesh, and he agrees that it is incorrect to hold all these objects;
instead we should hold just the transaction id to deal with re-transmissions.
Rather than using the NonInviteServerTransaction object as the listener for the
timeout, it would be better to create a separate lightweight object as the
listener, one that references only the transaction id.
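A minimal sketch of what such a lightweight listener could look like follows.
Everything here is hypothetical: TimeoutListener stands in for whatever
callback interface GeneralTimerImpl actually expects, and CompletedTransactions
is an invented registry of transaction ids. What to do when a re-transmission
is detected is left out.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these types exist in the container code base.
class LightweightListenerSketch {

    // Stand-in for the timer callback interface (assumed shape).
    interface TimeoutListener {
        void timeout();
    }

    // Remembers only the ids of completed transactions, so re-transmissions
    // can be recognised without keeping request/response objects alive.
    static final class CompletedTransactions {
        private final Set<String> ids = ConcurrentHashMap.newKeySet();

        void markCompleted(String transactionId) { ids.add(transactionId); }

        boolean isRetransmission(String transactionId) {
            return ids.contains(transactionId);
        }

        void expire(String transactionId) { ids.remove(transactionId); }
    }

    // The object the timer would hold for 32 secs: a String and one shared
    // registry reference, instead of the whole NonInviteServerTransaction.
    static final class TransactionIdListener implements TimeoutListener {
        private final String transactionId;
        private final CompletedTransactions completed;

        TransactionIdListener(String transactionId, CompletedTransactions completed) {
            this.transactionId = transactionId;
            this.completed = completed;
        }

        @Override
        public void timeout() {
            // Timer J fired: stop tracking this transaction id.
            completed.expire(transactionId);
        }
    }

    public static void main(String[] args) {
        CompletedTransactions completed = new CompletedTransactions();
        String id = "z9hG4bK-example"; // made-up transaction id
        completed.markCompleted(id);
        TimeoutListener listener = new TransactionIdListener(id, completed);
        System.out.println(completed.isRetransmission(id)); // true
        listener.timeout();
        System.out.println(completed.isRetransmission(id)); // false
    }
}
```

With a listener of this shape, the timer no longer references the transaction,
so the response and request objects can be collected as soon as nothing else
refers to them.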