On some of the GF-HA Functional Test setups, the InOrder Metro-HA tests get interrupted by TestNG because of a configured 5-minute timeout. This interruption is reported as a test failure.
Responses on these test setups are slow, so the RM-Client ends up sending an AckRequested, which further slows down the actual response. After a few messages, the LB begins to return a 503 Service Unavailable to the RM-Client. This makes the RM-Client retry messages, further reducing the rate of responses. Eventually TestNG interrupts the test once it runs past 5 minutes.
The test may need to be rerun with fine-level logging, and the setup needs to be re-examined to find out why the responses are slow. We may also need to decide whether the 5-minute timeout is justified or whether the AckRequested rate should be reduced.
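For the rerun, a minimal sketch of turning on FINE logging in the client-side test JVM with java.util.logging is shown below. The logger name "example.metro.rm" is a placeholder and not the actual RM logger name, which would need to be looked up; the serving instances would need their log level raised separately through the server's own logging configuration.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FineLogging {
    // Assumption: placeholder logger name; substitute the logger(s) actually
    // used by the RM client and the test code.
    static final String ASSUMED_LOGGER_NAME = "example.metro.rm";

    // Sets the named logger to FINE and attaches a console handler that
    // also accepts FINE records.
    static Logger enableFine(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE);

        // A ConsoleHandler publishes only INFO and above by default,
        // so its level has to be lowered as well.
        Handler handler = new ConsoleHandler();
        handler.setLevel(Level.FINE);
        logger.addHandler(handler);

        // Do not forward to the root handlers, to avoid duplicate output.
        logger.setUseParentHandlers(false);
        return logger;
    }

    public static void main(String[] args) {
        Logger logger = enableFine(ASSUMED_LOGGER_NAME);
        logger.fine("FINE logging enabled for " + ASSUMED_LOGGER_NAME);
    }
}
```

Calling enableFine once before the test starts sending messages should be enough, since the Logger returned for a given name is shared across the JVM.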
Attaching the client output, the serving-instance logs, and the LB error logs for one such failure.