mq
  1. mq
  2. MQ-343

file store fail to recover if crashed right after commit processing and before checkpoint when sent message involved in cluster txn

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 5.0
    • Fix Version/s: 5.1 (RI-Bug-Fix)
    • Component/s: broker-core
    • Labels:
      None
    • Environment:

      Glassfish 3.1.2.2 two server Cluster patched with OpenMQ5.0 (Build 14-e)

      Description

      To work around issue MQ-317 we have been forced to try patch GF 3.1.2.2 with MQ5.0 and access it as a JMS1.1 provider.

      To replicate a server crash we place the app server under a moderate load (5 messages per second) and kill all Java processes. When the app server and broker are restarted, the following messages are logged:

      [#|2013-08-26T20:06:17.255+0000|SEVERE|5.0|imq.log.Logger|_ThreadID=1;_ThreadName=main;|ERROR Mismatch in number of logged and stored consumers for 52751-10.240.164.211(f8:df:d4:6a:5a:1)-40158-1377547334493 logged=0 stored=1

      #]

      [#|2013-08-26T20:06:17.256+0000|SEVERE|5.0|imq.log.Logger|_ThreadID=1;_ThreadName=main;|ERROR logged interests does not contain stored interst. sysid= 52751-10.240.164.211(f8:df:d4:6a:5a:1)-40158-1377547334493 consumerid=[consumer:0, type=NONE]

      #]

      [#|2013-08-26T20:06:17.256+0000|FORCE|5.0|imq.log.Logger|_ThreadID=1;_ThreadName=main;|[B1276]: Replace existing message 52751-10.240.164.211(f8:df:d4:6a:5a:1)-40158-1377547334493

      #]

      [#|2013-08-26T20:06:17.256+0000|SEVERE|5.0|imq.log.Logger|_ThreadID=1;_ThreadName=main;|ERROR [B3007]: Message 52751-10.240.164.211(f8:df:d4:6a:5a:1)-40158-1377547334493 on destination Q:centralEventProcessing could not be found in the store

      #]

      [#|2013-08-26T20:06:17.258+0000|INFO|5.0|imq.log.Logger|_ThreadID=1;_ThreadName=main;|[B2096]: Unable to load transaction information.:
      com.sun.messaging.jmq.jmsserver.util.BrokerException: Message 52751-10.240.164.211(f8:df:d4:6a:5a:1)-40158-1377547334493 on destination Q:centralEventProcessing could not be found in the store
      at com.sun.messaging.jmq.jmsserver.persist.file.DstMsgStore.removeMessage(DstMsgStore.java:348)
      at com.sun.messaging.jmq.jmsserver.persist.file.MsgStore.removeMessage(MsgStore.java:387)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogReplayer.replaySentMessage(TransactionLogReplayer.java:169)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogReplayer.replaySentMessages(TransactionLogReplayer.java:111)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogReplayer.replayTransactionWork(TransactionLogReplayer.java:96)
      at com.sun.messaging.jmq.jmsserver.persist.file.ClusterTransactionManager.replayTransactionEvent(ClusterTransactionManager.java:276)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogManager.replayTransactionLogOnStartup(TransactionLogManager.java:628)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogManager.startup(TransactionLogManager.java:259)
      at com.sun.messaging.jmq.jmsserver.persist.file.FileStore.init(FileStore.java:478)
      at com.sun.messaging.jmq.jmsserver.data.TransactionList.<init>(TransactionList.java:210)
      at com.sun.messaging.jmq.jmsserver.core.DestinationList.init(DestinationList.java:2714)
      at com.sun.messaging.jmq.jmsserver.core.CoreLifecycleImpl.initDestinations(CoreLifecycleImpl.java:110)
      at com.sun.messaging.jmq.jmsserver.Broker._start(Broker.java:1233)
      at com.sun.messaging.jmq.jmsserver.Broker.start(Broker.java:447)
      at com.sun.messaging.jmq.jmsserver.Broker.brokerMain(Broker.java:2186)
      at com.sun.messaging.jmq.jmsserver.Broker.main(Broker.java:2164)

      #]

      [#|2013-08-26T20:06:17.259+0000|SEVERE|5.0|imq.log.Logger|_ThreadID=1;_ThreadName=main;|ERROR [B3101]: Error retrieving persisted data:
      com.sun.messaging.jmq.jmsserver.util.BrokerException: [B2096]: Unable to load transaction information.
      at com.sun.messaging.jmq.jmsserver.data.TransactionList.<init>(TransactionList.java:235)
      at com.sun.messaging.jmq.jmsserver.core.DestinationList.init(DestinationList.java:2714)
      at com.sun.messaging.jmq.jmsserver.core.CoreLifecycleImpl.initDestinations(CoreLifecycleImpl.java:110)
      at com.sun.messaging.jmq.jmsserver.Broker._start(Broker.java:1233)
      at com.sun.messaging.jmq.jmsserver.Broker.start(Broker.java:447)
      at com.sun.messaging.jmq.jmsserver.Broker.brokerMain(Broker.java:2186)
      at com.sun.messaging.jmq.jmsserver.Broker.main(Broker.java:2164)
      Caused by: com.sun.messaging.jmq.jmsserver.util.BrokerException: Message 52751-10.240.164.211(f8:df:d4:6a:5a:1)-40158-1377547334493 on destination Q:centralEventProcessing could not be found in the store
      at com.sun.messaging.jmq.jmsserver.persist.file.DstMsgStore.removeMessage(DstMsgStore.java:348)
      at com.sun.messaging.jmq.jmsserver.persist.file.MsgStore.removeMessage(MsgStore.java:387)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogReplayer.replaySentMessage(TransactionLogReplayer.java:169)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogReplayer.replaySentMessages(TransactionLogReplayer.java:111)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogReplayer.replayTransactionWork(TransactionLogReplayer.java:96)
      at com.sun.messaging.jmq.jmsserver.persist.file.ClusterTransactionManager.replayTransactionEvent(ClusterTransactionManager.java:276)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogManager.replayTransactionLogOnStartup(TransactionLogManager.java:628)
      at com.sun.messaging.jmq.jmsserver.persist.file.TransactionLogManager.startup(TransactionLogManager.java:259)
      at com.sun.messaging.jmq.jmsserver.persist.file.FileStore.init(FileStore.java:478)
      at com.sun.messaging.jmq.jmsserver.data.TransactionList.<init>(TransactionList.java:210)
      ... 6 more

      #]

      After that the broker appears to terminate abnormally and restart. It seems to get stuck in a loop of starting up, detecting the corruption and shutting down.

        Activity

        Hide
        amyk added a comment - - edited

        The problem occurs with newTxnLog 'file' store, when a sent message is part of a cluster transaction and broker crashes right after commit processing and before txnlog checkpoint the transaction.

        This is now fixed in 5.0.1 (internal bug 17363755)
        test case (use fault injection)
        tonga/jmsserver/bugs/bugMQ343

        Show
        amyk added a comment - - edited The problem occurs with newTxnLog 'file' store, when a sent message is part of a cluster transaction and broker crashes right after commit processing and before txnlog checkpoint the transaction. This is now fixed in 5.0.1 (internal bug 17363755) test case ( use fault injection ) tonga/jmsserver/bugs/bugMQ343
        Hide
        hopesea added a comment -

        Is this fix available in a binary download or in the hg repo? I would like to test it.

        Show
        hopesea added a comment - Is this fix available in a binary download or in the hg repo? I would like to test it.
        Hide
        amyk added a comment -

        Thanks. Only promoted builds are available at OpenMQ webside and next promotion build is at least a week away. Please send me an email amy dot kang at oracle dot com. I'll send you the jar files to test

        Show
        amyk added a comment - Thanks. Only promoted builds are available at OpenMQ webside and next promotion build is at least a week away. Please send me an email amy dot kang at oracle dot com. I'll send you the jar files to test
        Hide
        hopesea added a comment -

        We are running tests with the jar files you provided and it is looking good. Will report back if we are able to reproduce the original problem. Thanks for the help.

        Show
        hopesea added a comment - We are running tests with the jar files you provided and it is looking good. Will report back if we are able to reproduce the original problem. Thanks for the help.

          People

          • Assignee:
            amyk
            Reporter:
            hopesea
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved: