glassfish
  1. glassfish
  2. GLASSFISH-15816

[PERF] performance regression in IIOPInputStream.simpleReadObject

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: entity-persistence
    • Labels:
      None

      Description

      The performance of IIOPInputStream.simpleReadObject is
      some 3x slower in GF 3.1 v GF 3 in client-side profiles.

      In V3, that just calls IIOPInputStream.inputObject().

      In V3.1, that spends a little more than half its time in inputObject(),
      and the other large part of its time in inputObjectUsingFVD(). V3 and
      V3.1 do spend the same amount of time in useFullValueDescription() (a
      pretty small amount in either case).

      So it seems that for whatever reason, we are using the FVD in 3.1 and
      not in V3, and that is really slowing us down. We need either a way to
      turn of FVD or to improve the performance of that path.

      The performance of inputObject() itself also seems to have regressed,
      but I think that's because inputObjectUsingFVD() calls it; once we get
      recursive calls, it's hard to tell from the profile.

        Issue Links

          Activity

          Hide
          Ken Cavanaugh added a comment -

          I think the main reason the FVD is slower is because of the sender.meta calls in getOrderedDescriptions.
          This is a remote callback to the client, and that alone will prevent any significant performance optimization.
          I don't think caching is easy here either, because essentially we would need to cache
          (repositoryID X CodeBase) -> List<FullValueDescription>, and the CodeBase is a CORBA object reference,
          which has very weak equals semantics (we can't always decide whether or not two remote references are
          equals, through perhaps in this case I could force that). We also have the problem that sender
          is a transient object, and will change for every client and every re-start of a client. And restarting
          a client is an opportunity for the remote type to change.

          I think the real question is why are we using the FVD now? This is determined by
          ValueHandlerImpl.useFullValueDescription( Class cls, String repoId ), which determines the
          repoId of cls, and if it is different from repoId, returns true. The repoId contains an
          RMI hash value, and the serialVersionUID of the class. If either of these are different,
          it will force the use of the FVD. So what changed between 3.0 and 3.1?

          One small possibility: if the class is an enum, bug 6877056 required a change to the suid: it is
          now 0 instead of whatever it was previously.

          Some questions:

          • How many times per request is sender.meta called?
          • Are there any enums in the test?
          • Which benchmark does this affect, and by how much?

          I can't fix this for RC2, and RC3 may be difficult as well.

          Show
          Ken Cavanaugh added a comment - I think the main reason the FVD is slower is because of the sender.meta calls in getOrderedDescriptions. This is a remote callback to the client, and that alone will prevent any significant performance optimization. I don't think caching is easy here either, because essentially we would need to cache (repositoryID X CodeBase) -> List<FullValueDescription>, and the CodeBase is a CORBA object reference, which has very weak equals semantics (we can't always decide whether or not two remote references are equals, through perhaps in this case I could force that). We also have the problem that sender is a transient object, and will change for every client and every re-start of a client. And restarting a client is an opportunity for the remote type to change. I think the real question is why are we using the FVD now? This is determined by ValueHandlerImpl.useFullValueDescription( Class cls, String repoId ), which determines the repoId of cls, and if it is different from repoId, returns true. The repoId contains an RMI hash value, and the serialVersionUID of the class. If either of these are different, it will force the use of the FVD. So what changed between 3.0 and 3.1? One small possibility: if the class is an enum, bug 6877056 required a change to the suid: it is now 0 instead of whatever it was previously. Some questions: How many times per request is sender.meta called? Are there any enums in the test? Which benchmark does this affect, and by how much? I can't fix this for RC2, and RC3 may be difficult as well.
          Hide
          Nazrul added a comment -

          This affects SpecJ benchmark (part of the exit criteria).

          Show
          Nazrul added a comment - This affects SpecJ benchmark (part of the exit criteria).
          Hide
          Ken Cavanaugh added a comment -

          If this is needed for RC 3, I need more information as soon as possible.
          If we cannot avoid the use of the FVD, it will probably not be possible to fix for RC 3.

          Show
          Ken Cavanaugh added a comment - If this is needed for RC 3, I need more information as soon as possible. If we cannot avoid the use of the FVD, it will probably not be possible to fix for RC 3.
          Hide
          Scott Oaks added a comment -

          The class that is being treated as FVD is org.spec.jent.ejb.mfg.entity.WorkOrder. In RepositoryID.useFullValueDescription() I printed out the clazzRepIDStr and repositoryID: RMI:org.spec.jent.ejb.mfg.entity.WorkOrder:2278B9E1654056DF:DB5326CCDA5E594A
          RMI:org.spec.jent.ejb.mfg.entity.WorkOrder:0D0763E46FEF1DB9:DB5326CCDA5E594A

          Where do the repository IDs come from? The client is loading that class from the same jar file that was deployed to the server (and there are several other classes in that jar file that I see in the useFullValueDescription() method and the repository IDs match; it is just this one particular class). On the other hand, in the server that class goes through JPA injection (the only transferred class that does that); maybe that seems to change the class definition? That also happened in V3, though I guess the injection could be different.

          Show
          Scott Oaks added a comment - The class that is being treated as FVD is org.spec.jent.ejb.mfg.entity.WorkOrder. In RepositoryID.useFullValueDescription() I printed out the clazzRepIDStr and repositoryID: RMI:org.spec.jent.ejb.mfg.entity.WorkOrder:2278B9E1654056DF:DB5326CCDA5E594A RMI:org.spec.jent.ejb.mfg.entity.WorkOrder:0D0763E46FEF1DB9:DB5326CCDA5E594A Where do the repository IDs come from? The client is loading that class from the same jar file that was deployed to the server (and there are several other classes in that jar file that I see in the useFullValueDescription() method and the repository IDs match; it is just this one particular class). On the other hand, in the server that class goes through JPA injection (the only transferred class that does that); maybe that seems to change the class definition? That also happened in V3, though I guess the injection could be different.
          Hide
          Ken Cavanaugh added a comment -

          The repository ID is of the form RMI:<class name>:<hash code>:<suid>. Here we are
          only interested in the <hash code> differences.

          The hash code is computed as follows (from the CORBA 3.1 spec section 14.7.2):

          -----------------------------------------------------

          An RMI Hashed RepositoryId consists of either three or four components, separated by colons:

          RMI: <class name> : <hash code> [ : <serialization version UID> ]

          The class name is a Java class name as returned by the getName method of java.lang.Class. Any characters not in
          ISO Latin 1 are replaced by “\U” followed by the 4 hexadecimal characters (in upper case) representing the Unicode
          value.

          For classes that do not implement java.io.Serializable, and for interfaces, the hash code is always zero, and the
          RepositoryID does not contain a serial version UID.

          For classes that implement java.io.Externalizable, the hash code is always the 64-bit value 1.

          For classes that implement java.io.Serializable but not java.io.Externalizable, the hash code is a 64-
          bit hash of a stream of bytes (transcribed as a 16-digit upper case hex string). An instance of
          java.lang.DataOutputStream is used to convert primitive data types to a sequence of bytes. The sequence of
          items in the stream is as follows:

          1. The hash code of the superclass, written as a 64-bit long.

          2. The value 1 if the class has no writeObject method, or the value 2 if the class has a writeObject method,
          written as a 32-bit integer.

          3. For each field of the class that is mapped to IDL, sorted lexicographically by Java field name, in increasing order:
          a. Java field name, in UTF encoding
          b. field descriptor, as defined by the Java Virtual Machine Specification, in UTF encoding.

          The National Institute of Standards and Technology (NIST) Secure Hash Algorithm (SHA-1) is executed on the stream of
          bytes produced by DataOutputStream, producing a 20 byte array of values, sha[0..19]. The hash code is assembled
          from the first 8 bytes of this array as follows:

          long hash = 0;
          for (int i = 0; i < Math.min(8, sha.length); i++)

          { hash += (long)(sha[i] & 255) << (i * 8); }

          For Serializable (including Externalizable) classes, the Java serialization version UID, transcribed as a 16 digit upper-case
          hex string, shall be appended to the RepositoryId following the hash code and a colon. The Java serialization version
          UID is defined in the Java Object Serialization Specification.

          -------------------------------------------------

          This is computed in the ObjectStreamClass.computeStructuralUID method.
          There are changes that can be made to a class that change the hash code, but not the suid.
          For example, the presence or absence of a writeObject method affects the hash code, but not
          the suid. Obviously any change in serializable fields would also affect the computations.
          Does the WorkOrder class explicitly define its serialVersionUID?

          Not sure if the injection changed, perhaps Mitesh might know?

          Show
          Ken Cavanaugh added a comment - The repository ID is of the form RMI:<class name>:<hash code>:<suid>. Here we are only interested in the <hash code> differences. The hash code is computed as follows (from the CORBA 3.1 spec section 14.7.2): ----------------------------------------------------- An RMI Hashed RepositoryId consists of either three or four components, separated by colons: RMI: <class name> : <hash code> [ : <serialization version UID> ] The class name is a Java class name as returned by the getName method of java.lang.Class. Any characters not in ISO Latin 1 are replaced by “\U” followed by the 4 hexadecimal characters (in upper case) representing the Unicode value. For classes that do not implement java.io.Serializable, and for interfaces, the hash code is always zero, and the RepositoryID does not contain a serial version UID. For classes that implement java.io.Externalizable, the hash code is always the 64-bit value 1. For classes that implement java.io.Serializable but not java.io.Externalizable, the hash code is a 64- bit hash of a stream of bytes (transcribed as a 16-digit upper case hex string). An instance of java.lang.DataOutputStream is used to convert primitive data types to a sequence of bytes. The sequence of items in the stream is as follows: 1. The hash code of the superclass, written as a 64-bit long. 2. The value 1 if the class has no writeObject method, or the value 2 if the class has a writeObject method, written as a 32-bit integer. 3. For each field of the class that is mapped to IDL, sorted lexicographically by Java field name, in increasing order: a. Java field name, in UTF encoding b. field descriptor, as defined by the Java Virtual Machine Specification, in UTF encoding. The National Institute of Standards and Technology (NIST) Secure Hash Algorithm (SHA-1) is executed on the stream of bytes produced by DataOutputStream, producing a 20 byte array of values, sha [0..19] . The hash code is assembled from the first 8 bytes of this array as follows: long hash = 0; for (int i = 0; i < Math.min(8, sha.length); i++) { hash += (long)(sha[i] & 255) << (i * 8); } For Serializable (including Externalizable) classes, the Java serialization version UID, transcribed as a 16 digit upper-case hex string, shall be appended to the RepositoryId following the hash code and a colon. The Java serialization version UID is defined in the Java Object Serialization Specification. ------------------------------------------------- This is computed in the ObjectStreamClass.computeStructuralUID method. There are changes that can be made to a class that change the hash code, but not the suid. For example, the presence or absence of a writeObject method affects the hash code, but not the suid. Obviously any change in serializable fields would also affect the computations. Does the WorkOrder class explicitly define its serialVersionUID? Not sure if the injection changed, perhaps Mitesh might know?
          Hide
          Nazrul added a comment -

          Based on latest analysis, this issue is happening due to the new JPA weaving feature. So, assigning to Mitesh.

          Show
          Nazrul added a comment - Based on latest analysis, this issue is happening due to the new JPA weaving feature. So, assigning to Mitesh.
          Hide
          Scott Oaks added a comment -

          The difference in client and server side classes is that the server-side classes are subject to weaving to support new JPA features; the weaving adds new non-transient fields to the class definition.

          As we do not (presently) need these features, we are able to turn off the weaving and get the class definitions to match. This gets rid of the client-side performance issue with simpleReadObject(). This may not be viable for other applications which need to use weaving features in a CORBA environment. Theoretically, the client side class could also be woven in that case, except that the necessary classes to do that do not ship with glassfish.

          Show
          Scott Oaks added a comment - The difference in client and server side classes is that the server-side classes are subject to weaving to support new JPA features; the weaving adds new non-transient fields to the class definition. As we do not (presently) need these features, we are able to turn off the weaving and get the class definitions to match. This gets rid of the client-side performance issue with simpleReadObject(). This may not be viable for other applications which need to use weaving features in a CORBA environment. Theoretically, the client side class could also be woven in that case, except that the necessary classes to do that do not ship with glassfish.
          Hide
          Scott Oaks added a comment -

          Using new properties in persistence.xml to turn off weaving.

          Show
          Scott Oaks added a comment - Using new properties in persistence.xml to turn off weaving.

            People

            • Assignee:
              Mitesh Meswani
              Reporter:
              Scott Oaks
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: