jsr-333 / JSR_333-53

SQL-2: clarify syntax of names and paths

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: spec
    • Labels:
      None

      Description

      I would like to clear up the SQL-2 syntax for names and paths. From the JCR 2.0 spec:

      6.7.4 Name
      Name ::= '[' quotedName ']' | '[' simpleName ']' | simpleName
      quotedName ::= /* A JCR Name */
      simpleName ::= /* A JCR Name that is also a legal SQL identifier */

      6.7.23 Path
      Path ::= '[' quotedPath ']' | '[' simplePath ']' | simplePath
      quotedPath ::= /* A JCR Path that contains non-SQL-legal characters */
      simplePath ::= /* A JCR Name that contains only SQL-legal characters */

      See the SQL:92 rules for <regular identifier> (in ISO/IEC 9075:1992 §5.2 <token> and <separator>).

      I guess (not sure) that "quotedName" / "quotedPath" does not actually mean a name or path surrounded by single quotes; that is, the path /test/node is not meant to be written as ['/test/node']. I guess the idea is to support identifiers delimited by brackets ([ ]), as in MS SQL Server:

      http://msdn.microsoft.com/en-us/library/aa224033(v=sql.80).aspx
      http://msdn.microsoft.com/en-us/library/ms176027(v=sql.90).aspx

      That would mean spaces and special characters are preserved inside an identifier delimited by [ ].

      Not fully clear is how the characters [ and ] should be handled within bracketed identifiers. MS SQL Server requires doubling ] (so that [Employee]]] is the quoted version of Employee] ). That would mean a path test[1]/node would result in [test[1]]/node] (where [ is not doubled, only ]).
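
The doubling rule described above can be sketched in a few lines. This is only an illustration of the MS SQL Server convention as this issue describes it, not code from the spec or from any implementation; the function name quote_identifier is hypothetical.

```python
def quote_identifier(raw: str) -> str:
    """Wrap a name or path in [ ] and double every ']' inside it.

    '[' is left as-is; only the closing delimiter needs escaping.
    (Hypothetical helper, assuming the MS SQL Server doubling rule.)
    """
    return "[" + raw.replace("]", "]]") + "]"


print(quote_identifier("Employee]"))     # [Employee]]]
print(quote_identifier("test[1]/node"))  # [test[1]]/node]
```

Both outputs match the examples given in the paragraph above.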

      See also https://issues.apache.org/jira/browse/OAK-295

        Activity

        Peeter Piegaze added a comment -

        @Thomas: How would you fix/clarify the BNF?

        thomasmueller2 added a comment -

        As for identifiers quoted with [ and ], I would follow the MS SQL Server style, so that doubling is required to escape a ] character: [abc:[def]]] would refer to the node type "abc:[def]". (I'm not sure if such node types are allowed; this example is just to explain how to escape / de-escape.)

        As for changes to the spec, I would write the following. Also, it might make sense to show an example in each case, for illustration:

        6.7.4 Name
        Name ::= '[' quotedName ']' | simpleName
        quotedName ::= /* A JCR Name, where the ']' character is escaped as ']]'. */
        simpleName ::= /* A JCR Name that is also a legal SQL identifier */

        Example names are: test, [test], [nt:base], [abc.[def]]]. The last example is the quoted form of the name "abc.[def]".

        6.7.23 Path
        Path ::= '[' quotedPath ']' | simplePath
        quotedPath ::= /* A JCR Path that may contain non-SQL-legal characters, and the ']' character is escaped as ']]' */
        simplePath ::= /* A JCR Name that contains only SQL-legal characters */

        Example paths are: blog, [blog/Hello World], [blog/[1]]]. The last example is the quoted form of the path "blog/[1]".

        Regards,
        Thomas
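
For the reading direction, a parser has to tell an escaped ']]' apart from the closing ']'. The following is a minimal sketch of one way to do that under the proposed quotedName / quotedPath rule; read_bracketed is a hypothetical name and this is not the Jackrabbit parser.

```python
def read_bracketed(text: str, pos: int = 0) -> tuple:
    """Read a '[...]' identifier starting at text[pos].

    Returns (unescaped value, index just past the closing ']').
    Hypothetical helper, assuming ']' is escaped as ']]'.
    """
    if pos >= len(text) or text[pos] != "[":
        raise ValueError(f"expected '[' at position {pos}")
    out = []
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "]":
            if i + 1 < len(text) and text[i + 1] == "]":
                out.append("]")  # ']]' is an escaped ']'
                i += 2
                continue
            return "".join(out), i + 1  # a single ']' closes the identifier
        out.append(ch)
        i += 1
    raise ValueError("unterminated bracketed identifier")


print(read_bracketed("[abc.[def]]]"))  # ('abc.[def]', 12)
print(read_bracketed("[blog/[1]]]"))   # ('blog/[1]', 11)
```

Returning the end index lets a caller continue tokenizing the rest of the query after the identifier.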

        Peeter Piegaze added a comment - edited

        Ok, I will change the grammar as suggested. But, just for the sake of clarity, I think I understand how this trouble came about:

        1. The terminology "quotedName" etc. is confusing, I grant you that. It was not meant to indicate a string already in quotes, but rather a string that needs quotes (or some type of delimiters, due to it containing otherwise invalid chars). However, the pattern of using "quoted" in this way in the spec is widespread and I don't think it is worth fixing at this point.

        2. The delimiters '[' and ']' were chosen precisely because they are not valid characters in a JCR Name. Therefore there should be no reason to worry about escaping them (but see next point).

        3. When we selected [ and ] as delimiters we made a mistake, because at the same time (JCR 2.0) we also introduced the distinction between expanded and qualified forms of JCR names. Pre-2.0 it was true that [ and ] were invalid within a JCR Name (what we now call qualified form, with the prefix, e.g., jcr:content etc.).

        However, as of 2.0 we allowed JCR Names to be written in expanded form, which includes an explicit URI, like this: {http://foo.com/bar}blah. The trouble is that now it is possible to have a [ or ] in a JCR name because those characters are valid within a URI.

        So, it turns out you are right and I will fix the grammar. Just thought you'd like to know what the background was.
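
To make the expanded-form problem concrete, here is a small sketch. The namespace URI is a made-up example using an IPv6 host literal (a place where URIs do contain [ and ]), and the helper names are hypothetical. It shows that a scanner treating the first ] as the closing delimiter would cut such a name short unless ] is doubled.

```python
def expanded_name(namespace_uri: str, local_name: str) -> str:
    """Build a JCR name in expanded form: {namespaceURI}localName."""
    return "{" + namespace_uri + "}" + local_name


# Hypothetical namespace URI; the IPv6 host literal contains '[' and ']'.
name = expanded_name("http://[::1]/ns", "blah")  # {http://[::1]/ns}blah

# Without escaping, the first ']' looks like the end of the identifier.
unescaped = "[" + name + "]"
truncated = unescaped[1:unescaped.index("]")]

# With ']' doubled, the delimiters are unambiguous again.
escaped = "[" + name.replace("]", "]]") + "]"

print(truncated)  # {http://[::1
print(escaped)    # [{http://[::1]]/ns}blah]
```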

        Peeter Piegaze added a comment -

        Fixed.

        thomasmueller2 added a comment -

        > The terminology "quotedName" etc is confusing

        I think there is no need to change the term.

        The problem really should have been found when implementing the parser in Jackrabbit. I did that, and I didn't see the problem back then. So really it's my mistake... But it can only be fixed in the spec, and it's good that this can be done now.

        Thanks!


          People

          • Assignee: Unassigned
          • Reporter: thomasmueller2