SERVLET_SPEC-67

Add support for obtaining path parameter information from HttpServletRequest

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Labels:
      None
    • Environment:

      n/a

      Description

      URI paths can contain parameters, independent of the query string. Parameters belong to the path segment within which they appear and to the URI as a whole. Parameters are separated from the segment and from each other with a semicolon (;), and parameter values are separated from each other with a comma (,). For example, consider the following URL:

      http://www.example.org/foo;x=1;y=2/bar;a=3,4;y=5

      In this example, x is 1 and y is 2 for the /foo segment, while a is [3, 4] and y is 5 for the /bar segment. For the entire URL, a appears once and has two values, x appears once and has one value, and y appears twice with one value in each occurrence.
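The breakdown above can be sketched in a few lines of Java. This is only an illustration of the parsing rules described here, not part of the proposal: the class and method names (MatrixParamSketch, parseSegment, parsePath) are hypothetical, and percent-decoding is deliberately left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class MatrixParamSketch {

    /** Parses one raw segment such as "foo;x=1;y=2" into name -> values. */
    static Map<String, List<String>> parseSegment(String segment) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        String[] parts = segment.split(";");
        for (int i = 1; i < parts.length; i++) {   // parts[0] is the segment name itself
            if (parts[i].isEmpty()) {
                continue;                           // skip empty parameters such as ";;"
            }
            String[] kv = parts[i].split("=", 2);
            String value = kv.length == 2 ? kv[1] : "";
            // limit -1 keeps empty values, so "x=" yields [""]
            params.put(kv[0], Arrays.asList(value.split(",", -1)));
        }
        return params;
    }

    /** Collects every occurrence of each parameter across all segments, in URI order. */
    static Map<String, List<List<String>>> parsePath(String rawPath) {
        Map<String, List<List<String>>> all = new LinkedHashMap<>();
        // Assumes rawPath starts with "/" as a request URI path does.
        for (String segment : rawPath.substring(1).split("/")) {
            for (Map.Entry<String, List<String>> e : parseSegment(segment).entrySet()) {
                all.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // prints {x=[[1]], y=[[2], [5]], a=[[3, 4]]}
        System.out.println(parsePath("/foo;x=1;y=2/bar;a=3,4;y=5"));
    }
}
```

The outer list has one entry per segment in which the parameter appears, and each inner list holds that occurrence's comma-separated values.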

      The servlet spec already recognizes path parameters, though it does not actually provide an interface for extracting them. As an example of this, if application /foo is deployed at example.org and it has a Servlet mapped to /bar, the aforementioned URL will match that context and servlet. A call to HttpServletRequest#getContextPath() will return /foo, not /foo;x=1;y=2, and a call to HttpServletRequest#getServletPath() will return /bar, not /bar;a=3,4;y=5.

      This suggestion is to add two methods to HttpServletRequest:

      ...
          /**
           * Returns all of the path (matrix) parameters that appear in the request URI. The keys in the
           * map are the parameter names. The map values are lists of entries. If a parameter appears
           * in one path segment, there will be one value in the list, and that value may be one or more
           * strings. If a parameter appears in multiple path segments, there will be a value in the list
           * for each path segment, in the order the path segments appear in the URI. Each value may
           * be one or more strings.
           * <p>
           * Path parameters are separated from their segments and the ... [explanation from above]
           *
           * @return the parameters present in all path segments in the URI.
           */
          Map<String, List<String[]>> getPathParameters(); // could be Map<String, List<List<String>>> instead
      ...
          /**
           * Returns a list of all path segments in the request URI. Path segments are separated by the
           * forward slash (/). The path segments returned by this method will include the context
           * path and the Servlet path.
           *
           * @return a list of all the path segments in the request URI, in the order they appear.
           */
          List<PathSegment> getPathSegments();
      ...
      

      A call to either getPathParameters or getPathSegments results in the processing and caching of all path parameters. This is independent of the processing and caching of request parameters (getParameter, getParameterNames, etc.). The processing of path parameters should not trigger the processing of request parameters, and vice versa. If easier/more efficient, the container may process path parameters when it decodes the URI (note that parameter processing should be performed against the URI before decoding, but parameter names and values should be decoded).

      (Importantly, if I call getPathParameters or getPathSegments within a filter, it should not block while POST parameters or multipart data (unrelated) are processed.)
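One way a container might satisfy the "process once, cache, and never touch the request body" requirement is a lazily initialized holder. The sketch below is only an assumption about how that could look; LazyPathSegments and its parseCount field are hypothetical names, and the segments are kept in raw (undecoded) form for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder, for illustration only.
final class LazyPathSegments {
    private final String rawRequestUri;
    private List<String> segments;   // null until first requested
    private int parseCount;          // present only to make the caching observable

    LazyPathSegments(String rawRequestUri) {
        this.rawRequestUri = rawRequestUri;
    }

    /** Parses the URI on the first call and returns the cached result afterwards. */
    synchronized List<String> getPathSegments() {
        if (segments == null) {
            parseCount++;
            List<String> parsed = new ArrayList<>();
            // Only the request URI is read; the request body is never consulted.
            for (String s : rawRequestUri.substring(1).split("/")) {
                parsed.add("/" + s);
            }
            segments = Collections.unmodifiableList(parsed);
        }
        return segments;
    }

    synchronized int getParseCount() {
        return parseCount;
    }
}
```

Because the holder depends only on the URI string, calling it from a filter cannot block on POST or multipart processing.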

      The new javax.servlet.http.PathSegment interface is modeled on the javax.ws.rs.core.PathSegment interface, which exists for the same purpose:

      package javax.servlet.http;
      
      public interface PathSegment
      {
          /**
           * Returns the path for this specific segment, including the leading forward slash (/).
           *
           * @return the path for this segment.
           */
          String getPath();
      
          /**
           * Returns the path (matrix) parameters that appear in this segment. The keys in
           * the map are the parameter names. The values are all of the values assigned to
           * the corresponding parameters. A parameter may have one or more values.
           * <p>
           * Path parameters are separated from their segments and the ... [explanation from above]
           */
          Map<String, String[]> getParameters(); // could be Map<String, List<String>> instead
      }
      

      There is currently a workaround for this, though it has disadvantages. The application can process parameters as needed using its own or third-party code, or a filter can process them and attach the result to the request as a request attribute. The key problem with both approaches is that the container knows what character encoding was used for the URI, but the application does not. It would be more accurate and reliable for the container to perform the parameter processing.
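To make the encoding problem concrete, the following sketch decodes the same percent-encoded bytes with two different charsets. The class name EncodingAmbiguity and the sample value are made up for illustration; only java.net.URLDecoder is a real API here.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, for illustration only.
final class EncodingAmbiguity {

    /** Percent-decodes raw using the named charset. */
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9";                        // the two bytes 0xC3 0xA9
        System.out.println(decode(raw, "UTF-8"));        // one character: U+00E9
        System.out.println(decode(raw, "ISO-8859-1"));   // two characters: U+00C3 U+00A9
    }
}
```

An application that guesses the wrong charset silently produces the second form. Note also that URLDecoder implements form decoding, so it turns "+" into a space, which may not be what is wanted for path content.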

      For more information, I have included parts of a sample filter below that I created for use in my application. Some of the code (namely the POJOs) is omitted and must be inferred.

      ...
          @Override
          public void doFilter(ServletRequest request, ServletResponse response,
                               FilterChain chain) throws IOException, ServletException
          {
              String[] paths = ((HttpServletRequest)request).getRequestURI()
                      .substring(1).split("/");
              PathInfo info = new PathInfo();
      
              for(String path : paths)
              {
                  String[] parts = path.split(";");
                  PathSegment segment = new PathSegment();
                  segment.path = parts[0];
                  for(int i = 1; i < parts.length; i++)
                  {
                      String[] p = parts[i].split("=", 2);
                      String key = decode(p[0]);
                      if(p.length == 2)
                          segment.parameters.put(key, decode(p[1].split(",", -1)));
                      else
                          segment.parameters.put(key, new String[] {""});
                      if(!info.parameters.containsKey(key))
                          info.parameters.put(key, new ArrayList<>());
                      info.parameters.get(key).add(segment.parameters.get(key));
                  }
                  info.segments.add(segment);
              }
      
              request.setAttribute("com.wrox.pathInfo", info);
      
              chain.doFilter(request, response);
          }
      
          private String decode(String original)
          {
              try {
                  return URLDecoder.decode(original, "UTF-8");
              } catch (UnsupportedEncodingException e) {
                  throw new RuntimeException(e); // not possible
              }
          }
      
          private String[] decode(String[] original)
          {
              // Decode each value with the single-value decode(String) helper.
              String[] newValues = new String[original.length];
              for(int i = 0; i < original.length; i++)
                  newValues[i] = decode(original[i]);
              return newValues;
          }
      ...
      

      Estimate 30 minutes to add the relevant methods/interfaces and 2.5 hours to update the spec doc.

        Activity

        rstoyanchev added a comment -

        While the above understanding of path parameters is correct, note that it represents one of several styles of path parameters. RFC 3986 (section 3.3) is relatively vague and leaves a lot of room:

        For example, the semicolon (";") and equals ("=") reserved characters are
        often used to delimit parameters and parameter values applicable to
        that segment.  The comma (",") reserved character is often used for
        similar purposes.  For example, one URI producer might use a segment
        such as "name;v=1.1" to indicate a reference to version 1.1 of
        "name", whereas another might use a segment such as "name,1.1" to
        indicate the same.
        

        This probably reflects the fact that a few different styles of path parameters have evolved over time in the absence of a very precise definition. In addition to the above examples, here is one other example from the StackExchange API where a path segment contains a ";" separated list of ids (the ";" in this case is merely a separator):

        http://api.stackoverflow.com/1.1/usage/methods/comments-by-ids
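        To illustrate why the style matters, here is a small sketch in which the same segment is read two ways. The class SegmentStyles, its methods, and the sample segment "1;2;3" are hypothetical and not taken from any of the APIs mentioned.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, for illustration only.
final class SegmentStyles {

    /** Separator style: the whole segment is a ";"-separated list of ids. */
    static List<String> asIdList(String segment) {
        return Arrays.asList(segment.split(";"));
    }

    /** Matrix style: everything after the first ";" is treated as parameters. */
    static String matrixSegmentName(String segment) {
        int i = segment.indexOf(';');
        return i < 0 ? segment : segment.substring(0, i);
    }

    public static void main(String[] args) {
        System.out.println(asIdList("1;2;3"));           // prints [1, 2, 3]
        System.out.println(matrixSegmentName("1;2;3"));  // prints 1
    }
}
```

        A container that always applied the matrix reading would report the segment as "1" with two valueless parameters, which is not what such an API intends.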

        Nick Williams added a comment -

        If this can make it in Servlet 3.1, great. If not, no big deal. There is a workaround, so it is not crucial that this be in 3.1.

        Nick Williams added a comment -

        Also, I believe the spec should specify the following important notes:

        • Containers should preserve empty parameter values. So, if a parameter is given as x=1,,2,3,,,4, (note the trailing comma), the resulting values should be ["1", "", "2", "3", "", "", "4", ""]. If it is given as x= then the resulting values should be [""].
        • Users should be warned that browsers do not recognize or interpret path parameters, and as a result they can interfere with cookies. If a cookie is set to path /foo, requests to /foo/bar will include the cookie but requests to /foo;a=1/bar will not include the cookie. However, requests to /foo/bar;a=1 will include the cookie, since the path parameters are not interfering with the cookie path in this case.
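        Both notes can be checked with a short sketch. The class CommentNotesSketch and its method names are hypothetical; the cookie check is my reading of the path-match rule in RFC 6265 (section 5.1.4), not code from any browser.

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class CommentNotesSketch {

    /** Splits a raw parameter value on commas, keeping empty and trailing values. */
    static String[] values(String rawValue) {
        // A negative limit stops String.split from dropping trailing empty strings.
        return rawValue.split(",", -1);
    }

    /** Path-match as described in RFC 6265, applied to the raw request path. */
    static boolean cookiePathMatches(String cookiePath, String requestPath) {
        if (requestPath.equals(cookiePath)) {
            return true;
        }
        if (!requestPath.startsWith(cookiePath)) {
            return false;
        }
        // The prefix must end at a "/" boundary; ";" does not count as one.
        return cookiePath.endsWith("/") || requestPath.charAt(cookiePath.length()) == '/';
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(values("1,,2,3,,,4,")));   // prints [1, , 2, 3, , , 4, ]
        System.out.println(cookiePathMatches("/foo", "/foo/bar"));      // prints true
        System.out.println(cookiePathMatches("/foo", "/foo;a=1/bar"));  // prints false
        System.out.println(cookiePathMatches("/foo", "/foo/bar;a=1"));  // prints true
    }
}
```

        Calling split(",") without the limit would return only seven values for the first input, which is why the limit matters for the first note.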

          People

          • Assignee:
            Unassigned
            Reporter:
            Nick Williams
          • Votes:
            2
            Watchers:
            4

            Dates

            • Created:
              Updated:

              Time Tracking

              Estimated:
              3h
              Remaining:
              3h
              Logged:
              Not Specified