[SERVLET_SPEC-67] Add support for obtaining path parameter information from HttpServletRequest Created: 11/Mar/13  Updated: 21/Aug/14

Status: Open
Project: servlet-spec
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major
Reporter: Nick Williams Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: None
Remaining Estimate: 3 hours
Time Spent: Not Specified
Original Estimate: 3 hours
Environment:

n/a


Tags: matrixparameter, pathparameter, pathsegment, request

 Description   

URI paths can contain parameters, independent of the query string. Parameters belong to the path segment within which they appear and to the URI as a whole. Parameters are separated from the segment and from each other with a semicolon (;), and parameter values are separated from each other with a comma (,). For example, consider the following URL:

http://www.example.org/foo;x=1;y=2/bar;a=3,4;y=5

In this example, x is 1 and y is 2 for the /foo segment, while a is [3, 4] and y is 5 for the /bar segment. For the URL as a whole, a appears once with 2 values, x appears once with 1 value, and y appears twice, with 1 value each time.
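
The per-segment and URL-wide views described above can be sketched in plain Java. This is a minimal illustration that assumes the ;/=/, syntax described here and skips percent-decoding. The class and method names (MatrixExample, parseSegment, aggregate) are hypothetical and not part of any Servlet API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating path (matrix) parameter semantics; not a Servlet API.
class MatrixExample {

    // Parses one raw segment such as "bar;a=3,4;y=5" into name -> values.
    static Map<String, List<String>> parseSegment(String segment) {
        Map<String, List<String>> params = new LinkedHashMap<>();
        String[] parts = segment.split(";");
        for (int i = 1; i < parts.length; i++) {          // parts[0] is the segment path
            String[] nv = parts[i].split("=", 2);
            List<String> values = nv.length == 2
                    ? Arrays.asList(nv[1].split(",", -1)) // -1 keeps empty values
                    : Collections.singletonList("");
            params.put(nv[0], values);
        }
        return params;
    }

    // Collects each parameter's values across the whole path: one list entry
    // per segment in which the parameter appears, in segment order.
    static Map<String, List<List<String>>> aggregate(String rawPath) {
        Map<String, List<List<String>>> all = new LinkedHashMap<>();
        for (String segment : rawPath.substring(1).split("/")) {
            parseSegment(segment).forEach((name, values) ->
                    all.computeIfAbsent(name, k -> new ArrayList<>()).add(values));
        }
        return all;
    }

    public static void main(String[] args) {
        String path = "/foo;x=1;y=2/bar;a=3,4;y=5";
        System.out.println(aggregate(path)); // {x=[[1]], y=[[2], [5]], a=[[3, 4]]}
    }
}
```

The output matches the counts above: y has two entries because it appears in two segments, while a has one entry holding two values.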

The servlet spec already recognizes path parameters, though it does not provide an API for extracting them. For example, if an application with context path /foo is deployed at example.org and has a Servlet mapped to /bar, the aforementioned URL will match that context and Servlet. A call to HttpServletRequest#getContextPath() will return /foo, not /foo;x=1;y=2, and a call to HttpServletRequest#getServletPath() will return /bar, not /bar;a=3,4;y=5.
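
To match the context and Servlet this way, the container effectively ignores everything from a ; up to the next / in each segment. A simplified sketch of that idea, assuming a raw (undecoded) path; stripPathParameters is a hypothetical helper and is far simpler than a real container's mapping code:

```java
// Hypothetical illustration of how path parameters are ignored for mapping.
class PathStripper {

    // Removes ";params" from every segment of a raw path, e.g.
    // "/foo;x=1;y=2/bar;a=3,4;y=5" -> "/foo/bar".
    static String stripPathParameters(String rawPath) {
        StringBuilder out = new StringBuilder(rawPath.length());
        boolean inParams = false;
        for (int i = 0; i < rawPath.length(); i++) {
            char c = rawPath.charAt(i);
            if (c == '/') {
                inParams = false;        // a new segment starts
            } else if (c == ';') {
                inParams = true;         // parameters run to the end of the segment
            }
            if (!inParams) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripPathParameters("/foo;x=1;y=2/bar;a=3,4;y=5")); // /foo/bar
    }
}
```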

This suggestion is to add two methods to HttpServletRequest:

...
    /**
     * Returns all of the path (matrix) parameters that appear in the request URI. The keys in the
     * map are the parameter names. The map values are lists of entries. If a parameter appears
     * in one path segment, there will be one value in the list, and that value may be one or more
     * strings. If a parameter appears in multiple path segments, there will be a value in the list
     * for each path segment, in the order the path segments appear in the URI. Each value may
     * be one or more strings.
     * <p>
     * Path parameters are separated from their segments and the ... [explanation from above]
     *
     * @return the parameters present in all path segments in the URI.
     */
    Map<String, List<String[]>> getPathParameters(); // could be Map<String, List<List<String>>> instead
...
    /**
     * Returns a list of all path segments in the request URI. Path segments are separated by the
     * forward slash (/). The path segments returned by this method will include the context
     * path and the Servlet path.
     *
     * @return a list of all the path segments in the request URI, in the order they appear.
     */
    List<PathSegment> getPathSegments();
...

A call to either getPathParameters or getPathSegments results in the processing and caching of all path parameters. This is independent of the processing and caching of request parameters (getParameter, getParameterNames, etc.): processing path parameters should not trigger processing of request parameters, and vice versa. If it is easier or more efficient, the container may process path parameters when it decodes the URI. Note that the URI must be split into segments, parameters, and values before it is decoded, but the resulting parameter names and values should then be decoded.
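
The split-before-decoding rule matters whenever a name or value contains an encoded delimiter. A minimal sketch, assuming UTF-8 and the hypothetical segment item;tag=a%3Bb,c%2Cd (where %3B is ; and %2C is ,). Only splitting first recovers the two intended values:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

class DecodeOrder {

    static String decode(String s) {
        try {
            // Escape '+' first: URLDecoder follows form encoding and would turn it into a space.
            return URLDecoder.decode(s.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Correct order: split the raw value on ',' first, then decode each piece.
    static List<String> splitThenDecode(String rawValue) {
        List<String> values = new ArrayList<>();
        for (String v : rawValue.split(",", -1)) {
            values.add(decode(v));
        }
        return values;
    }

    // Incorrect order: decoding first turns %2C into a real ',' before splitting.
    static List<String> decodeThenSplit(String rawValue) {
        List<String> values = new ArrayList<>();
        for (String v : decode(rawValue).split(",", -1)) {
            values.add(v);
        }
        return values;
    }

    public static void main(String[] args) {
        String raw = "a%3Bb,c%2Cd"; // value of "tag" in the segment item;tag=a%3Bb,c%2Cd
        System.out.println(splitThenDecode(raw)); // [a;b, c,d]
        System.out.println(decodeThenSplit(raw)); // [a;b, c, d]
    }
}
```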

Importantly, calling getPathParameters or getPathSegments within a filter should not block while POST parameters or multipart data, which are unrelated, are being processed.

The new javax.servlet.http.PathSegment interface is modeled on the javax.ws.rs.core.PathSegment interface, which exists for the same purpose:

package javax.servlet.http;

public interface PathSegment
{
    /**
     * Returns the path for this specific segment, including the leading forward slash (/).
     *
     * @return the path for this segment.
     */
    String getPath();

    /**
     * Returns the path (matrix) parameters that appear in this segment. The keys in
     * the map are the parameter names. The values are all of the values assigned to
     * the corresponding parameters. A parameter may have one or more values.
     * <p>
     * Path parameters are separated from their segments and the ... [explanation from above]
     */
    Map<String, String[]> getParameters(); // could be Map<String, List<String>> instead
}
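
An implementation of the proposed interface can be small. The sketch below is hypothetical: it declares its own copy of the proposed interface (named ProposedPathSegment to avoid clashing with javax.ws.rs.core.PathSegment), uses the Map<String, List<String>> alternative mentioned in the comment, skips percent-decoding, and builds segments from a raw URI the way getPathSegments() might.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Local stand-in for the proposed javax.servlet.http.PathSegment (hypothetical).
interface ProposedPathSegment {
    String getPath();                              // includes the leading '/'
    Map<String, List<String>> getParameters();
}

// Read-only implementation built from one raw segment such as "bar;a=3,4;y=5".
final class SimplePathSegment implements ProposedPathSegment {
    private final String path;
    private final Map<String, List<String>> parameters;

    SimplePathSegment(String rawSegment) {
        String[] parts = rawSegment.split(";");
        this.path = "/" + parts[0];
        Map<String, List<String>> params = new LinkedHashMap<>();
        for (int i = 1; i < parts.length; i++) {
            String[] nv = parts[i].split("=", 2);
            List<String> values = nv.length == 2
                    ? Collections.unmodifiableList(Arrays.asList(nv[1].split(",", -1)))
                    : Collections.singletonList("");
            params.put(nv[0], values);
        }
        this.parameters = Collections.unmodifiableMap(params);
    }

    @Override public String getPath() { return path; }
    @Override public Map<String, List<String>> getParameters() { return parameters; }

    // Roughly what getPathSegments() could return for a raw request URI.
    static List<ProposedPathSegment> segmentsOf(String rawUri) {
        List<ProposedPathSegment> segments = new ArrayList<>();
        for (String s : rawUri.substring(1).split("/")) {
            segments.add(new SimplePathSegment(s));
        }
        return Collections.unmodifiableList(segments);
    }

    public static void main(String[] args) {
        for (ProposedPathSegment s : segmentsOf("/foo;x=1;y=2/bar;a=3,4;y=5")) {
            System.out.println(s.getPath() + " " + s.getParameters());
        }
        // /foo {x=[1], y=[2]}
        // /bar {a=[3, 4], y=[5]}
    }
}
```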

There is currently a workaround, though it has its disadvantages. The application can parse parameters as needed using its own or third-party code, or a filter can parse them and add them to the request as a request attribute. The key problem with both approaches is that the container knows which character encoding was used for the URI, but the application does not. It would be more accurate and reliable for the container to perform the parameter processing.
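
The encoding problem is easy to reproduce: the same percent-encoded bytes decode to different strings depending on the charset the decoder assumes. A minimal sketch using java.net.URLDecoder with the hypothetical value caf%C3%A9 (the UTF-8 bytes for "café"):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class CharsetMismatch {

    static String decodeAs(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "caf%C3%A9";
        // Correct only if the client really used UTF-8 for the URI.
        System.out.println(decodeAs(raw, StandardCharsets.UTF_8.name()));      // café
        // The same bytes read as ISO-8859-1 produce mojibake.
        System.out.println(decodeAs(raw, StandardCharsets.ISO_8859_1.name())); // cafÃ©
    }
}
```

An application-level parser has to guess which branch applies; the container does not.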

For reference, below are parts of a sample filter that I created for use in my application. Some of the code (namely the POJOs) is omitted and left for the reader to infer.

...
    @Override
    public void doFilter(ServletRequest request, ServletResponse response,
                         FilterChain chain) throws IOException, ServletException
    {
        // Parse the raw (still percent-encoded) URI so that encoded ';', '=', ','
        // and '/' inside names or values are not mistaken for delimiters.
        String[] paths = ((HttpServletRequest)request).getRequestURI()
                .substring(1).split("/");
        PathInfo info = new PathInfo();

        for(String path : paths)
        {
            // parts[0] is the segment path; the remaining parts are name[=values] pairs.
            String[] parts = path.split(";");
            PathSegment segment = new PathSegment();
            segment.path = parts[0];
            for(int i = 1; i < parts.length; i++)
            {
                String[] p = parts[i].split("=", 2);
                String key = decode(p[0]);
                // A limit of -1 keeps empty values: "1,,2," -> ["1", "", "2", ""].
                if(p.length == 2)
                    segment.parameters.put(key, decode(p[1].split(",", -1)));
                else
                    segment.parameters.put(key, new String[] {""});
                // Record this segment's values in the URI-wide map, in segment order.
                if(!info.parameters.containsKey(key))
                    info.parameters.put(key, new ArrayList<>());
                info.parameters.get(key).add(segment.parameters.get(key));
            }
            info.segments.add(segment);
        }

        request.setAttribute("com.wrox.pathInfo", info);

        chain.doFilter(request, response);
    }

    // Decodes a single name or value. The application cannot know the URI's
    // actual encoding, so UTF-8 is assumed. '+' is escaped first because
    // URLDecoder follows form encoding and would turn a literal '+' into a space.
    private String decode(String original)
    {
        try {
            return URLDecoder.decode(original.replace("+", "%2B"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e); // not possible: UTF-8 is always supported
        }
    }

    private String[] decode(String[] original)
    {
        String[] newValues = new String[original.length];
        for(int i = 0; i < original.length; i++)
            newValues[i] = decode(original[i]);
        return newValues;
    }
...
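
For anyone compiling the filter, here is one hypothetical shape for the omitted POJOs, inferred only from the fields the filter reads and writes. These are plain classes, not the proposed javax.servlet.http.PathSegment interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical POJO for one segment, as used by the filter.
class PathSegment {
    // Segment path as taken from the URI, without the leading '/'.
    String path;
    // Parameter name -> values within this segment.
    Map<String, String[]> parameters = new LinkedHashMap<>();
}

// Hypothetical POJO stored in the "com.wrox.pathInfo" request attribute.
class PathInfo {
    // Parameter name -> one String[] per segment in which it appears.
    Map<String, List<String[]>> parameters = new LinkedHashMap<>();
    // All segments, in URI order.
    List<PathSegment> segments = new ArrayList<>();
}
```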

Estimate 30 minutes to add the relevant methods/interfaces and 2.5 hours to update the spec doc.



 Comments   
Comment by Nick Williams [ 11/Mar/13 ]

Also, I believe the spec should specify the following important notes:

  • Containers should preserve empty parameter values. For example, if a segment contains x=1,,2,3,,,4,, the resulting values should be ["1", "", "2", "3", "", "", "4", ""]. If it contains x=, the resulting values should be [""].
  • Users should be warned that browsers do not recognize or interpret path parameters, and as a result they can interfere with cookies. If a cookie is set to path /foo, requests to /foo/bar will include the cookie but requests to /foo;a=1/bar will not include the cookie. However, requests to /foo/bar;a=1 will include the cookie, since the path parameters are not interfering with the cookie path in this case.
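
Both notes can be checked with a short sketch. The first method shows that String.split needs a negative limit to keep trailing empty values. The second is a simplified version of the cookie path-match rule from RFC 6265, section 5.1.4, which explains why /foo;a=1/bar misses a /foo cookie. The class and method names are hypothetical.

```java
import java.util.Arrays;

class CommentNotes {

    // Note 1: a negative limit keeps trailing empty strings; the default
    // String.split(",") would drop them.
    static String[] splitValues(String raw) {
        return raw.split(",", -1);
    }

    // Note 2: simplified cookie path-match from RFC 6265, section 5.1.4.
    static boolean cookiePathMatches(String cookiePath, String requestPath) {
        if (requestPath.equals(cookiePath)) {
            return true;
        }
        if (!requestPath.startsWith(cookiePath)) {
            return false;
        }
        // Prefix match counts only at a '/' boundary.
        return cookiePath.endsWith("/")
                || requestPath.charAt(cookiePath.length()) == '/';
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitValues("1,,2,3,,,4,"))); // [1, , 2, 3, , , 4, ]
        System.out.println(Arrays.toString("1,,2,3,,,4,".split(",")));   // [1, , 2, 3, , , 4]
        System.out.println(cookiePathMatches("/foo", "/foo/bar"));        // true
        System.out.println(cookiePathMatches("/foo", "/foo;a=1/bar"));    // false
        System.out.println(cookiePathMatches("/foo", "/foo/bar;a=1"));    // true
    }
}
```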
Comment by Nick Williams [ 11/Mar/13 ]

If this can make it in Servlet 3.1, great. If not, no big deal. There is a workaround, so it is not crucial that this be in 3.1.

Comment by rstoyanchev [ 25/Apr/13 ]

While the above understanding of path parameters is correct, note that it represents one of several styles of path parameters. RFC 3986 (section 3.3) is relatively vague and leaves a lot of room:

For example, the semicolon (";") and equals ("=") reserved characters are
often used to delimit parameters and parameter values applicable to
that segment.  The comma (",") reserved character is often used for
similar purposes.  For example, one URI producer might use a segment
such as "name;v=1.1" to indicate a reference to version 1.1 of
"name", whereas another might use a segment such as "name,1.1" to
indicate the same.

This probably reflects the fact that several styles of path parameters have evolved over time in the absence of a precise definition. In addition to the above examples, here is another from the StackExchange API, where a path segment contains a ";"-separated list of ids (the ";" in this case is merely a separator):

http://api.stackoverflow.com/1.1/usage/methods/comments-by-ids
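
To illustrate the concern, applying the proposal's ;/=/, rules to the other styles gives results that are well-formed but probably not what the URI producer meant. A minimal sketch; the segments name,1.1 (from the RFC excerpt) and 1;2;3 (a hypothetical ;-separated id list) are examples, and describe is a hypothetical helper, not a container API.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

class StyleAmbiguity {

    // Applies the proposal's rules: ';' separates parameters, '=' separates a
    // name from its values, ',' separates values. The segment path is stored
    // under the key "(path)"; each parameter maps to its values as a string.
    static Map<String, String> describe(String segment) {
        Map<String, String> result = new LinkedHashMap<>();
        String[] parts = segment.split(";");
        result.put("(path)", parts[0]);
        for (int i = 1; i < parts.length; i++) {
            String[] nv = parts[i].split("=", 2);
            String[] values = nv.length == 2 ? nv[1].split(",", -1) : new String[] {""};
            result.put(nv[0], Arrays.toString(values));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe("name;v=1.1")); // {(path)=name, v=[1.1]}
        System.out.println(describe("name,1.1"));   // {(path)=name,1.1}
        System.out.println(describe("1;2;3"));      // {(path)=1, 2=[], 3=[]}
    }
}
```

In the last case the ids 2 and 3 become parameter names with a single empty value, so an application using that style would still need its own parsing.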

Generated at Wed Mar 04 01:26:01 UTC 2015 using JIRA 6.2.3#6260-sha1:63ef1d6dac3f4f4d7db4c1effd405ba38ccdc558.