Tuesday, February 03, 2009

REST: RESTful Shopping Carts

I've been thinking about the book RESTful Web Services. It has lots of negative things to say about cookies, for instance, "OK, so cookies shouldn't contain session IDs" [p. 252]. Elsewhere in the book, it describes a scheme using temporary URLs for transactions [p. 231].

I was thinking about shopping carts. If you can't use a cookie to store a session ID, then it seems natural to embed the session ID into the URL. (I'm thinking about the case when the user hasn't even logged in yet.) However, therein lies the problem. If you put the session ID in the URL, you open yourself up to well-known session fixation attacks.

Let me explain. Attacker A creates a shopping cart on a legitimate Web site that embeds the session ID (or some other sort of state) in the URL. Attacker A spams a bunch of people with that URL. A victim V clicks on the link. He knows that the site is legitimate. He adds a few things to his cart, logs in, and places the order. At this point, the attacker, who chose the session ID in the first place, is sharing the victim's logged-in session. He can do whatever he wants with the user's account.
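
The scenario above can be sketched in a few lines of Python. This is only an illustration: the in-memory `sessions` dict and the function names are made up for the example. `login_safe` shows one common mitigation (moving the session to a fresh ID at login); it is included for contrast and is not something the book or this post proposes.

```python
import secrets

# Hypothetical in-memory session store: session ID -> session data.
sessions = {}

def get_session(sid):
    """Return the session for a URL-supplied ID, creating it if needed."""
    return sessions.setdefault(sid, {"cart": [], "user": None})

def login_vulnerable(sid, user):
    """Log in but keep the same session ID (open to fixation)."""
    get_session(sid)["user"] = user
    return sid

def login_safe(sid, user):
    """Log in and move the session to a fresh, unguessable ID."""
    data = sessions.pop(sid, {"cart": [], "user": None})
    data["user"] = user
    new_sid = secrets.token_urlsafe(32)
    sessions[new_sid] = data
    return new_sid

# The attacker picks a session ID and mails out a link containing it.
attacker_sid = "abc123"
get_session(attacker_sid)

# The victim follows the link, shops, and logs in under that same ID.
get_session(attacker_sid)["cart"].append("book")
login_vulnerable(attacker_sid, "victim")

# The ID the attacker already knows now names a logged-in session.
print(sessions[attacker_sid]["user"])
```

With `login_safe`, the old ID stops existing at login, so the link the attacker mailed out no longer points at anything.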

Perhaps I'm completely making this up. Part of being RESTful is being stateless. However, people like to have shopping carts, so some sort of state is necessary as is some way of tying a user (who isn't logged in) to that state. Perhaps there is indeed a way to maintain a shopping cart without requiring the user to log in and without using a cookie. However, naively shoving a session ID into the URL ain't it.

19 comments:

kioopi said...

If I understood the book right, any solution involving a session would keep state in the service.

I think a way of avoiding that would be to either keep the state completely at the clients or create a resource for the shopping cart.

The first way could imho involve keeping the entire shopping cart coded in a cookie. It could also mean coding the cart's content into the URL and dragging it along.

Creating a resource for the cart would move the state from the service into the backend. Obviously this would make some sort of authentication necessary to keep people from hijacking or filling other people’s carts.

Shannon -jj Behrens said...

Thanks for the intelligent comment.

Clearly, keeping the entire shopping cart in either the cookie or the URL is a no-go because both are extremely limited. It'd suck if you ran out of room in your cookie to add another thing to your cart ;)

The book does make a distinction between "application state" and "session state". It would probably argue that a shopping cart is "application state". However, that doesn't answer the question of how to tie this viewer to his shopping cart.

The "some sort of authentication" you mention is definitely the question. People these days expect to be able to go to an ecommerce site and start adding things to their cart without even creating an account. Often, they create an account as part of the checkout process.

kioopi said...

Yes, cookies larger than 4k are not guaranteed to be supported by all browsers, and really long URLs seem to bear their own difficulties, too (http://www.boutell.com/newfaq/misc/urllength.html).

So I guess, it’s all a give-and-take. Just implement it with sessions and session-ids and get the pain of having to scale the session-store up in case of high demand? Require Gears? Wait for HTML5 client-side-storage?

Maybe some compromise like creating a backend-resource with a URL and limiting access by client IP could work. The carts could still get compromised, but it would be less likely. The trade-off would be that the customer has to double-check their order during checkout. Also the IP could be changed by an ISP and the customer would lose their cart. That’s ugly.

Not requiring a login in order to fill the cart is just a tricky thing, RESTful or not. No matter what, you’ll have some drawbacks.

But then again, real supermarkets have to take care of left-behind carts as well.

I wonder how Amazon does it. Is it just regular sessions on a massive backend?

LoveEncounterFlow said...

i’ve said it before, and i’ll say it again, 10³ times if necessary: if RESTful means you go from GET and POST to GET, POST, PUT, DELETE---then it’s a bad idea.

RESTful thinking is a great tool to clear up in your mind essential principles of the way the web works. GET, POST, PUT, DELETE are not so hot. GET and POST are great, they are what is actually implemented in browsers, and they’re sufficient. to put CRUD (Create, read, update and delete) into the HTTP method is bad.

any semantics that go beyond the ‘underlying request modalities’ of HTTP communication do NOT belong into the HTTP method, they belong into the URL. One very obvious piece of evidence to corroborate this is the fact that in a RESTful app, you cannot specify both your object and your intent in one URL. this is like sending an envelope with a card to the stock keeper, write DELETE along with his address on the outside and "product/42" on the card. Nonsense. You’re supposed to write him a card reading "my-storekeeper.biz / delete / product / 42" and scribble a red circled POST above that to alert him and the mailman of the terms of delivery.

that there are even more HTTP ‘verbs’---HEAD, TRACE, OPTIONS, CONNECT---that are not really talked about within the RESThype makes me suspicious. i want to argue that just as there are HTTP methods that are seldom used (at least not by clients, only under the hood), so there are HTTP methods that got designed once, but are as superfluous as e.g. the POP and the FTP protocols (they got invented when HTTP was less prevalent than today---one protocol for a single purpose. they can’t do anything that HTTP can’t do).

DELETE and PUT are just as obsolete as the <font/> tag---they looked like a good idea at the time, and when they got replaced by something (much) better, no-one looked back.

add to this that now apparently, imagine my surprise, some guys come up with the great idea to take the session out of the cookie (why in the world doesn’t it belong there? it is the only thing a cookie is really really good at!) and put it back into the URL of all places. back to square one!

an SID clearly does not belong into the URL except in justified exceptional cases.

SIDs typically ruin the readability of URLs, which is bad. displaying a readable URL at each point of web interaction is a usability feature. i would go so far as to say that in many cases the page title and the URL should read the same.

shop owners must be aware that putting an SID into the URL at any point prior to checkout is damaging your business---simply because all the fancy stuff that a user strolls by when shopping is no more bookmarkable and no more communicable, which means less opportunity to come back to that offer or tell a friend about those shoes. shop URLs with session IDs are typically only valid for 15 minutes or an hour or so, since session IDs must expire for security reasons.

it may be argued that the presence of an SID in the URL during checkout does indeed help to communicate the one-off, ephemeral status of such pages---you are not expected to come back to step four of your purchase process the day after.

but, as you point out, that comes at a price: since in the interval before the session ID’s expiry (which in today’s system may be extended indefinitely by the attacker, who only needs to automatically refresh a browser window every so often), the URL-based SID enables session sharing between two or more people with only a single person having done an active login. the other participants got, as it were, ‘passively’ logged into the session.

so people, don’t buy into the hype that is CRUD-RESTful---and don’t go back to putting session IDs into the URL. CRUD-RESTful means you go back, literally, to the cruddy stone ages of the internet, and URLs with SIDs mean you go back to the iron ages. you don’t want that.

Mark said...

"Clearly, keeping the entire shopping cart in either the cookie or the URL is a no-go because both are extremely limited. It'd suck if you ran out of room in your cookie to add another thing to your cart"

You can get around the size limit by putting a hash of the cart state in the URL and caching a hash->state table on the server side.
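
A rough sketch of what such a hash->state table might look like in Python. The `cart_table` dict, the function names, and the choice of a truncated SHA-256 digest are all assumptions made for illustration; a real deployment would need persistent storage, an expiry policy, and a decision about how to handle collisions in the shortened key.

```python
import hashlib
import json

# Hypothetical server-side cache: short hash -> cart state.
cart_table = {}

def cart_key(cart):
    """Store the cart and return a short key suitable for a URL."""
    # Sort the items so equal carts always serialize (and hash) the same way.
    canonical = json.dumps(sorted(cart.items()))
    key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    cart_table[key] = dict(cart)
    return key

def add_item(key, sku, qty=1):
    """Return the key of a new cart state; the old state is left untouched."""
    cart = dict(cart_table.get(key, {}))
    cart[sku] = cart.get(sku, 0) + qty
    return cart_key(cart)

empty = cart_key({})
k1 = add_item(empty, "sku-42")
k2 = add_item(k1, "sku-7", 2)
print(k2, cart_table[k2])
```

Because the key is derived from the cart contents, two shoppers with identical carts share one table entry, and the URL never identifies a person.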

Kelly Yancey said...

Hey JJ, just to further damn putting session Ids in URLs, I'll add that my favorite oft-overlooked session Id leakage is the off-site URL.

If you have so much as a single resource loaded from a third-party site, then that site can collect session Ids via the Referer HTTP header. Similarly, off-site links expose session Ids to the linked-to site via the Referer header also.

I've read that Ruby has a mechanism to try to prevent session Id leakage from off-site links by interjecting a trampoline redirect that drops the session Id from the URL before sending the user off to the target site. But I rarely see PHP- or Python-based sites using similar tricks.
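
As a rough illustration of the trampoline idea in Python (this is not the Ruby mechanism, whose details are not given here), the sketch below strips a session ID parameter from a URL and builds an on-site redirect link for an off-site target. The parameter name `sid` and the `/out` path are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_session_id(url, param="sid"):
    """Return url with the session ID query parameter removed."""
    parts = urlsplit(url)
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    return urlunsplit(parts._replace(query=urlencode(query)))

def trampoline_link(target, base="/out"):
    """Build an on-site link whose own URL carries the target but no session ID.

    The handler behind `base` (hypothetical, not shown) would redirect to
    `target`, so the Referer seen off-site is the trampoline URL.
    """
    return base + "?" + urlencode({"to": target})

print(strip_session_id("http://shop.example/cart?sid=abc123&page=2"))
print(trampoline_link("http://other.example/page?a=1"))
```

A real trampoline handler would also want to restrict which targets it redirects to, since an unrestricted redirect endpoint has problems of its own.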

One technique I have seen on the server-side, though, is to record the client's IP address along with their session Id and confirm both on each request. In theory, this reduces the likelihood of a third party being able to kidnap a user's session. I haven't seen it personally, but if you told me that there were sites that required all requests to have the same User-Agent header for a single session too, I wouldn't be surprised.
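
A minimal sketch of that check in Python, assuming a hypothetical in-memory `sessions` dict and made-up function names. It only shows the comparison itself; note that two clients presenting the same address and the same User-Agent are indistinguishable to it.

```python
# Hypothetical in-memory store: session ID -> details recorded at creation.
sessions = {}

def create_session(sid, ip, user_agent):
    """Remember which address and User-Agent the session was created from."""
    sessions[sid] = {"ip": ip, "user_agent": user_agent, "cart": []}

def is_request_valid(sid, ip, user_agent):
    """Accept a request only if it matches the recorded IP and User-Agent."""
    session = sessions.get(sid)
    return (session is not None
            and session["ip"] == ip
            and session["user_agent"] == user_agent)

create_session("abc123", "203.0.113.5", "ExampleBrowser/1.0")

# Same session ID presented from a different address is rejected.
print(is_request_valid("abc123", "203.0.113.5", "ExampleBrowser/1.0"))
print(is_request_valid("abc123", "198.51.100.9", "ExampleBrowser/1.0"))
```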

Ludovic said...

I am sorry to fail to see the problem (I'm not saying it doesn't exist), but shouldn't you be verifying HTTP_REFERER, creating an HTTPS session, and checking the remote IP and User-Agent anyway, et al.?

I mean it feels like you're just relying on the Session Id to do a commercial transaction ?

I think I missed something. Could you explain your point further ?

LoveEncounterFlow said...

@Ludovic: there can very probably be several booksful to be said about how to and how not to do secure business transactions on the web, and SSL/HTTPS is one piece in that. however, your application should be secure from the ground up, using best practices in every detail. it’s just not enough to make a site that commits grave blunders with session IDs and then serve it over HTTPS and hope that alone will suffice.

Mark said...

Just to reiterate, given that the cart state can be encoded in the URL, independent of the session ID, JJ's original problem can avoid most of these session-related issues.

Shannon -jj Behrens said...

kioopi:
> Also the IP could be changed by an ISP and the customer would lose their cart. That’s ugly.

Agreed.

> I wonder how Amazon does it. Is it just regular sessions on a massive backend?

Amazon uses a session cookie.

Shannon -jj Behrens said...

LoveEncounterFlow:
> i’ve said it before, and i’ll say it again, 10³ times if necessary: if RESTful means you go from GET and POST to GET, POST, PUT, DELETE---then it’s a bad idea.

Dude, you're preaching to the choir ;) As long as IE 6 continues to only support GET and POST, I won't be using more than those.

http://jjinux.blogspot.com/2008/09/web-rest-verbs.html

> add to this that now apparently, imagine my surprise, some guys come up with the great idea to take the session out of the cookie

Dude, I'm with you on that one. I still think sessions are really useful. Sure, figuring out how and where to keep state is a pain, but I maintain that session cookies are sometimes necessary.

> SIDs typically ruin the readability of URLs, which is bad.

Agreed. As I said, putting SIDs in URLs is actually a security hazard.

Shannon -jj Behrens said...

Mark:
> You can get around the size limit by putting a hash of the cart state in the URL and caching a hash->state table on the server side.

I think a session cookie is cleaner ;)

> Just to reiterate, given that the cart state can be encoded in the URL, independent of the session ID, JJ's original problem can avoid most of these session-related issues.

If you can fit all of your state into the URL in a signed, encrypted, opaque fashion, more power to you ;)
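
For what it's worth, here is a minimal sketch of the "signed" part in Python, using an HMAC over the serialized cart. It assumes a hypothetical server-side `SECRET_KEY`, and the function names are invented. It does not encrypt anything, so the cart contents stay readable to anyone who decodes the Base64. It also only proves that the server issued the token; it says nothing about who is presenting it.

```python
import base64
import hashlib
import hmac
import json

# Hypothetical secret known only to the server.
SECRET_KEY = b"replace-with-a-real-secret"

def encode_cart(cart):
    """Serialize the cart and append an HMAC so tampering is detectable."""
    raw = json.dumps(cart, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    sig = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + sig

def decode_cart(token):
    """Verify the signature and return the cart, or raise ValueError."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET_KEY, payload.encode("ascii"),
                        hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("bad signature")
    return json.loads(base64.urlsafe_b64decode(payload))

token = encode_cart({"sku-42": 1, "sku-7": 2})
print(token)
print(decode_cart(token))
```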

Shannon -jj Behrens said...

Kelly:
> If you have so much as a single resource loaded from a third-party site, then that site can collect session Ids via the Referer HTTP header.

Oh, gees, you're right. Good point.

> One technique I have seen on the server-side, though, is to record the client's IP address along with their session Id

I thought about that too. However, the drawback is that I wouldn't really
trust other members of my company not to have malware on their computer, and
since we'd be behind a NAT, we'd have the same IP. If IT manages our
computers, we'd also have the same User-Agent.

Shannon -jj Behrens said...

Ludovic:
> I am sorry to fail to see the problem (I'm not saying it doesn't exist), but shouldn't you be verifying HTTP_REFERER, creating an HTTPS session, and checking the remote IP and User-Agent anyway, et al.?

You can't insist on a valid referrer. What if the user adds something to his
cart, goes to Google, and then comes back? HTTPS doesn't have anything to do
with it. I can email a bad HTTPS link just as easily as a bad HTTP link.
Checking the remote IP doesn't help all that much because of corporate NATs,
as I mentioned above. I also covered the User-Agent case above as well.

Simply put, I just don't think putting SIDs in URLs is a good idea.

> I mean it feels like you're just relying on the Session Id to do a commercial transaction ?
> I think I missed something. Could you explain your point further ?

If a user comes to visit my site and doesn't log in, I need to have some
notion of who he is, especially if he starts adding things to his shopping
cart. That's what sessions are for. Putting SIDs in URLs makes them really
easy to hijack via well-known session hijacking techniques:
http://en.wikipedia.org/wiki/Session_hijacking.

Shannon -jj Behrens said...

Wow, I'm amazed. Thanks for all the great comments!

Paul Bonser said...

One suggested solution is here:

http://www.peej.co.uk/articles/no-sessions.html

Mark said...

>If you can fit all of your state into the URL in a signed, encrypted, opaque fashion, more power to you ;)

I don't think that the cart state needs be secret or protected from external manipulation. Perhaps a longer explanation will make this more clear:

The problem is:

Before the user has logged in (i.e., before you have an authenticated
session to work with) the user will generate some state that will need
to be referenced once the authenticated session has been generated (e.g.,
the set of items that the user would like to purchase). Find a method
to store this state as part of the URL in such a way that it can not be
exploited by a session fixation attack.

I believe that a reasonable solution is:

1) Define a mapping from cart state to a short string. For a small,
fixed inventory, this can be done with a two-way function;
e.g. represent the inclusion of each inventory item as a bit, pack the
bits into a reasonable subset of ascii, and append a list of
quantities for the non-zero bits followed by a checksum. For a larger
inventory that is subject to change, this can be done with the hashing
trick that I mentioned previously.

2) During shopping, every action on the cart is a state transition that
will produce a new string. I.e., rather than maintaining a constant cart
ID in the URL with updates to the cart state on the server side, the actual
state of the cart is reflected in the URL, either explicitly in the case
of the two-way function or in a shorthand that the server knows how to
decode in the case of the hash.

3) After an authenticated session has been created for the user, the
unauthenticated cart state is referenced during checkout. In
other words, the authentication step looks like this:

Client: Hello, you don't know me, but I've been shopping in your store and
I've generated this cart state that I would like to buy (here it is as part
of my GET request).

Server: Hmmm, this looks like a valid cart state. Okay, let me create
a secure session for you so that I can ask you for your payment and
shipping information.

Client: Cool, now that we have a secure channel, I'm sending you a new GET
request. It includes my session ID and, as an independent field, the cart
state that I would like to buy.

Server: Great. Here's a checkout form including my interpretation of your
cart state. If we agree on what you're ordering, send me a POST request
and I'll go ahead and process your order.
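
To make steps 1 and 2 concrete, here is one possible reading of the two-way function, sketched in Python. The inventory, the `mask-quantities-checksum` layout, and the use of CRC-32 are assumptions made for the example, not part of the proposal above. The checksum only catches accidental corruption; it is not a signature.

```python
import zlib

# Hypothetical small, fixed inventory; an item's position is its bit number.
INVENTORY = ["apple", "book", "cup", "desk", "envelope"]

def _checksum(body):
    """CRC-32 of the body as 8 hex digits (catches typos, not forgery)."""
    return format(zlib.crc32(body.encode("ascii")), "08x")

def encode_cart(cart):
    """Encode {item: quantity} as '<hex bitmask>-<q1.q2...>-<checksum>'."""
    mask = 0
    quantities = []
    for bit, item in enumerate(INVENTORY):
        qty = cart.get(item, 0)
        if qty > 0:
            mask |= 1 << bit
            quantities.append(str(qty))
    body = format(mask, "x") + "-" + ".".join(quantities)
    return body + "-" + _checksum(body)

def decode_cart(text):
    """Invert encode_cart, raising ValueError on a malformed string."""
    body, _, checksum = text.rpartition("-")
    if _checksum(body) != checksum:
        raise ValueError("checksum mismatch")
    mask_hex, _, qty_part = body.partition("-")
    mask = int(mask_hex, 16)
    quantities = [int(q) for q in qty_part.split(".")] if qty_part else []
    items = [item for bit, item in enumerate(INVENTORY) if mask & (1 << bit)]
    if len(items) != len(quantities):
        raise ValueError("quantity count does not match item bits")
    return dict(zip(items, quantities))

def add_item(text, item, qty=1):
    """State transition: old cart string in, new cart string out."""
    if item not in INVENTORY:
        raise KeyError(item)
    cart = decode_cart(text)
    cart[item] = cart.get(item, 0) + qty
    return encode_cart(cart)

state = encode_cart({})
state = add_item(state, "book", 2)
state = add_item(state, "desk")
print(state, decode_cart(state))
```

Each call to `add_item` returns a new string rather than updating anything on the server, which is the state-transition behaviour described in step 2.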

---

The two important details of this transaction are:

1) There is no need for the cart state to be signed or opaque. If the user
intentionally manipulates the cart state independent of the store, we don't
care -- we just check that the cart state is a valid order that we can fill.
If the user unintentionally manipulates the cart state (e.g. if a
malicious third party sends the user a bogus URL), this will be caught during
the checkout step (which happens over a secure channel, so that the third
party can't play man-in-the-middle and misrepresent what's being checked out).

(This does not allow the user to shop anonymously, but I do not think that
anonymity is a common concern for shops that can be accessed via HTTP).

2) The cart state is never entangled with the user/session state. Therefore,
the concerns for securing the customer, shipping, and payment details are
independent of the concerns for communicating the cart state. The problem
of maintaining a secure session is simplified by limiting the session to only
the information that needs to be communicated securely.
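
The "valid order that we can fill" check from point 1 might look something like the Python sketch below. The `STOCK` table and the function name are hypothetical. It only answers whether the order can be filled, not whether the customer meant to place it.

```python
# Hypothetical stock levels: item -> units available.
STOCK = {"apple": 10, "book": 3, "cup": 0}

def validation_errors(cart):
    """Return a list of reasons the cart cannot be filled (empty if it can)."""
    errors = []
    for item, qty in cart.items():
        if item not in STOCK:
            errors.append(f"unknown item: {item}")
        elif not isinstance(qty, int) or qty <= 0:
            errors.append(f"invalid quantity for {item}: {qty!r}")
        elif qty > STOCK[item]:
            errors.append(f"only {STOCK[item]} of {item} available")
    return errors

print(validation_errors({"apple": 2, "book": 1}))
print(validation_errors({"book": 15, "zebra": 1}))
```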

Shannon -jj Behrens said...

In response to http://www.peej.co.uk/articles/no-sessions.html:
> Place the catelogue part of the site in a frameset

Yuck.

> Expect the client to also support cookies and stuff the cart data in there.

Unfortunately, cookies are fairly limited.

> Wait until the WHATWG Web Application 1.0 spec is finished and implemented

Uh, no.

> Now if we were to model that as our online shop we'd see that the client no longer has a basket as part of their state, the basket is part of the shop.

*sigh*

> Good, okay, I'd like to buy 1 of http://example.org/shop/product/X, please place it in my basket, my username is "JohnDoe" and my password is "secretPassword".

Remember that my requirement was for the shopping cart to exist *without* needing to log in. This is important for usability.

> Avoiding sessions is a bit of a purest stance, but it does lead to a more scalable and usable Web app.

There are *smart* ways of architecting sessions. Forcing the user to
even have an account before he can add items to his basket is *not*
more usable.

Shannon -jj Behrens said...

> I don't think that the cart state needs be secret or protected from external manipulation. Perhaps a longer explanation will make this more clear:

Mark, thanks for your comment. I do believe I understand it completely. However, I am not particularly moved by it. Sorry.

First of all, you begged the question:

> Find a method to store this state as part of the URL in such a way that it can not be exploited by a session fixation attack.

Looking at this:

> For a small, fixed inventory, this can be done with a two-way function;

Obviously, it won't work for Amazon.

> For a larger inventory that is subject to change, this can be done with the hashing trick that I mentioned previously.

The hashing trick suffers from all the same problems as sessions, plus a couple. You have to worry about hash collisions. You have to worry about expiring things lest your shopping cart collections grow infinitely large, and you still have to have a place to store this stuff. Why not just use a session in that case?

> During shopping, every action on the cart is a state transition that will produce a new string.

This is the one bit I do like. It reminds me of Haskell or Erlang.

> Okay, let me create a secure session

If you're going to create a secure session at the end, why not do it at the beginning?

> as an independent field, the cart state that I would like to buy.

What happens if I send someone a URL containing a cart ID and they never think to look to make sure the cart doesn't contain 15 copies of my book? ;)

> if a malicious third party sends the user a bogus URL), this will be caught during the checkout step

I think some non-zero percentage of users will not notice that their cart has been hacked with copies of my book ;)

Anyway, thanks again for the comment.