Tuesday, October 04, 2016

Intro to REST

Gregorio: Hi. I'm Joe Gregorio,
and I work in Developer Relations at Google.
This talk is on REST and, in the talk,
I presume you're familiar with the Atom Publishing Protocol.
If you're not, you can watch my other video
"An Introduction to the Atom Publishing Protocol,"
and then come back and watch this one.
So let's begin.
You may have heard the term REST,
and a lot of protocols these days
are advertising themselves as REST.
REST comes from Roy Fielding's thesis
and stands for Representational State Transfer.
It's an architectural style.
Now, an architectural style is an abstraction
as opposed to a concrete thing.
For example, this Shaker house
is different from the Shaker architectural style.
The architectural style of Shaker
defines the attributes or characteristics
you would see in a house built in that style.
In the same way, the REST architectural style
is a set of architectural constraints
you would see in a protocol built in that style.
HTTP is one such protocol.
And, for the remainder of this talk,
we're just going to talk about HTTP.
And I'll refer back
to the architectural constraints of REST
as we work through that example.
Now, it's simply not possible to cover every aspect of HTTP,
so at the end of this presentation
there will be a further reading list,
if you'd like to learn more.
So why should you care about REST?
Well, it's the architecture of the Web as it works today.
And if you're going to be building applications
on the Web, shouldn't you be working
with the architecture instead of against it?
And, hopefully, as we go through this video,
you'll see many opportunities
for increasing the performance
and scalability of your application
and for solving some traditionally tricky problems
by working with HTTP
and taking full advantage of its capabilities.
Let's get some of the basics down:
some nomenclature and the basic operation of HTTP.
At its simplest,
HTTP is a request response protocol.
Your browser makes a request to the server,
the Web server gives you a response.
The beauty of the Web is that it appears very simple,
as if your browser is talking directly to the server.
So, let's look in detail
at a specific request and response.
Here is a GET request
to the URL http://example.org/news
and here's what the response looks like.
It's a 200 response
and what you're seeing here are the headers
and a little bit of the response body.
The request is to a resource identified by a URI,
in this case, http://example.org/news.
Resources, and their addressability, are very important.
The URI is broken down into two pieces.
The path goes into the request line,
and you can see the host shows up in the host header.
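To make that concrete, here's a small sketch, not from the talk, of issuing that same GET from Python's standard library. The host and path split exactly as described above; the URI comes from the slide, everything else is just illustration.

    # A minimal sketch of the GET request from the slide, using Python's
    # standard library. The URI http://example.org/news is split into the
    # host ("example.org") and the path ("/news"), just as described above.
    import http.client

    conn = http.client.HTTPConnection("example.org")
    conn.request("GET", "/news")           # request line: GET /news HTTP/1.1
                                           # Host: example.org is added automatically
    response = conn.getresponse()

    print(response.status, response.reason)    # e.g. 200 OK
    for name, value in response.getheaders():  # the response headers
        print(name + ": " + value)
    body = response.read()                     # the representation (HTML, etc.)
    conn.close()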
There is a method
and that's the action to perform on the resource.
There are actually several different methods
that can be used,
GET, PUT, DELETE, HEAD, and POST among others,
and each of those methods
has particular characteristics about them.
For example, GET is safe, idempotent, and cacheable.
Cacheable means the response can be cached
by an intermediary along the way,
idempotent means the request can be repeated
multiple times with the same effect as making it once,
and safe means there are no side effects
from performing that action.
PUT is also idempotent, but not safe and not cacheable.
The same goes for DELETE: it is idempotent,
but neither safe nor cacheable.
HEAD is safe and idempotent.
POST has none of those characteristics.
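To keep those characteristics straight, here's a small summary written as a Python table; the table itself is just an illustration, not something from the talk. Per the HTTP spec, HEAD responses are also cacheable, even though the talk only calls out safe and idempotent.

    # A summary of the method characteristics described above,
    # written as a Python dictionary purely for illustration.
    METHOD_PROPERTIES = {
        "GET":    dict(safe=True,  idempotent=True,  cacheable=True),
        "HEAD":   dict(safe=True,  idempotent=True,  cacheable=True),
        "PUT":    dict(safe=False, idempotent=True,  cacheable=False),
        "DELETE": dict(safe=False, idempotent=True,  cacheable=False),
        "POST":   dict(safe=False, idempotent=False, cacheable=False),
    }

    def is_cacheable(method):
        """An intermediary could use a table like this to decide whether a
        response to this method is even a candidate for caching."""
        return METHOD_PROPERTIES.get(method, dict(cacheable=False))["cacheable"]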
Also returned in that response
was the representation of that resource,
what lives at that URI.
The representation is the body
and, in this case, it was an HTML document.
HTML is a form of hypertext,
which means it has links to other resources.
Here is a traditional link that you would click on
to go to another page,
but there's more than one kind of link.
Here is a link to a CSS document
that the browser will fetch and include to style the page.
There's also other kinds of links.
Here's one to a JavaScript document
that will get pulled in.
This is a particularly important kind of linked document.
This is called Code on Demand,
the ability to load code into the browser
and execute it on the client.
The response headers show control data,
such as this header which controls how long
the response can be cached.
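Here's a small sketch of reading that control data from Python; the URL is just a stand-in for the resource on the slide.

    # Fetch the resource and look at its control data. A Cache-Control
    # header of, say, "max-age=3600" means any cache along the way may
    # reuse the response for up to an hour.
    import urllib.request

    with urllib.request.urlopen("http://example.org/news") as response:
        print("Cache-Control:", response.headers.get("Cache-Control"))
        print("Content-Type:", response.headers.get("Content-Type"))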
So now that we've looked
at simple HTTP request and response,
let's go back and look at some of the characteristics
that a RESTful protocol is supposed to have.
Application state and functionality
are divided into resources.
Those resources are uniquely addressable
using a universal syntax for use in hypermedia links.
All resources share a uniform interface
for transferring the state
between the client and the server
consisting of a constrained set of well-defined operations,
a constrained set of content types
optionally supporting Code on Demand,
and a protocol which is client-server,
stateless, layered, and cacheable.
Now that we've already talked about
many of these aspects with HTTP,
we can see that we already have resources
that are identified by URIs,
and those resources have a uniform interface
understanding a limited set of methods
such as GET, PUT, POST, HEAD, and DELETE,
and that the representations are self-identifying,
drawn from a constrained set of content types
that might not only be hypertext,
but could also include Code on Demand
such as the example we saw with JavaScript.
And we've even seen that HTTP is a client-server protocol.
To discuss the remainder of the characteristics
of the protocol,
we need to look at the underlying structure
of the Web.
We originally started out with a simplified example
of how the Web appears to a client.
Let's switch to using the right names
for each of those pieces.
They're the user agent and the origin server.
The reality is that the connections
between these pieces could be a lot more complicated.
There can be many intermediaries between you and the server
you're connecting to.
By intermediaries, we mean HTTP intermediaries,
which doesn't include devices at lower levels
such as routers, modems, and access points.
Those intermediaries are
the layered part of the protocol,
and that layering allows intermediaries to be added
at various points in the request-response path
without changing the interfaces between components,
where they can act on messages passing through,
such as translating them or improving performance with caching.
Intermediaries include proxies and gateways.
Proxies are chosen by the client,
while gateways are chosen by the origin server.
Despite the slide showing only one proxy and one gateway,
realize there may be several proxies and gateways
between your user agent and origin server,
or there may actually be none.
Finally, every actor in the chain,
from the user agent through the proxies
and the gateways to the origin server,
may have a cache associated with it.
If an intermediary does caching
and a response indicates that it can be cached,
in this case for an hour,
then a new request for that resource
arriving within that hour
will be answered with the cached response.
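As a toy illustration of that behavior, here's a minimal in-memory cache keyed by URI that honors max-age. It's only a sketch; a real HTTP cache does much more (validation, Vary handling, and so on).

    # Store a response keyed by its URI and serve it from the cache
    # while it is still fresh. Illustration only, not a real HTTP cache.
    import time
    import urllib.request

    _cache = {}  # uri -> (expires_at, body)

    def cached_get(uri):
        now = time.time()
        if uri in _cache and _cache[uri][0] > now:
            return _cache[uri][1]              # fresh: no network traffic at all
        with urllib.request.urlopen(uri) as resp:
            body = resp.read()
            cc = resp.headers.get("Cache-Control", "")
            for directive in cc.split(","):
                directive = directive.strip()
                if directive.startswith("max-age="):
                    expires = now + int(directive.split("=", 1)[1])
                    _cache[uri] = (expires, body)
        return body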
These caches finish out the major characteristics
of our REST protocol.
Now, we said this architecture had benefits.
What are some of those?
Let's first look at some of the performance benefits,
which include efficiency, scalability,
and user-perceived performance.
For efficiency,
all of those caches help along the way.
Your request may not have to reach all the way back
to the origin server
or, in the case of a local user agent cache,
you may never even hit the network at all.
Control data allows the signaling of compression,
so a response can be gzipped before being sent
to user agents that can handle it.
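Here's a sketch of that negotiation from the client side, assuming a placeholder URL: advertise gzip with Accept-Encoding, and decompress the body if the server answers with Content-Encoding: gzip.

    # Ask for a compressed response and handle it either way.
    import gzip
    import urllib.request

    request = urllib.request.Request(
        "http://example.org/news",                # placeholder URL
        headers={"Accept-Encoding": "gzip"},      # signal that we can handle gzip
    )
    with urllib.request.urlopen(request) as response:
        body = response.read()
        if response.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    print(len(body), "bytes after decompression")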
Scalability comes from many areas.
The use of gateways allows you to distribute traffic
among a large set of origin servers
based on method, URI, content type,
or any of the other headers coming in from the request.
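As an illustration of that point, a gateway can make its routing decision from the request line and headers alone, without ever parsing the body. The backend pool names here are made up.

    # A sketch of gateway routing based only on method, path, and headers.
    def choose_backend(method, path, headers):
        if method in ("GET", "HEAD") and path.startswith("/images/"):
            return "static-cache-pool"    # cacheable reads of static content
        if method in ("PUT", "POST", "DELETE"):
            return "write-pool"           # writes go to the authoritative servers
        if headers.get("Accept", "").startswith("application/atom+xml"):
            return "feed-pool"            # route by the requested content type
        return "default-pool"

    print(choose_backend("GET", "/images/logo.png", {}))   # static-cache-pool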
Caching helps scalability also
as it reduces the actual number of requests
that make it all the way back to the origin server.
And statelessness allows a request to be routed
through different gateways and proxies,
thus avoiding introducing bottlenecks
and allowing more intermediaries to be added as needed.
Finally, user-perceived performance is increased
by having a constrained set of known media types
that browsers can handle much faster,
for example, by partially rendering HTML documents
as they download.
Also, Code on Demand allows computations
to be moved closer to the client
or closer to the server,
depending on where the work can be done fastest.
For example, having JavaScript do form validation
before a request is even made to the origin server
is obviously faster
than round-tripping the form values to the server
and having the server return any validation errors.
Similarly, caching helps here, as requests may not need
to go all the way back to the origin server.
Also, since GET is idempotent and safe,
a user agent could pre-fetch results before they're needed,
thus increasing user perceived performance.
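Because GET is safe and idempotent, a sketch of that prefetching might look like this in Python; the URLs are placeholders, not from the talk.

    # Speculatively fetch linked resources ahead of time. GET being safe
    # means no side effects; idempotent means repeating it is harmless.
    from concurrent.futures import ThreadPoolExecutor
    import urllib.request

    def prefetch(uri):
        with urllib.request.urlopen(uri) as resp:
            return uri, resp.read()

    links_on_page = ["http://example.org/news", "http://example.org/about"]
    with ThreadPoolExecutor(max_workers=4) as pool:
        prefetched = dict(pool.map(prefetch, links_on_page))
    # If the user follows one of those links, the response is already local.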
Lots of other benefits we won't cover,
but these are outlined in Roy's thesis.
But all these benefits aren't free.
You actually have to structure your application
or service to take advantage of them.
If you do, then you will get the benefits.
And if you don't, you won't get them.
To see how structuring helps, let's look at two protocols:
XML-RPC and the Atom Publishing Protocol.
So this is what an XML-RPC request looks like,
and here's an example response.
All of the requests in XML-RPC are POSTs.
So what do the intermediaries see of this request and response?
Is it safe? No.
Is it idempotent? No.
Is it cacheable? No.
Even if a particular call happened to be,
the intermediaries would never know.
All the requests go to the same URI,
which means that if you're going to distribute many such calls
among a group of origin servers,
you would have to look inside the body
for the method name.
This gives the least amount of information to the Web,
and thus it doesn't get any help from intermediaries
and doesn't scale with off-the-shelf parts.
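You can see the problem by building an XML-RPC request body with Python's standard library; the method name here is hypothetical. Notice that it only ever appears inside the body of a POST, where no cache or gateway will look.

    # Every XML-RPC call goes out as a POST to one URI; the "verb" is
    # buried in the XML body, invisible to intermediaries.
    import xmlrpc.client

    body = xmlrpc.client.dumps((42,), methodname="news.delete")
    print(body)   # <methodCall><methodName>news.delete</methodName>...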
So let's take a look at the Atom Publishing Protocol.
So for authoring to begin in the Atom Publishing Protocol,
a client needs to discover the capabilities and locations
of the available collections.
Service documents are designed
to support this discovery process.
To retrieve a service document, we send a GET to its URI.
GET is safe, idempotent, cacheable, and the response can be gzipped.
The response type is self-identifying.
As you can see, there's a Content-Type header
of application/atomsvc+xml
that identifies specifically what the content is,
and the response itself is hypertext.
It contains URIs for each of the collections.
What's highlighted in this slide
is the relative URI for the collection.
Once we have a collection URI,
we can post an entry to create a new member,
and then GET, PUT, or DELETE the members at their own URIs.
So here's an example of a GET to a collection document.
Again, this is safe, idempotent, cacheable, and the response can be gzipped.
The response is also self-identifying here
as you have another content type,
application/atom+xml.
And again, the response is hypertext.
Lastly, the edit URI identifies
where the entry can actually be modified.
You can send a GET to that URI to retrieve the entry,
a PUT to update the resource,
or a DELETE to remove it
from the collection.
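Here's a hedged sketch of those member operations using only the standard library. The edit URI and the entry contents are placeholders, since a real client would discover them from the service and collection documents, and the entry is trimmed down from what a full Atom entry contains.

    # Update and then remove a collection member at its edit URI.
    import urllib.request

    entry = b"""<?xml version="1.0"?>
    <entry xmlns="http://www.w3.org/2005/Atom">
      <title>Updated title</title>
      <content>Updated content</content>
    </entry>"""

    update = urllib.request.Request(
        "http://example.org/collection/member1",   # hypothetical edit URI
        data=entry,
        method="PUT",                              # idempotent update of the member
        headers={"Content-Type": "application/atom+xml"},
    )
    # with urllib.request.urlopen(update) as response:
    #     print(response.status)

    delete = urllib.request.Request(
        "http://example.org/collection/member1", method="DELETE")
    # urllib.request.urlopen(delete) would remove the member from the collection.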
So as you can see, the Atom Publishing Protocol
is designed with RESTful characteristics in mind
and gets many advantages
from intermediaries and the network itself
as those messages transfer back and forth.
So, let's look at some of the other idioms
that you can use in building your RESTful protocol
to get some of the advantages.
For example, long-lived images.
If you have large images
that need to be transferred back and forth
as part of your Web page, what you should do is
set the cache lifetime for those images to be very long.
If you need to update those images,
upload a new image to a new URI
and change the HTML to point to that new URI.
Here's an example where I have big-image.png.
And, if we retrieve that image,
you'll see that the cache control header
has been set to a very long time.
In this case, 30 days.
If we made a mistake, or we'd like to update that image,
what we need to do is upload a new image, big-image-2,
set the cache control for that to be very long,
and then update the HTML.
The idea here is that you keep the HTML
with the short cache lifetime,
and thus you can update that easily.
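As one way to sketch that idiom, Python's built-in HTTP server can attach the long cache lifetime to images and a short one to everything else. The file extensions and lifetimes are just examples, and the path check is deliberately naive.

    # Serve files from the current directory, giving .png images a very
    # long cache lifetime and everything else (including HTML) a short one.
    from http.server import HTTPServer, SimpleHTTPRequestHandler

    class CachingHandler(SimpleHTTPRequestHandler):
        def end_headers(self):
            if self.path.endswith(".png"):
                self.send_header("Cache-Control", "max-age=2592000")  # 30 days
            else:
                self.send_header("Cache-Control", "max-age=300")      # short-lived HTML
            super().end_headers()

    # HTTPServer(("", 8000), CachingHandler).serve_forever()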
So there you go,
a high level view of REST and how it relates to HTTP.
Here's the list of further reading
that I had promised you.
"RFC 2616" actually outlines what HTTP is.
"RFC 3986" outlines the URI standard.
You can read Roy Fielding's thesis,
"Architectural Styles and the Design
of Network-based Software Architectures."
And there's also this "Caching Tutorial"
by Mark Nottingham which covers in detail
many of the things we just talked about.
Thanks and have fun.
