+++
template = "article.html"
title = "Using an API - Trakt.tv example"
date = 2014-10-09T00:06:00+02:00
description = "Learn how to interact with web APIs using Trakt.tv as a practical example, including authentication and data retrieval."
[taxonomies]
tags = ["api", "ruby"]
+++
A lot of websites are generating data which could be really useful outside a
web browser. Having the weather shown on your smartphone's lock-screen, the
delay until your next bus… How can we use this from an application?
This article will explain what's behind these hidden data flows, and how to
use them. For this purpose, I'll use [Trakt.tv](http://trakt.tv) as an example.
If you don't know it: Trakt allows you to manage your movie/TV series library,
keep track of watch progress, your ratings, comments… and see those of other
people.
Some code will show how to send such requests. It will be written in
[Ruby](http://ruby-lang.org/).
## Planet of the APIs
The concept behind these data flows is called an
[API](http://en.wikipedia.org/wiki/API) - Application Programming Interface.
While API is a pretty generic term, in this specific case it specifies how a
website exposes its data to a client (and possibly receives some from it). A
client can be any kind of application sending requests to the website (the
server). It is not to be confused with the user, who uses a client, which in
turn sends requests to a server.
{{ img(src="/images/articles/using-an-api-trakt-dot-tv-example/user-client-server.png", caption="User, client, and server relationship") }}
For websites, two data formats are mostly used:
[XML](http://en.wikipedia.org/wiki/XML) - eXtensible Markup Language - and
[JSON](http://en.wikipedia.org/wiki/JSON) - JavaScript Object Notation. Both
serve the same purpose: presenting data in a computer-readable way. For example,
this is how you could represent a movie in JSON:
*Sample representation of a movie*
```json
{
  "title": "Sharknado",
  "year": 2013,
  "runtime": 86,
  "tagline": "Enough said!",
  "overview": "A freak hurricane hits Los Angeles, causing man-eating sharks to be scooped up in tornadoes and flooding the city with shark-infested seawater. Surfer and bar-owner Fin sets out with his friends Baz and Nova to rescue his estranged wife April and teenage daughter Claudia",
  "genres": [
    "Science Fiction",
    "Horror"
  ]
}
```
Trakt's API uses JSON. As a matter of fact, this JSON bit comes from the API.
You can find the same details about the movie directly on
[Trakt](http://trakt.tv/movie/sharknado-2013). The web page just isn't directly
usable from an application.
Trakt's API documentation is available [here](http://trakt.tv/api-docs).
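To give an idea of what "computer-readable" means in practice, here is a minimal sketch parsing a shortened copy of the movie above with Ruby's standard `json` library. The `raw` string is pasted by hand for the example; in a real client, it would come from the API response.

```ruby
require 'json'

# Shortened copy of the sample movie above, as a raw JSON string
raw = <<~MOVIE
  {
    "title": "Sharknado",
    "year": 2013,
    "runtime": 86,
    "genres": ["Science Fiction", "Horror"]
  }
MOVIE

# JSON.parse turns JSON objects into Hashes and JSON arrays into Arrays
movie = JSON.parse raw

puts "#{movie['title']} (#{movie['year']}), #{movie['runtime']} minutes"
puts "Genres: #{movie['genres'].join(', ')}"
```

Once parsed, the movie is a plain Ruby `Hash`, so every attribute is one key lookup away.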
## REST in pieces
If an API had only one purpose, like giving the weather in a given city, it
would be pretty simple: just access a URL like
[http://myapi.net/weather?city=Toulouse](http://myapi.net/weather?city=Toulouse),
return the corresponding data, done. However, for a fully-featured API, dealing
with multiple elements like movies, series, series episodes, users… you need
something a little more complex than that. And that's where concepts like
[REST](http://en.wikipedia.org/wiki/Representational_state_transfer) -
Representational state transfer - come to help.
REST is a list of five (and an optional sixth) constraints for the API
architecture:
- **Client/server oriented**: data is stored on a server, and displayed on a client
- **Stateless**: every request from a client must contain all the information
needed to be handled by the server. If any kind of state management is needed,
the client is in charge
- **Cacheable**: responses from the server must specify how long they are valid
(we'll get back to that a bit later)
- **Uniform interface**: this one is a list of four sub-constraints to guide
  the API organization:
  - **Identification of resources**: the way an item is identified in a
    response should be consistent with the request, not with the database behind
    the API
  - **This identification must be sufficient**: a client must be able to modify
    or delete a resource from the identification given by the server
  - **Self-descriptive requests**: a request must contain everything needed to
    be processed by the server. Similar to the stateless constraint
  - **Simple client transitions**: if a client has to send a separate request to
    access more related information, the way to get it must be described in the
    initial response
- **Layered system**: the client doesn't need to know whether it is talking
directly to the final server or to an intermediary (a cache, a load balancer, a
proxy…). Each layer only knows about the one it talks to, so layers can be
added or replaced without changing the client
- **Code on demand (optional)**: the server can send a bit of script, which
will be run by the client, either to limit the server's load, or to
occasionally change the client behaviour.
A REST architecture has multiple advantages. The simple fact of being stateless
is probably the most important to the end user: the client doesn't have to
maintain a connection to the server between requests, which allows a huge
reduction in power consumption on mobile devices. It also simplifies the
load-balancing on the server side (any server can process any request without
the need of an existing context).
From a developer point of view, a layered architecture tends to make the system
easier to maintain, and easier to use.
## Doctor Who?
Authentication covers several needs when talking about an API. A common one is
to identify the user: who's trying to mark this movie as watched? Another is to
identify the application using the API. A user may want to see a list of
applications accessing their account, remove access to a specific one…
Identifying the application is pretty easy. For APIs where this identification
is in place, the developer must register the application. The website then
provides a unique identifier for this application. This is called an **API
key**. This key must be placed within every request from this application. The
API website can reject every request without a valid key, revoke a specific key…
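Since the key has to travel with every request, it is convenient to build all URLs in one place. Here is a minimal sketch of that idea. The `api_uri` helper and the `TRAKT_API_KEY` environment variable are names I made up for the example; the URL layout is the one Trakt uses in the requests shown later in this article.

```ruby
require 'uri'

# Hypothetical helper: builds the URL of an API method, always including the key
def api_uri(method, api_key, format: 'json', path: [], params: {})
  # The key comes first, followed by any extra path segments
  segments = [api_key, *path].map { |s| URI.encode_www_form_component s.to_s }
  url = "http://api.trakt.tv/#{method}.#{format}/#{segments.join('/')}"
  url += "?#{URI.encode_www_form params}" unless params.empty?
  URI.parse url
end

# Reading the key from the environment keeps it out of the source code.
# TRAKT_API_KEY is an assumed variable name; the fallback is a dummy key.
api_key = ENV.fetch 'TRAKT_API_KEY', '1234567890abcdefghijklmnopqrstuv'

puts api_uri('search/movies', api_key, params: { query: 'Finding Nemo', limit: 5 })
```

If the key is ever revoked, only the environment variable has to change, not the code.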
## Finding Nemo
We now know how a website exposes its data, how to parse them, and how they are
organized. Here's a concrete use case: we want to retrieve the list of the
directors of the Finding Nemo movie.
We first need to find the movie. We'll use the
[`search/movies`](http://trakt.tv/api-docs/search-movies) API method. It takes
four parameters, the last one being optional:
- **format**: how the response should be formatted. Only JSON is supported for
this method
- **apikey**: we need to identify ourselves as a Trakt.tv API user
- **query**: what are we looking for?
- **limit**: maximum number of results. Defaults to 30
And here's what the request should look like:
`http://api.trakt.tv/search/movies.format/apikey?query=query&limit=limit`
Let's call it in Ruby:
*search/movies*
```ruby
require 'cgi'
require 'json'
require 'net/http'
require 'uri'
format = 'json'
api_key = '1234567890abcdefghijklmnopqrstuv'
movie_to_search = 'Finding Nemo'
# CGI::escape is needed to convert special characters from the movie name
# In this case, we need to escape the space
uri = URI.parse "http://api.trakt.tv/search/movies.#{format}/#{api_key}?query=#{CGI::escape movie_to_search}"
# Send the request to the server
response = Net::HTTP.get_response uri
# Parse the response as a JSON object
json_response = JSON.parse response.body
# Print it nicely
puts JSON.pretty_generate json_response
```
And here's the result, truncated to the first two movies:
*search/movies result for Finding Nemo*
```json
[
{
"title": "Finding Nemo",
"year": 2003,
"released": 1054278000,
"url": "http://trakt.tv/movie/finding-nemo-2003",
"trailer": "http://youtube.com/watch?v=SPHfeNgogVs",
"runtime": 100,
"tagline": "There are 3.7 trillion fish in the ocean, they're looking for one.",
"overview": "A tale which follows the comedic and eventful journeys of two fish, the fretful Marlin and his young son Nemo, who are separated from each other in the Great Barrier Reef when Nemo is unexpectedly taken from his home and thrust into a fish tank in a dentist's office overlooking Sydney Harbor. Buoyed by the companionship of a friendly but forgetful fish named Dory, the overly cautious Marlin embarks on a dangerous trek and finds himself the unlikely hero of an epic journey to rescue his son.",
"certification": "G",
"imdb_id": "tt0266543",
"tmdb_id": 12,
"images": {
"poster": "http://slurm.trakt.us/images/posters_movies/647.4.jpg",
"fanart": "http://slurm.trakt.us/images/fanart_movies/647.4.jpg"
},
"genres": [
"Animation",
"Comedy",
"Family"
],
"ratings": {
"percentage": 85,
"votes": 7917,
"loved": 7668,
"hated": 249
}
},
{
"title": "Finding Dory",
"year": 2016,
"released": 1466146800,
"url": "http://trakt.tv/movie/finding-dory-2016",
"trailer": "http://youtube.com/watch?v=q2a3tS7zNcU",
"runtime": 0,
"tagline": "",
"overview": "Sequel to the 2003 Pixar film 'Finding Nemo'",
"certification": "G",
"imdb_id": "tt2277860",
"tmdb_id": 127380,
"images": {
"poster": "http://slurm.trakt.us/images/posters_movies/209152.1.jpg",
"fanart": "http://slurm.trakt.us/images/fanart_movies/209152.1.jpg"
},
"genres": [
"Adventure",
"Animation",
"Comedy",
"Family"
],
"ratings": {
"percentage": 100,
"votes": 4,
"loved": 4,
"hated": 0
}
},
// ...
]
```
Look at the first one: it is the movie we were looking for! Now, we want to
display its details. To achieve this, we'll need to use another method:
[`movie/summary`](http://trakt.tv/api-docs/movie-summary). Here are its
parameters:
- **format** and **apikey**: same as above
- **title**: Could be the last part of the `url` attribute we got earlier (for
Finding Nemo, it would be `finding-nemo-2003`), the
[IMDB](http://www.imdb.com/) ID (`imdb_id` attribute), or the
[TMDB](https://www.themoviedb.org) ID (`tmdb_id` attribute). We'll use the
IMDB ID.
Here's what the request should look like:
`http://api.trakt.tv/movie/summary.format/apikey/title`
*movie/summary*
```ruby
require 'cgi'
require 'json'
require 'net/http'
require 'uri'
format = 'json'
api_key = '1234567890abcdefghijklmnopqrstuv'
movie_to_search = 'Finding Nemo'
# CGI::escape is needed to convert special characters from the movie name
# In this case, we need to escape the space
uri = URI.parse "http://api.trakt.tv/search/movies.#{format}/#{api_key}?query=#{CGI::escape movie_to_search}"
# Send the request to the server
response = Net::HTTP.get_response uri
# Parse the response as a JSON object
json_response = JSON.parse response.body
# Extract the IMDB ID of the first result
imdb_id = json_response.first['imdb_id']
# Request the summary
uri = URI.parse "http://api.trakt.tv/movie/summary.#{format}/#{api_key}/#{imdb_id}"
response = Net::HTTP.get_response uri
json_response = JSON.parse response.body
# Print it nicely
puts JSON.pretty_generate json_response
```
And the truncated output:
*movie/summary for Finding Nemo*
```json
{
"title": "Finding Nemo",
"year": 2003,
"released": 1054278000,
"url": "http://trakt.tv/movie/finding-nemo-2003",
"trailer": "http://youtube.com/watch?v=SPHfeNgogVs",
"runtime": 100,
"tagline": "There are 3.7 trillion fish in the ocean, they're looking for one.",
"overview": "A tale which follows the comedic and eventful journeys of two fish, the fretful Marlin and his young son Nemo, who are separated from each other in the Great Barrier Reef when Nemo is unexpectedly taken from his home and thrust into a fish tank in a dentist's office overlooking Sydney Harbor. Buoyed by the companionship of a friendly but forgetful fish named Dory, the overly cautious Marlin embarks on a dangerous trek and finds himself the unlikely hero of an epic journey to rescue his son.",
"certification": "G",
"imdb_id": "tt0266543",
"tmdb_id": 12,
"rt_id": 9377,
"last_updated": 1405432489,
"poster": "http://slurm.trakt.us/images/posters_movies/647.4.jpg",
"images": {
"poster": "http://slurm.trakt.us/images/posters_movies/647.4.jpg",
"fanart": "http://slurm.trakt.us/images/fanart_movies/647.4.jpg"
},
"top_watchers": [
{
"plays": 91,
"username": "Damon_old",
"protected": false,
"full_name": "",
"gender": "",
"age": "",
"location": "",
"about": "",
"joined": 0,
"avatar": "http://slurm.trakt.us/images/avatar-large.jpg",
"url": "http://trakt.tv/user/Damon_old"
},
// ...
],
"ratings": {
"percentage": 85,
"votes": 7917,
"loved": 7668,
"hated": 249
},
"stats": {
"watchers": 2135,
"plays": 5541,
"scrobbles": 5357,
"scrobbles_unique": 1975,
"checkins": 184,
"checkins_unique": 163,
"collection": 12633
},
"people": {
"directors": [
{
"name": "Andrew Stanton",
"images": {
"headshot": "http://slurm.trakt.us/images/avatar-large.jpg"
}
},
{
"name": "Lee Unkrich",
"images": {
"headshot": "http://slurm.trakt.us/images/avatar-large.jpg"
}
}
],
"writers": [
{
"name": "Andrew Stanton",
"job": "Screenplay",
"images": {
"headshot": "http://slurm.trakt.us/images/avatar-large.jpg"
}
},
// ...
],
"producers": [
{
"name": "Graham Walters",
"executive": false,
"images": {
"headshot": "http://slurm.trakt.us/images/avatar-large.jpg"
}
},
{
"name": "John Lasseter",
"executive": true,
"images": {
"headshot": "http://slurm.trakt.us/images/avatar-large.jpg"
}
}
],
"actors": [
{
"name": "Albert Brooks",
"character": "Marlin",
"images": {
"headshot": "http://slurm.trakt.us/images/avatar-large.jpg"
}
},
{
"name": "Ellen DeGeneres",
"character": "Dory",
"images": {
"headshot": "http://slurm.trakt.us/images/avatar-large.jpg"
}
},
// ...
]
},
"genres": [
"Animation",
"Family",
"Comedy"
]
}
```
We have our directors in the `people` object, which contains a `directors`
array. Let's extract it:
*Extract Finding Nemo's directors*
```ruby
require 'cgi'
require 'json'
require 'net/http'
require 'uri'
format = 'json'
api_key = '1234567890abcdefghijklmnopqrstuv'
movie_to_search = 'Finding Nemo'
# CGI::escape is needed to convert special characters from the movie name
# In this case, we need to escape the space
uri = URI.parse "http://api.trakt.tv/search/movies.#{format}/#{api_key}?query=#{CGI::escape movie_to_search}"
# Send the request to the server
response = Net::HTTP.get_response uri
# Parse the response as a JSON object
json_response = JSON.parse response.body
# Extract the IMDB ID of the first result
imdb_id = json_response.first['imdb_id']
# Request the summary
uri = URI.parse "http://api.trakt.tv/movie/summary.#{format}/#{api_key}/#{imdb_id}"
response = Net::HTTP.get_response uri
json_response = JSON.parse response.body
# Extract directors
json_response['people']['directors'].each do |director|
  puts director['name']
end
```
And here's the output:
*Finding Nemo's directors*
```
Andrew Stanton
Lee Unkrich
```
We can check the result directly
[on Trakt](http://trakt.tv/movie/finding-nemo-2003): looks like we're good!
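The scripts above assume that every request succeeds and that the search always finds something. As a rough sketch, the same two steps can be wrapped in a method that checks the HTTP status and copes with missing data. The names `http_fetch` and `directors_of` are mine, not part of any library, and the `fetch` parameter is only there so the network call can be swapped out, for testing for example.

```ruby
require 'cgi'
require 'json'
require 'net/http'
require 'uri'

# Default fetcher: sends the request and fails loudly on a non-2xx answer
def http_fetch(url)
  response = Net::HTTP.get_response URI.parse(url)
  raise "Request failed: HTTP #{response.code}" unless response.is_a? Net::HTTPSuccess

  response.body
end

# Hypothetical helper: returns the director names of the first search result.
# `fetch` is any callable taking a URL and returning the response body.
def directors_of(title, api_key, fetch: method(:http_fetch))
  root = 'http://api.trakt.tv'

  # Step 1: search for the movie
  results = JSON.parse fetch.call("#{root}/search/movies.json/#{api_key}?query=#{CGI.escape title}")
  raise "No movie found for #{title.inspect}" if results.empty?

  # Step 2: request the summary of the first result, using its IMDB ID
  imdb_id = results.first['imdb_id']
  summary = JSON.parse fetch.call("#{root}/movie/summary.json/#{api_key}/#{imdb_id}")

  # dig returns nil instead of raising if `people` or `directors` is missing
  (summary.dig('people', 'directors') || []).map { |director| director['name'] }
end
```

Calling `directors_of 'Finding Nemo', api_key` should then return the same two names as above, provided the API answers as shown earlier.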
## Cache me if you can
Now, we know how to retrieve data about movies. We can build an awesome
application, using Trakt.tv lists to keep a list of movies we want to watch.
However, because we're spending so much time working on this application, we
don't have any time left to watch those movies. Every time we fire up the
application, we fetch all the information for every movie back from the API.
Even if we're pretty sure the data hasn't changed (I mean, have you ever seen a
movie's director change after its release?), we use bandwidth, the data plan
on a smartphone, battery…
This is **bad**.
We need to find a way to save this kind of data offline to use less bandwidth,
but we also need to refresh it when it has changed on the server.
This is called caching.
> A web cache stores copies of documents passing through it; subsequent requests
> may be satisfied from the cache if certain conditions are met.
> -- [Wikipedia][1]
[1]:http://en.wikipedia.org/wiki/Web_cache
There are multiple ways to cache accesses to an API.
The simplest is to rely on HTTP cache if the API is accessed over
HTTP: the HTTP standard includes cache-related headers, and the API should tell
us how long we can keep the data cached. Until this delay has expired, we can
use our offline copy of the data. When it's out of date, we simply re-fetch it.
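As a rough illustration, here is how a client could decide whether its offline copy is still fresh, based on the `max-age` directive of the `Cache-Control` header (with `Net::HTTP`, it can be read with `response['Cache-Control']`). The `max_age` and `fresh?` helpers are hypothetical names, and the sketch deliberately ignores the other directives (`no-cache`, `no-store`…) and the `Expires` header.

```ruby
# Hypothetical helper: extracts the max-age value (in seconds) from a
# Cache-Control header, or returns nil if the header gives none
def max_age(cache_control)
  match = cache_control.to_s.match(/\bmax-age=(\d+)/)
  match && match[1].to_i
end

# A cached copy is fresh as long as it is younger than max-age
def fresh?(fetched_at, cache_control, now: Time.now)
  age_limit = max_age cache_control
  !age_limit.nil? && (now - fetched_at) < age_limit
end

fetched_at = Time.now - 120 # our copy was fetched two minutes ago

puts fresh?(fetched_at, 'public, max-age=3600') # prints true: valid for an hour
puts fresh?(fetched_at, 'max-age=60')           # prints false: time to re-fetch
```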
Another mechanism, probably more convenient for an API like Trakt, is the
[ETag](http://en.wikipedia.org/wiki/HTTP_ETag). An ETag identifies a specific
version of a response. When we make a request (for example, asking for a
movie's details), the server can add an `ETag` header. If it's present, we
should store it alongside an offline copy of the movie details. Next time we
need these details, we ask the server: "Hey, I have a copy of these details
matching this ETag! Do you have anything newer for me?" If there is a new
version available, the server will send it over, with a new ETag. We have lost
two dozen bytes of bandwidth. Damn.
However, if the details haven't changed, the server will simply answer with an
empty `304 Not Modified` response. And that will be most of the time, for this
kind of request. Hooray for the planet. And our batteries.
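In HTTP terms, the "I have a copy matching this ETag" question is asked with the `If-None-Match` request header. Below is a minimal sketch of that exchange using `Net::HTTP`. The helper names and the shape of the cache entry (a Hash with `:etag` and `:body`) are assumptions made for the example, and whether a given API actually sends an `ETag` header is up to its server.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request, adding If-None-Match when we
# already hold a copy of the resource with a known ETag
def conditional_get(uri, etag = nil)
  request = Net::HTTP::Get.new uri
  request['If-None-Match'] = etag if etag
  request
end

# Hypothetical helper: picks the cache entry to keep once the server has answered.
# `cached` is a Hash with :etag and :body keys, or nil if nothing is stored yet
def refresh(response, cached)
  if response.code == '304'
    cached # Not Modified: our offline copy is still the current version
  else
    { etag: response['ETag'], body: response.body }
  end
end

# Sends the conditional request and returns the up-to-date cache entry
def fetch_with_etag(url, cached = nil)
  uri = URI.parse url
  request = conditional_get uri, cached && cached[:etag]
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request request
  end
  refresh response, cached
end
```

A client would keep the Hash returned by `fetch_with_etag` next to the URL, and pass it back as `cached` the next time it needs the same resource.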
There are tons of ways of caching content. Nearly every API will use a different
mechanism, suitable to its data lifespan.
On a fast ADSL connection, a simple request on the Trakt.tv API takes a couple
of hundred milliseconds. If you run multiple queries in parallel, it will be
slower. Now, imagine the same on a 3G network. Caching does matter.
## Requiem for an API
As you can see, once an API has been written, using it is pretty easy. With
standards like JSON, XML and REST, they basically all work in the same way.
The crucial point when developing an application is, however, whether an API
is available at all. Not all sites provide one. On the other hand, some sites
provide excellent ones, some of them being just an API client themselves
(that's a principle called
[dogfooding](http://en.wikipedia.org/wiki/Eating_your_own_dog_food)).
Code samples presented here are in Ruby, but you can of course use any
language you want to access an API (well, good luck using an API in
[brainfuck](http://blog.twal.org/programming-in-brainfuck-part-0.html)). Some
API providers maintain libraries in several languages to ease the use of their
API, providing high-level objects (example
[here](https://github.com/octokit/octokit.rb), a library provided by GitHub to
access its API in Ruby).