Monday, October 10, 2016

Tom Stuart - Refactoring Ruby with Monads

half an hour of your finite lives.
I'm Tom Stuart. I'm going to tell you about monads and how we can
sort of uncover them by refactoring our Ruby code.
The ideas in this talk might be unfamiliar to some of you, so before we
get stuck into the details
I want to just warm our brains up with something simple before I get into the
difficult bit.
People always use analogies to explain monads, and it doesn't help,
so there will be no burritos or space suits or elephants in this talk. But I do
want to talk about some related ideas, not analogies but just related ideas,
that will put your brain in a receptive state for the stuff I want to explain
to you.
I'll start with a rhetorical question: what is a stack?
Well, it's a kind of value with certain operations, and those operations are
called push, pop, top and empty?.
I'm talking about an immutable stack here, so push and pop don't mutate the
stack; they just return a new one.
You have to use top if you want to look at the top element.
We also need a class method called empty, which creates an empty stack to get
started. For a value to qualify as a stack, its operations have to follow
certain rules. For stacks, the rules just say that the operations do what you
intuitively expect: when you push a value onto a stack, that becomes the top
value; pushing then popping is a no-op; an empty stack is empty; and a stack
with something pushed onto it isn't empty.
We can implement the stack operations however we like. Here's
a class called ArrayStack that implements the operations in a
particular way.
It stores the stack contents as an array of values. push adds a value onto the
front of the array, and pop removes the array's first item.
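
The slide is not captured in this transcript. A minimal sketch of the array-backed stack described above, assuming a plain class with an instance variable (the talk's own code may differ), could look like this:

```ruby
# Hypothetical reconstruction of the array-backed stack (ArrayStack).
class ArrayStack
  def initialize(values)
    @values = values
  end

  # Returns a new stack with value at the front; the receiver is unchanged.
  def push(value)
    ArrayStack.new([value] + @values)
  end

  def top
    @values.first
  end

  # Returns a new stack without the first item.
  def pop
    ArrayStack.new(@values.drop(1))
  end

  def empty?
    @values.empty?
  end

  def self.empty
    new([])
  end
end

stack = ArrayStack.empty.push('a').push('b')
stack.top     # => "b"
stack.pop.top # => "a"
```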
This class, LinkedListStack, implements the same operations in a
different way.
It stores the top element and a pointer to the rest of the stack, so top and pop
are just attributes, and push stores its value in a new stack instance pointing
at the old one.
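
As a rough sketch of the linked-list version, assuming that an empty stack is represented by nil for both attributes (an assumption made here, not something the transcript states):

```ruby
# Hypothetical reconstruction of the linked-list stack (LinkedListStack).
class LinkedListStack
  # top is the first element; pop is the rest of the stack.
  attr_reader :top, :pop

  def initialize(top, pop)
    @top = top
    @pop = pop
  end

  # The new stack points at the old one, which is left untouched.
  def push(value)
    LinkedListStack.new(value, self)
  end

  # Assumption: a stack with no "rest" is the empty stack.
  def empty?
    pop.nil?
  end

  def self.empty
    new(nil, nil)
  end
end

stack = LinkedListStack.empty.push('a').push('b')
stack.top     # => "b"
stack.pop.top # => "a"
```

One limitation of this sketch is that popping an empty stack returns nil rather than another empty stack.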
We can use the stack operations without knowing how they're implemented. If we
make an empty ArrayStack and push two values onto it, and then pop one of them
and ask for the top value, we get the result that we expect. If we do the same
thing with a LinkedListStack, it works in exactly the same way. We can define
more operations in terms of the old ones.
So, for example, here's a method called size that recursively calculates the
size of a stack by counting how many times it has to be popped until it's
empty.
As long as the stack's got working implementations of empty? and pop, we can
implement this size method on top of them.
It doesn't matter what stack implementation we use;
size always works the same way. An empty ArrayStack has size zero, and if we
push two values on, its size is two. An empty LinkedListStack has size zero,
and if we push two values on, its size is two.
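
One way to sketch this shared size method is as a module mixed into both implementations. The module name Stack and the pared-down classes below are assumptions for illustration:

```ruby
# Shared behaviour: works for any class that provides empty? and pop.
module Stack
  def size
    if empty?
      0
    else
      pop.size + 1
    end
  end
end

class ArrayStack
  include Stack

  def initialize(values)
    @values = values
  end

  def push(value)
    ArrayStack.new([value] + @values)
  end

  def pop
    ArrayStack.new(@values.drop(1))
  end

  def empty?
    @values.empty?
  end

  def self.empty
    new([])
  end
end

class LinkedListStack
  include Stack
  attr_reader :top, :pop

  def initialize(top, pop)
    @top = top
    @pop = pop
  end

  def push(value)
    LinkedListStack.new(value, self)
  end

  def empty?
    pop.nil?
  end

  def self.empty
    new(nil, nil)
  end
end

ArrayStack.empty.push('a').push('b').size      # => 2
LinkedListStack.empty.push('a').push('b').size # => 2
```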
So what do we mean by "stack"? It's really a specification. An implementation
of a stack provides certain operations that follow certain
rules. There are two benefits of that.
Firstly, those operations provide a sort of common interface across many
possible implementations, and secondly, they allow us to build shared
functionality, like that size method, that works with any implementation.
Here's another question: what's a collection, or at least, in Ruby, what's a
collection?
Well, it's a kind of value with certain operations,
actually just one operation in Ruby, called each, and that operation follows
certain rules,
actually just one rule in Ruby, which is that each calls a block with a value
zero or more times in immediate sequence.
That's it. Now, we can implement that operation however we like. Here's a
HardcodedCollection
whose each method literally calls a block with one and then with two, all the
way up to five.
Here's a GeneratedCollection whose each method calculates those values
dynamically in a loop and calls a block with each one. We can use that each
operation without knowing its implementation. From the outside,
HardcodedCollection and GeneratedCollection
both behave like a collection of the numbers from one to five.
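
The two collections are only described in words here. A minimal sketch of what they might look like, with the loop details assumed:

```ruby
# Calls the block with each number written out by hand.
class HardcodedCollection
  def each(&block)
    block.call(1)
    block.call(2)
    block.call(3)
    block.call(4)
    block.call(5)
  end
end

# Calls the block with the same numbers, computed in a loop.
class GeneratedCollection
  def each(&block)
    number = 1
    while number <= 5
      block.call(number)
      number += 1
    end
  end
end

seen = []
GeneratedCollection.new.each { |n| seen << n }
seen # => [1, 2, 3, 4, 5]
```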
We can define more collection operations on top of the one operation we already
have.
For example, here's a method called select that takes a block and then calls
each and accumulates all the values that make the block return true.
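
A sketch of such a select, written as a module that relies only on each. The module name Collection is an assumption, and the collection class is repeated so the example runs on its own:

```ruby
# Shared behaviour for anything that implements each.
module Collection
  def select(&block)
    results = []
    each do |value|
      # Keep only the values for which the block returns a truthy result.
      results << value if block.call(value)
    end
    results
  end
end

class GeneratedCollection
  include Collection

  def each(&block)
    number = 1
    while number <= 5
      block.call(number)
      number += 1
    end
  end
end

GeneratedCollection.new.select(&:even?) # => [2, 4]
```

In everyday Ruby you would include Enumerable instead, which provides select and much more on top of each.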
So as long as a collection has a working implementation of each, we can
implement select on top of it. It works the same for both implementations. Of
course, in Ruby we have a module called Enumerable that already has
select and count and map and inject and tons of other helpful stuff that all
sits on top of this one each method. So what
do we mean by "collection"? Again, it's really just a specification. An
implementation of a collection provides an operation that follows a rule, and
again we get two benefits: that each method gives us a common interface
across many possible implementations of collections, and allows us to build
shared functionality, like select, that works with any implementation.
So what name do we give these things? Are they design patterns, or interfaces,
or APIs, or duck types? All of those words are appropriate, and they also
overlap to an extent; the concept of a stack or a collection kind of sits in
the middle of all of them. But in my opinion the most specific, and therefore
the most accurate, term is "abstract data type". That literally means a kind of
value which has certain operations that follow certain rules. Stacks and
collections are abstract concepts, then, but they're abstract in a good way:
the abstractions give us power, and we expect programmers to understand them.
Nobody talks about stacks in hushed tones. They're simple and they're useful.
OK,
so that's enough priming of your brain. Let's do some refactoring. First I'd
like to look at some code that has to deal with nils. Imagine we have a project
management app with different kinds of models. Each project has a person who
created it,
each person has an address, each address has a country, each country has a
capital city, and each city has weather information, which for the sake of
simplicity let's just assume is a string.
So let's say that in our user interface for this application we want to display
the weather next to each project for some reason. That involves traversing all
of those associations.
So here's a method that does that. Maybe this is the kind of thing you'd write
in a Rails view helper.
There are lots of reasons not to write code like this, but there are also good
reasons to do it, and anyway people will always write code like this no matter
what we say. If we make a city which has sunny weather, and a country which has
that city as its capital, and an address in that country, and a person with
that address, and a project created by that person, we can pass that project
into weather_for and it works fine. But if we make a bad project, for example
a project with an address that has no country, then weather_for blows up.
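
To make this concrete, here is a small sketch that uses Struct for the models. The attribute names (creator, address, country, capital, weather) are assumptions based on the description above:

```ruby
# Hypothetical models; each one just holds the next association.
Project = Struct.new(:creator)
Person  = Struct.new(:address)
Address = Struct.new(:country)
Country = Struct.new(:capital)
City    = Struct.new(:weather)

def weather_for(project)
  project.creator.address.country.capital.weather
end

city    = City.new('sunny')
country = Country.new(city)
address = Address.new(country)
person  = Person.new(address)
project = Project.new(person)
weather_for(project) # => "sunny"

# An address with no country: calling capital on nil raises NoMethodError.
bad_project = Project.new(Person.new(Address.new(nil)))
begin
  weather_for(bad_project)
rescue NoMethodError => error
  puts "blew up: #{error.class}"
end
```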
So Tony Hoare invented nil in 1965, and he does now call it his
billion-dollar mistake, which he says has probably caused a billion dollars of
pain and damage, and this is exactly the sort of thing he's talking about.
So they may be a mistake, but Ruby has nils,
so we're stuck with them. To make our weather_for method tolerate nils,
we're going to have to explicitly check for them, so we need to introduce local
variables to hold every intermediate result and then check each intermediate
result
before we call a method on it. While we're at it, we might as well include the
possibility that the project itself is nil.
Now this is turning into a bit of a pyramid of doom; you can see the code
kind of drifting over to the right.
But luckily this code works the same if we just flatten it. So this code works,
but it's pretty clumsy, and it's hard to remember to do something like this
every time we might possibly have to deal with nil.
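
A sketch of both shapes, the nested pyramid and the flattened version. The model structs and the name weather_for_nested are assumptions for illustration:

```ruby
Project = Struct.new(:creator)
Person  = Struct.new(:address)
Address = Struct.new(:country)
Country = Struct.new(:capital)
City    = Struct.new(:weather)

# Nested checks: the "pyramid of doom".
def weather_for_nested(project)
  unless project.nil?
    creator = project.creator
    unless creator.nil?
      address = creator.address
      unless address.nil?
        country = address.country
        unless country.nil?
          capital = country.capital
          unless capital.nil?
            capital.weather
          end
        end
      end
    end
  end
end

# Flattened: a local that was never assigned is simply nil in Ruby,
# so each later check falls through.
def weather_for(project)
  creator = project.creator unless project.nil?
  address = creator.address unless creator.nil?
  country = address.country unless address.nil?
  capital = country.capital unless country.nil?
  capital.weather unless capital.nil?
end

bad_project = Project.new(Person.new(Address.new(nil)))
weather_for(bad_project) # => nil
weather_for(nil)         # => nil
```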
Fortunately, Rails has got a solution to this problem.
Rails, actually Active Support, monkey patches Object and NilClass with a
method called try, which delegates to public_send if the object's not nil and
just returns nil if it is. So when every object in the system has a try method,
instead of doing all of these nil checks ourselves we can let try do it for us,
and now we're back to just chaining method calls together, so we can take the
local variables out again, like that.
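
So that the idea can be tried without Rails, the sketch below uses a simplified stand-in for try. It is not Active Support's actual implementation, which handles more cases; the model structs are assumptions as before:

```ruby
# Simplified stand-in for Active Support's try (an assumption, not the
# real implementation): send the message unless the receiver is nil.
class Object
  def try(*args, &block)
    public_send(*args, &block)
  end
end

class NilClass
  def try(*args, &block)
    nil
  end
end

Project = Struct.new(:creator)
Person  = Struct.new(:address)
Address = Struct.new(:country)
Country = Struct.new(:capital)
City    = Struct.new(:weather)

def weather_for(project)
  project.
    try(:creator).
    try(:address).
    try(:country).
    try(:capital).
    try(:weather)
end

weather_for(Project.new(Person.new(Address.new(nil)))) # => nil
```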
So I'll just make that a bit bigger. This is as good as it gets right now. It's
better than the version with "unless nil?" all over the place, anyway.
But can we do any better?
Well, monkey patching definitely has its place, but monkey patching every single
object in the system isn't great, is it?
This is kind of a code smell, so let's not do it.
OK, try's gone again. Now we can all relax.
When we want to add a method to an object, the good
object-oriented programming solution is to use decoration, and decoration is
where you non-invasively add functionality to one object by wrapping
it up inside another object. So let's make a decorator class called Optional
whose instances have a single attribute called value. Instances of this class
just wrap up another value.
I can make a new one containing a value like the string "hello",
and then I can take "hello" out again later. If the value I put in happens to
be nil,
I get nil out later. Now, instead of putting the try method on Object,
let's put it on Optional. If the value attribute is nil, it just returns nil;
otherwise it sends the appropriate message to the underlying object.
So now we can call try on the decorator and it will call the method on the
underlying object as long as it's not nil.
If the value inside the Optional is nil, try will just return nil.
So instead of calling try on the actual project object, and then on the actual
person object, and so on,
we can write the method like this: we decorate project with an Optional object
and we call try on that, then we decorate the result, which might be nil, and
call try on that, and then we decorate the result of that call and call try on
it, and so on.
At the end we pull out the value and return it.
So that's unwieldy, but at least we're not monkey patching every object in the
system anymore.
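
A sketch of the Optional decorator with try, and of weather_for written against it. The local variable names and model structs are assumptions:

```ruby
# Decorator that wraps a value which might be nil.
class Optional
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Sends the message to the wrapped value unless that value is nil.
  def try(*args, &block)
    if value.nil?
      nil
    else
      value.public_send(*args, &block)
    end
  end
end

Project = Struct.new(:creator)
Person  = Struct.new(:address)
Address = Struct.new(:country)
Country = Struct.new(:capital)
City    = Struct.new(:weather)

def weather_for(project)
  optional_project = Optional.new(project)
  optional_creator = Optional.new(optional_project.try(:creator))
  optional_address = Optional.new(optional_creator.try(:address))
  optional_country = Optional.new(optional_address.try(:country))
  optional_capital = Optional.new(optional_country.try(:capital))
  optional_weather = Optional.new(optional_capital.try(:weather))
  optional_weather.value
end

Optional.new('hello').value        # => "hello"
Optional.new('hello').try(:upcase) # => "HELLO"
Optional.new(nil).try(:upcase)     # => nil
```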
There's another smell here, which is that the try method does too much.
We actually just wanted to refactor away the nil check, but try also sends the
value a message. What if we wanted to use the value in some other way when
it's not nil?
try is kind of overspecialized; it's got too much responsibility.
So instead of hard-coding the else clause here,
let's allow the caller to supply a block that controls what happens next.
Now we can pass a block in to try and do whatever we want with the underlying
value: we can send it a message, or we can use it as an argument in a method
call, or we can print it out, or whatever. This ability to pass a block in to
try is actually a little-used feature of Active Support's try method as well.
So now, instead of calling try with a message name and having to remember that
it's going to send that message to the underlying object,
we call it with a block, and inside the block we send the message ourselves and
decorate the result in an Optional, and we could do anything else that we
wanted with the value inside that block, like print out a log message or
whatever.
That works fine when there aren't any nils, but unfortunately we've broken it
when nils are involved, because we're returning nil
when the block doesn't run. That's easy to fix:
instead of returning a raw nil here, we'll decorate it with an Optional first,
and now it works in both cases. But there's a new smell, which is that I
don't think try is a great name anymore, because we've changed it to do
something more general than, or at least something different from, the main use
case of its namesake in Active Support.
So let's rename it to and_then, because it really just says: start with this
decorated value, and then do some arbitrary thing with it as long as it's
not nil.
So here's the new version of our method, which calls and_then instead of try,
and because we're just chaining and_then calls we can get rid of the local
variables. This is verbose, but it's nice: we decorate the possibly nil
project in an Optional object, and then we safely traverse all of the
associations, and then we pull the possibly nil value out again at the end.
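
Putting the last few steps together, a sketch of Optional with and_then and the chained weather_for. The block parameter names and model structs are assumptions:

```ruby
class Optional
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Runs the block with the wrapped value unless it is nil. The block is
  # expected to return another Optional; a nil value stays wrapped.
  def and_then(&block)
    if value.nil?
      Optional.new(nil)
    else
      block.call(value)
    end
  end
end

Project = Struct.new(:creator)
Person  = Struct.new(:address)
Address = Struct.new(:country)
Country = Struct.new(:capital)
City    = Struct.new(:weather)

def weather_for(project)
  Optional.new(project).
    and_then { |p| Optional.new(p.creator) }.
    and_then { |creator| Optional.new(creator.address) }.
    and_then { |address| Optional.new(address.country) }.
    and_then { |country| Optional.new(country.capital) }.
    and_then { |capital| Optional.new(capital.weather) }.
    value
end

weather_for(Project.new(Person.new(Address.new(nil)))) # => nil
```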
So,
OK, how's our refactoring going? Well, we might not be monkey patching anything,
and it's conceptually clean, but there's a huge final smell, which is that
nobody wants to write code like this.
In theory it might be better than Active Support's try method, but in practice
it's worse. But we can add some syntactic sugar to fix that.
Here's a definition of method_missing for Optional. It uses and_then to
delegate any message to the underlying value whenever it's not nil. So now we
can replace all of these "and_then ... Optional.new" calls with just normal
message sends and let method_missing take care of the details. I'll just
reformat that.
So there we go. This is actually really good.
You can see very clearly that we wrap up the possibly nil project into an
Optional, and then we safely do our chain of method calls, and then we extract
the possibly nil weather out of an Optional at the end.
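
A sketch of the sugared version. The method_missing signature is an assumption, and a matching respond_to_missing? is deliberately left out to keep the example short:

```ruby
class Optional
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def and_then(&block)
    if value.nil?
      Optional.new(nil)
    else
      block.call(value)
    end
  end

  # Any message Optional does not define itself is forwarded to the
  # wrapped value via and_then, and the result is wrapped again.
  # Real code should also define respond_to_missing? to match.
  def method_missing(name, *args, &block)
    and_then do |value|
      Optional.new(value.public_send(name, *args, &block))
    end
  end
end

Project = Struct.new(:creator)
Person  = Struct.new(:address)
Address = Struct.new(:country)
Country = Struct.new(:capital)
City    = Struct.new(:weather)

def weather_for(project)
  Optional.new(project).creator.address.country.capital.weather.value
end

weather_for(Project.new(Person.new(Address.new(nil)))) # => nil
```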
So to recap, this is the whole thing:
an object which stores a value that might be nil, and a method called and_then
which encapsulates the nil-check logic. We added some sugar on top by writing
method_missing,
and if you're doing this in real code you should also remember to write
respond_to?.
I'd like to very briefly point out that we only need to do the decorating and
undecorating for compatibility with the rest of the system.
If the rest of the system passed in an Optional object and expected us to
return one, we wouldn't even need to do that, and then we wouldn't have to
remember to check for nil at all.
We could just write the method the way we did in the first place and it would
just work.
Imagine that. All right.
That refactoring was very detailed. We're going to do two others, but we'll
have to skip the detail to save time.
Let's refactor some code that has to handle multiple results.
Imagine we have a content management application with different kinds of
models.
There are several blogs, every blog has many categories, each category has many
posts,
each post has many comments, and for the sake of simplicity let's assume the
comments are just strings. So let's say we want to fetch all the words from all
the comments within certain blogs for some reason. That involves traversing all
of these associations again.
Here's a method that does that. At each level we map over a collection and
traverse the association for each object inside it. When we reach each comment
we split it on whitespace to get its words. We have to use flat_map here
because we want a flat array of words instead of a nested one.
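
A sketch of that method, with Struct models whose attribute names are assumed from the description and with made-up sample comments:

```ruby
# Hypothetical models for the content management example.
Blog     = Struct.new(:categories)
Category = Struct.new(:posts)
Post     = Struct.new(:comments)

def words_in(blogs)
  blogs.flat_map { |blog|
    blog.categories.flat_map { |category|
      category.posts.flat_map { |post|
        post.comments.flat_map { |comment|
          comment.split(/\s+/)
        }
      }
    }
  }
end

# Made-up sample data for illustration.
blogs = [
  Blog.new([Category.new([Post.new(['I love cats', 'I love dogs'])])]),
  Blog.new([Category.new([Post.new(['Ruby is great'])])])
]
words_in(blogs)
# => ["I", "love", "cats", "I", "love", "dogs", "Ruby", "is", "great"]
```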
If we make a couple of blogs, which have a couple of categories, which contain
some posts, which have some comments, which contain some words (you can see
here that my example accurately represents the usual level of discourse
with blog comments), then the words_in method can pull all of the words out.
We're not worried about de-duplicating them or anything; we just want all of
the words. But this method has got a bit of a pyramid of doom going on,
and plus it's hard to distinguish between the code doing actual work and
the boilerplate of dealing with multiple values. We can clean it up by
introducing this class, Many, which decorates a collection of values. Like
Optional, it has an and_then method which takes a block, but this time it calls
the block for every value in the collection and flattens the results together.
So we can replace all the calls to flat_map with instances of Many and calls to
and_then, and now that we've got Many instances being returned we can flatten
the pyramid and reformat the code a little bit to get this.
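
A sketch of Many and the flattened words_in. The attribute name values and the model structs are assumptions:

```ruby
# Decorator that wraps a collection of values.
class Many
  attr_reader :values

  def initialize(values)
    @values = values
  end

  # Calls the block once per value; each call returns a Many, and the
  # wrapped arrays are flattened together into one new Many.
  def and_then(&block)
    Many.new(values.map(&block).flat_map(&:values))
  end
end

Blog     = Struct.new(:categories)
Category = Struct.new(:posts)
Post     = Struct.new(:comments)

def words_in(blogs)
  Many.new(blogs).
    and_then { |blog| Many.new(blog.categories) }.
    and_then { |category| Many.new(category.posts) }.
    and_then { |post| Many.new(post.comments) }.
    and_then { |comment| Many.new(comment.split(/\s+/)) }.
    values
end

blogs = [Blog.new([Category.new([Post.new(['I love cats'])])])]
words_in(blogs) # => ["I", "love", "cats"]
```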
So again, this is pretty clear, but we can add some syntactic sugar by defining
method_missing.
This is exactly the same as Optional's method_missing, except it's calling
Many.new instead of Optional.new. That lets us replace all of our "and_then ...
Many.new" calls with just simple message sends.
This is very nice: we put the blogs into a Many object, traverse all of the
associations, and then take the values out at the end. And again, if the rest
of the system could deal with instances of Many, we could just expect one and
return one. To recap, here's the class we just made.
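
The class as recapped might look roughly like this sketch. As with Optional, the method_missing signature is an assumption and respond_to_missing? is omitted for brevity:

```ruby
class Many
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def and_then(&block)
    Many.new(values.map(&block).flat_map(&:values))
  end

  # Forwards unknown messages to every wrapped value; each result is
  # expected to be a collection, which is wrapped and flattened.
  def method_missing(name, *args, &block)
    and_then do |value|
      Many.new(value.public_send(name, *args, &block))
    end
  end
end

Blog     = Struct.new(:categories)
Category = Struct.new(:posts)
Post     = Struct.new(:comments)

def words_in(blogs)
  Many.new(blogs).categories.posts.comments.split(/\s+/).values
end

blogs = [Blog.new([Category.new([Post.new(['I love cats'])])])]
words_in(blogs) # => ["I", "love", "cats"]
```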
For our third quick refactoring we're going to tackle writing asynchronous
code. Does anyone here know who the most influential Rubyist is? No? Well,
let's find out once and for all.
We'll find out by using the GitHub API to find the person who's made the most
commits on the most popular Ruby project. When you make an HTTP request to the
GitHub API root, you get back some JSON, and it looks more or less like
this.
Among other things, this includes a URI template for finding out
information about any organization.
So now we know what URL to use to get information about the Ruby
organization. When we make a GET request to that URL we get some JSON
that contains the URL we can use to get a list of all of the Ruby organization's
repositories.
So we fetch the list of repositories, which includes information about how
many watchers each one has. From that we can see which repository has the most
watchers, which turns out to be the main Ruby repository, as you might expect,
and the URL for that repository's representation in the API. When we
fetch that repository's information we get another URL that tells us where to
get its list of contributors.
So then we load the list of contributors to the main Ruby repository, which
includes information about how many commits each contributor has made. We
pick the one with the most commits, a user called nobu,
and finally fetch information about nobu from the URL in the contributor list.
So it turns out that Nobuyoshi Nakada has the most commits on the most
popular Ruby project.
Thank you, Nobuyoshi. OK,
that was a bit exhausting, so let's write some code to do it for us. Assume we
already have this get_json method. It asynchronously makes an HTTP GET request,
parses the JSON response into a Ruby hash or array, and then calls a
callback with the data. If you like, you can imagine a single-threaded,
non-blocking EventMachine equivalent of this.
So to do what we just did, we have to get the URI templates from the GitHub API
root, then fill in the template with the name of the Ruby organization, then
get the organization data, then find the URL for the list of its repositories,
then get the list of its repositories, then find the URL of the repository with
the most watchers,
then get the information on that repository, then find the URL for the
list of its contributors,
then get the list of its contributors, then find the URL of the contributor
with the most commits, then get the information on that user, and then print
out their real name and username.
So this code works, but it's drifting to the right again.
It's hard to understand and maintain deeply nested code like this, but we
can't flatten it because of the nested callbacks.
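
To show the shape of that nesting without touching the network, the sketch below replaces get_json with a synchronous stub backed by canned, made-up responses. The URLs, the data, the JSON field names and the wrapper method find_top_committer are all assumptions for illustration, loosely modelled on the GitHub API:

```ruby
# Canned, made-up responses so the sketch runs offline.
FAKE_RESPONSES = {
  'https://api.example.com/' =>
    { 'organization_url' => 'https://api.example.com/orgs/{org}' },
  'https://api.example.com/orgs/ruby' =>
    { 'repos_url' => 'https://api.example.com/orgs/ruby/repos' },
  'https://api.example.com/orgs/ruby/repos' => [
    { 'url' => 'https://api.example.com/repos/ruby/ruby', 'watchers_count' => 2 },
    { 'url' => 'https://api.example.com/repos/ruby/www', 'watchers_count' => 1 }
  ],
  'https://api.example.com/repos/ruby/ruby' =>
    { 'contributors_url' => 'https://api.example.com/repos/ruby/ruby/contributors' },
  'https://api.example.com/repos/ruby/ruby/contributors' => [
    { 'url' => 'https://api.example.com/users/alice', 'contributions' => 10 },
    { 'url' => 'https://api.example.com/users/bob', 'contributions' => 3 }
  ],
  'https://api.example.com/users/alice' =>
    { 'name' => 'Alice Example', 'login' => 'alice' }
}

# Stand-in for the talk's get_json: it calls the callback immediately
# instead of making a real asynchronous HTTP request.
def get_json(url, &success)
  success.call(FAKE_RESPONSES.fetch(url))
end

# Each step needs the previous result, so the callbacks nest.
def find_top_committer(root_url, &done)
  get_json(root_url) do |urls|
    org_url = urls['organization_url'].sub('{org}', 'ruby')
    get_json(org_url) do |org|
      get_json(org['repos_url']) do |repos|
        top_repo = repos.max_by { |repo| repo['watchers_count'] }
        get_json(top_repo['url']) do |repo|
          get_json(repo['contributors_url']) do |users|
            top_user = users.max_by { |user| user['contributions'] }
            get_json(top_user['url']) do |user|
              done.call("#{user['name']} (#{user['login']})")
            end
          end
        end
      end
    end
  end
end

find_top_committer('https://api.example.com/') { |text| puts text }
```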
So very briefly, the solution is to make an Eventually class that decorates a
block. The idea is that the block computes a value that might take a
while to produce, and then the run method runs the block with a callback for it
to call when the value becomes available.
We don't have time to go into the details, but here's an and_then method
that we can use to add extra asynchronous processing to the value
produced by an Eventually.
It's more complicated than the and_then methods we've seen earlier, but it
achieves the same thing.
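
A sketch of how such a class could be wired up. The callback parameter name success is an assumption, and the example values are produced synchronously purely to keep it runnable:

```ruby
class Eventually
  # Decorates a block that takes a success callback and calls it with
  # the value once that value is available.
  def initialize(&block)
    @block = block
  end

  # Starts the computation, handing the decorated block its callback.
  def run(&success)
    @block.call(success)
  end

  # Returns a new Eventually. When run, it runs this one, passes the
  # value to the block (which must return another Eventually), and runs
  # that with the final callback.
  def and_then(&block)
    Eventually.new do |success|
      run do |value|
        block.call(value).run(&success)
      end
    end
  end
end

later = Eventually.new { |success| success.call(21) }.
  and_then { |n| Eventually.new { |success| success.call(n * 2) } }

# Nothing has executed yet; run kicks off the whole chain.
later.run { |value| puts value } # prints 42
```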
The main detail here is just making sure that all the callbacks get wired up
correctly. So we can rewrite this code by putting each asynchronous get_json
call inside a block that we decorate with an Eventually object. We connect all
of the Eventually objects with and_then,
and then we run them. This isn't super readable either, but now we can start
pulling out each logical part into its own method. The code that gets all the
URL templates from GitHub can go into a method called get_github_api_urls. This
returns an Eventually which decorates a block which will eventually call its
callback with the result of fetching and parsing the JSON.
So we can replace the line at the top with get_github_api_urls.and_then. The
next bit of code, which fetches the data for the Ruby organization, can go into
a method called get_org.
This returns an Eventually object as well, so we can replace the next bit of
code with a call to get_org. And then the code that gets all of the Ruby
organization's repositories can go into a get_repos method, and then we can
call that, and so on for the rest of it.
Let me just clean up. So now that we're just creating Eventually instances at
each step, we don't need to call and_then on each one
immediately. We can let each Eventually object be returned from its enclosing
block before we call and_then on it.
So basically we can flatten this to get this. Let me just reformat that.
So this is much nicer than what we had before. Each part is nicely encapsulated
in its own method, and the parts are connected together in a clean way. And
this might be a familiar pattern to some of you: it's similar to promises or
futures or deferreds that you might have seen in JavaScript or EventMachine.
So to recap,
that's the whole Eventually class. OK.
So what was the point of all that? You've now seen three decorator classes that
all have this and_then method. In each case it takes a block and somehow calls
it with information from the decorated object, which could be nil, or it could
be multiple values, or it could be a value that arrives asynchronously. All
three of these things,
Optional, Many and Eventually, are implementations of monads, and I think
the most useful way to think of a monad is as an abstract data type, like
stack or collection. Like any abstract data type, it's got some standard
operations.
and_then is the operation we've already seen. We need another operation, which
is a class method called from_value. We haven't seen it yet, but it's very
simple:
it just takes a value and calls the constructor to make an instance of a
monad in the right way.
So this abstracts away the details of exactly how to call the constructor with
a simple value, which is different for each monad. In addition to those
operations, in order to qualify as a monad there are some simple rules
that those operations have to follow. The main rule is that and_then must call
a block with a value zero or more times at some point, and this is a much weaker
guarantee than the one provided by each:
it doesn't say whether the block will be called at all, or how many times, or
when.
Another rule is that the and_then method must return an instance of the same
monad, and that's what makes and_then calls chainable. The and_then
implementations in Many and Eventually already explicitly make an instance of
the same monad; Optional just trusts the block to return one. We could have
enforced that if we wanted; we could have put in a "raise unless the result is
an Optional" or something.
The third rule is that and_then and from_value don't mess with the value, and
that just means that when you construct a monad with from_value,
you're guaranteed to get the same value out again with and_then. For Optional
that only applies to non-nil values, but you should really only call from_value
with a non-nil value.
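
A sketch of from_value for each of the three classes, with the classes pared down to what the example needs. How each constructor is called follows from the sketches of those classes and is an assumption about the talk's code:

```ruby
class Optional
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def and_then(&block)
    value.nil? ? Optional.new(nil) : block.call(value)
  end

  def self.from_value(value)
    new(value)
  end
end

class Many
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def and_then(&block)
    Many.new(values.map(&block).flat_map(&:values))
  end

  # A single value becomes a one-element collection.
  def self.from_value(value)
    new([value])
  end
end

class Eventually
  def initialize(&block)
    @block = block
  end

  def run(&success)
    @block.call(success)
  end

  def and_then(&block)
    Eventually.new { |success| run { |v| block.call(v).run(&success) } }
  end

  # A value that is already available: call back with it straight away.
  def self.from_value(value)
    new { |success| success.call(value) }
  end
end

# The third rule: the value put in with from_value is the value that
# and_then hands to the block.
Optional.from_value('hello').and_then { |v| Optional.from_value(v.upcase) }.value
# => "HELLO"
Many.from_value('hello').and_then { |v| Many.new([v, v.upcase]) }.values
# => ["hello", "HELLO"]
```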
The big benefit of monads is that they give us a common interface which allows
us to do one thing:
connect together a sequence of operations.
The and_then method means "do the next thing", but the way it does the next
thing depends on which monad you use. For Optional,
it only does the next thing if the value isn't nil; for Many, it does the next
thing many times, once for each value; and for Eventually, it only does the
next thing once the value becomes available.
And as you might imagine, there are plenty of other monads with different
behaviors too. So, like the size method for stacks or the select method for
collections, we can define new functionality that sits on top of the
common monad interface, and we can use that when dealing with any monad. So for
example we can write a method called within. within is like sugar on top of
and_then: instead of expecting the block to return a monad, we expect it to
return a single value, and then we automatically decorate that value again with
from_value so that you don't have to do it in the block. Because the within
method hides the implementation-specific business of putting a value back
inside the monad, you can use it to write code that works with any monad. For
example,
here's a method that takes some monad containing raw JSON representing a
GitHub user, and then it uses the within method to parse that JSON and then
assemble a string containing their real name and login, all within whatever the
original monad is. If we feed it an Optional
JSON string we get back an Optional description as the result, and we know
how to get a value out of that
if we need to. If we pass in an Optional containing nil,
it doesn't blow up, because the Optional monad won't let description_from even
try to parse the JSON; it just immediately returns nil. If we make a Many
object containing multiple raw JSON strings, then description_from will
return Many descriptions, and we can extract the actual strings if we want to.
And finally, if we make an object that will eventually return some JSON by
asynchronously fetching it from GitHub, then description_from returns an
eventual description, and we have to run that eventual description with a
success callback to get the value out eventually.
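
A sketch of within and description_from working across all three monads. The module name Monad, the pared-down classes and the sample JSON are assumptions for illustration, and the Eventually here is fed an already-available string rather than a real fetch:

```ruby
require 'json'

# Shared behaviour for anything with and_then and a from_value class method.
module Monad
  # Like and_then, but the block returns a plain value, which is put back
  # inside the same kind of monad automatically.
  def within(&block)
    and_then do |value|
      self.class.from_value(block.call(value))
    end
  end
end

class Optional
  include Monad
  attr_reader :value

  def initialize(value)
    @value = value
  end

  def and_then(&block)
    value.nil? ? Optional.new(nil) : block.call(value)
  end

  def self.from_value(value)
    new(value)
  end
end

class Many
  include Monad
  attr_reader :values

  def initialize(values)
    @values = values
  end

  def and_then(&block)
    Many.new(values.map(&block).flat_map(&:values))
  end

  def self.from_value(value)
    new([value])
  end
end

class Eventually
  include Monad

  def initialize(&block)
    @block = block
  end

  def run(&success)
    @block.call(success)
  end

  def and_then(&block)
    Eventually.new { |success| run { |v| block.call(v).run(&success) } }
  end

  def self.from_value(value)
    new { |success| success.call(value) }
  end
end

# Works with any monad: parse the JSON, then build a description.
def description_from(containing_json)
  containing_json.within do |json|
    JSON.parse(json)
  end.within do |hash|
    "#{hash['name']} (#{hash['login']})"
  end
end

json = '{"login": "alice", "name": "Alice Example"}' # made-up user
description_from(Optional.new(json)).value # => "Alice Example (alice)"
description_from(Optional.new(nil)).value  # => nil
description_from(Many.new([json])).values  # => ["Alice Example (alice)"]
description_from(Eventually.from_value(json)).run { |text| puts text }
```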
It's extremely useful to be able to write one method like this and have it
work with any monad, and that's possible because of the common interface
provided by and_then
and from_value and the operations we can build on top of them. Like stacks and
collections, monads are abstract concepts, but they're abstract in a good way:
the abstractions give us power. As programmers we should understand them. We
shouldn't talk about monads in hushed tones. They're simple and useful. By
applying the power of monads wisely we can untangle nested callbacks, make parts
of our code more reusable, and generally make our programs better. If you want
to play with the monad implementations from this talk, I've put them on GitHub
at tomstuart/monads. I've also packaged them up as a gem, for some reason
that isn't quite clear to me.
If you're interested in seeing more computer science ideas explained with
concrete Ruby code, I wrote a book that does that, and I just found out this
week that the Japanese translation is being published next week, so if you
speak Japanese...
That's all I wanted to say. Thanks very much.