Thursday, October 06, 2016

Cascadia Ruby Conf 2012 Therapeutic Refactoring by Katrina Owen

few things drop
attention like a looming deadline
some part of my brain watches in morbid fascination as the deadline approaches
wondering whether to call emergency services or maybe just settle in make
popcorn and watch the show
I can't help but keep a wary eye on it
a wary eye that is no longer paying attention to the code at hand
I need all the weary eyes I can get without them forget best practices
i revert to less successful strategies like guessing and desperately copying
code off of stack overflow
sometimes I'm happy
there are moments when the world melts away your sense of self dissipates you
become completely absorbed in what you're doing time appears to slow down
and speed up simultaneously being awesome feels effortless
I've taken to coming into the office early in the morning and committing
random acts of refactoring some people are calling this guilt trip in
development but really it's not
refactoring this makes me happy more a factoring is an optimization
so today I'm going to tell you a story
the arrow needs to . the other way because this is a refactoring story
anyway it has a middle actually has a beginning two middles and an end and
immoral to be clear before i get started when I say refactoring
I mean small careful steps that improve the structure and the readability of the
code without changing its behavior tests are implied
so once upon a time there was an application that performed some
incredibly dirty hacks in order to deliver data straight into a number of
real-world old-school hardcopy publishing systems
I found this particular specimen in the dark recesses of that code base
it was a module that was 300 or so lines long most of which was in a single
method this module talk to coat all over the application it dealt in temporary
files
ftp it shelled out used exist to to write metadata into jpeg files that
handed handled encoding had a comment in it that read a kitten dies every time
this code is run
we had it running on cron job
yeah
it had no tests and there was no documentation
this is essentially a bucket of business logic it jump through hoops in order to
name files so that they end up where they need to be there is a comment but
it's almost as bad as the code also it's wrong
the input to the method is a target which is the godless object in this
application is an active record model that classes 500 lines long
the big picture here is that information is being shoved onto a string in order
to build up the file name and there is a bit of code that's not actually
shuffling things onto the string
this draws the eye it appears to be a chunk of something and it's probably a
good candidate for extraction
another striking aspect of the code is that it's littered with a bunch of
low-level details
makes it harder to see what's going on but amidst all the clutter
there's does seem to be things you know
chunks that could probably be named so what are we looking at
we've got a very large method
it appears to be doing things at two different levels of abstraction and
nothing at the lower level of abstraction is being named these are not
characteristics of good code
the thing to do with big ugly code is to break it apart into small ugly code
now there's a critical difference between breaking code down and breaking
code
so the first middle of this story is going to be about adding
characterization tests to the method and then the second middle is the actual
refactoring
where do you even begin testing something like this
we don't really know what the inputs look like
we certainly don't know what the output looks like we do know that it's in
production and it appears to be working because we haven't had any complaints
from the customer about it
the easiest way to discover inputs is to just send something anything really into
the method and then see come to see what comes back out
we need an assertion to work against and again it doesn't matter what that
assertion is it is going to be wrong
nice thing about being wrong is it tells you what right looks like we're running
this fails
obviously the error message tells us that we're missing an input it also
points us to the exact spot in the code where this is happening
what we find on Line six it comes as no surprise at this point where the message
publish on is being sent to target and it returns a date
meanwhile back in our tests we know that the stub represents a target and we are
ready to start flushing out those inputs and the one we need right now is
published on which we know is a date and the failure tells us what our next input
is digging around in the code reveals that X Y Z category prefix is a string
next up is kind which is also a string personal as boolean ID is an integer and
title is a string and this gives us our first real failure
meaning that we've gotten all the inputs right these are the inputs
the messages that we need to send a target in order to build up of filing
and with the inputs in place we're also given the output which looks like this
and now all that practice copying and pasting from stackoverflow comes in
handy
copy that string from the failure into the assertion and the test should pass
it doesn't know the fix for this is to use a red X and this gives us our first
passing tests now we're not quite done yet
there are a bunch of low-level details that are being exercised and there are
alternate paths through the method so we need to improve the inputs for these
three lines of code and add test cases for the lines that have conditions in
them at every step failing tests tell us how to tweak our assertions
these are trivial to fix boring to watch so i'll show you some highlights publish
on get zero padded and pie doesn't actually get affected by this so will
fall back on
II kind ji sub's out underscores
we need an underscore so we can see it being removed title now title appears to
strip out everything and the existing input doesn't have very much craft in it
so we need some numbers and some funky characters and a couple of uppercase
letters
there's something fishy about that rejects
what's with the square brackets i'm too lazy to reason about this so I'm adding
a test case overriding the stubs input for title giving it some square brackets
and that proves that brackets get left in place is probably not the intended
behavior
so this test now serves as documentation until we can clarify this with a
customer
the first conditional deals in personalization and our stuff doesn't
personalized so we need a case for the test a test for the case that actually
does were informed about yet another input that's missing is trivial to
supply and the other conditional on that line of code provides a fallback in case
ages mill and the test for this stuff out the same inputs is the previously
the final conditional is a ternary statement that is trying to determine
where to truncate the title and our default input for the title is neither
longer short so by making it longer we can make sure that it gets truncated and
then the only take case left to test for is a title that is too short to get
truncated and at this point we have protection against regressions
so what do we actually accomplished so far
we took a piece of undocumented untested code and with a bit of hand waving
we got fake assertions to give us the inputs the inputs gave us the outputs
and the outputs gave us the real assertions almost comic then we have to
reason about the code we inspected every line made sure that we have inputs that
would actually exercise that um exercise those lines and then we make sure that
every branch of every conditional was called from a test
we're not there is no specification here
there's no design being done we're getting a regression test suite
we are also getting documentation of to and including the fact that we probably
have a bug
the biggest win at this point is that we can start changing things without
breaking into a cold sweat
so here's what we're going to do we're going to isolate the method and then
we're going to extract a bunch of smaller methods
I'm going to borrow detailed description for Martin father's book refactoring
called replace method with method object
traditionally you do this for factoring when you have a big computation in a
bunch of temporary variables and you don't want to be passing those temporary
variables around as you extract method methods now here we are pretty much only
have this this one target to worry about
so first of all we need to make a home for this code
add an initializer that takes the target add an attribute for it and then just
copy paste the whole thing in there
the method shouldn't be called on the class
it needs a better name and it no longer takes any arguments
back in the old module we need to reference the new method from the old
one just delete the whole body of the method reference the new file
instantiate the class pass in the target call name on it and green which means
that we have a license to go to town on this code
where again you could start anywhere
I always like to delete something so let's go ahead and get rid of that
comment and now somewhat arbitrarily I've chosen to go from biggest to
smallest and the biggest strengths that we've seen for extraction is the
truncated title bit
the actual mechanics of a method extraction is as follows create an empty
method and name it by what it does or how or what it is not by how it does it
coming up with a good name is often the hardest thing you'll ever do in a
refactoring the next step is to copy the lines of code from the source method
into the new one
scan the new method for local variables file name here is the Clark flared in
the source method and so we have a choice to make
we can either do the mutation here which means passing the final men are making
it in this instance variable or we could make this a query method
no the whole point of the refactoring is to separate the two levels of
abstraction building up the file name in one place and putting together all the
little pieces in another
so we're going to just return the value that we need the other temporary
variables here are all local for this method we need to assign the result of
our nuclear method and delete the temporary variables that are no longer
used in the source method and then we're going again before we move on I'd like
to tidy this up a bit
there is some unnecessary work going on in the regex we're doing a
case-insensitive match and then we're down casing afterwards
that's kind of backwards
there's also something in the match of brackets that drives me nuts
the spirits parentheses got to go
another thing that bugs me is the fact that we have a ternary statement to
decide the range for the matches
so a match if is not going to raise an exception if you try to truncate for
character string to nine characters it will just return for characters so turn
a statement can go and since we no longer need truncate - we don't need to
worry about the length of the title so that can go to and with those two line
got lines gone there seems very little .
and having a temporary variable so we can ditch that as well effectively
leaving us with a single line of code in truncated title the longest line of code
is now the hex digest stuff now x digest is a terrible name in this context
copy stuff in assign the result and green again
personalization line is now the longest perform a quick extraction application
day and XYZ category perfect snow inside this class X Y Z is redundant the fact
that it's a perfect is irrelevant
so we can drop the XYZ prefix as well as the perfect suffix and just called the
new method category now we've got kind
so this leaves us with a fairly readable method we can totally leave it at this
but there's still some serious stuff in here to publish publication day for
instance the location they returns a string
so why are we interpolating it
same goes for category and kind
now target ID is an integer but string interpolation implicitly calls to string
on it then this next move might seem a little bit settle until you recognize
that kids game five differences
so which line of code here is different from all of the other pieces
the first line was the only one that's not being shoveled onto the string so we
can start off with an empty string and make the first piece just like all the
others
up to a point the extension is also different if you come to conceptually
separate this into two jobs building up the file name and then slapping the
extension on it we're left with two distinct sections one where everything
is smooshed together and another where the pieces are separated by underscores
if you extract this much together pieces then all the remaining pieces are
separated by an underscore
we can now shovel them into an array rather than us
ring and this gets rid of a bunch more of the interpolation syntax joining with
an underscore done
is it perfect of course not
is it better hell yeah we went from this to this in less than thirty steps
let me just show you that again before after before
so first we quarantine the method then we extracted stuff
extracting stuff basically boils down to identifying a piece of code that
performs a subtask and then giving that code name back when we started out we
identified three code smells a large method to different levels of
abstraction unnamed abstractions and extracting methods address all three of
those issues at that point we still had some ragged edges though so we kind of
picked through it and removed pointless crushed
now this concludes the second middle of the refactoring story and i'm going to
end by taking a closer look at . was craft in the field of vision information
design chart junk refers to visual elements and charts and graphs that add
noise they they don't help you understand the data they often get in
the way of comprehension give you an idea of what we're talking about this
graph gets just about everything wrong
this is the exact same data presented without charging so-called junk is the
Ruby equivalent of chart junk
this is not about coding practices it's not about naming an srp and small
methods and dry
it's about noise so i tried to classify code junk and I came up with this list
number 10
does a bear shit in the woods
comments shouldn't echo the implementation
they shouldn't be wrong
they shouldn't be in precise and they shouldn't be misspelled nine
it is my conviction that everyone should make their editor show them trailing
whitespace you leave it in you have noise in your code if you then take it
out you have noise in your gifts which is just wrong on so many levels
aight if it's dead let it rest in peace
what were you saving it for
you need to learn to let go
ok what is this doing in source control 760
no not six more parentheses ok this is vile
when i see this i secretly wonder if you wash your hands after you go to the
bathroom
six intelligent default
yes for character state characters saved our four characters that you can spend
another day or something
let's talk about dependencies in particular dependencies that you're not
actually depending on this may or may not impact performance
I probably will impact the performance of your tests it's a whole different
story
mostly you just add an oyster code for this is familiar territory recognize
this not to beat a dead horse but you three don't do work that the computer is
already doing for you
more string interpolation I could have technically folded this in the previous
one but I'm trying to get to 10
the same thing the martin the carlton map is superfluous here
the match of brackets don't need your help we've seen that one
and in this example the computer it computer isn't doing the work for you
but should be to test Suites they should be curated maintain fed watered coddled
and minimized the only exception i can think of is if documenting behavior
outweighs the cost
choose wisely and finally the compounding season
you know the old saying one plus one equals three four very large values of
one
this this is revolting
in fact if there's a horrifying resemblance the code that we started out
with today when people say like software has grown not built i'm pretty sure that
this is what they're talking about
it's kinda like fungus so earlier I said about the refactoring that after we
extracted methods we kind of polished up the code a little going to restate that
in light of the previous section
first we extracted methods then we eliminated code junk
alright so that was the ending of the refactoring story before i get to the
moral i'd like to mention the whole thing lives on github
it was committed step by step so if you want to look at any of the mechanics or
run the tests or anything just dig through the commit history it's all
there
I have a vague recollection of the night when that code was written
let me rephrase that I vaguely recall writing it
yeah when I panic my god awful code other people do yoga or meditate or go
whitewater rafting every factor
it soothes me now
science if you believe that sort of thing actually explains why the key
player in the game is working memory working memory the thing that allows you
to do mental arithmetic you store temporary results I guess variables off
in memory as you work through other parts of the problem
the main differentiator in performance of tasks that rely on working memory is
the type of strategy you're able to apply to a problem
so if you a better working memory you can apply more complex more successful
strategies
here's the deal when you were in ways that involve mental chatter you use of
available slots in your working memory
you no longer have the swap space necessary to use complex strategies and
you revert to less successful ones like copying stuff of off of stack overflow
this is where refactoring comes in
refactoring makes you smarter refactoring basically gives you an extra
brain so you offload a bunch of those little details that under normal
circumstances going to working memory into your tests
once you start refactoring you start reclaiming your brain it actively
counteract panic when the really unexpected things that happened when i
started refactoring simply for the act of refactoring itself was that i started
optimizing for happiness I would specifically put something into its own
class just that I could load just that class and nothing else
the feedback loop you get when you have sub second test suites for the thing
that you're working on is
unbelievable huge enabler for Flo Flo is nice flow is more than nice it's like a
bliss cocktail selfishly isolating code to improve my own happiness turned out
to be a pretty good deal for the code as well
suddenly I was avoiding dependencies I was constantly asking myself
can this be a separate class how about a gem
what is the unrelated sub problem here made a dramatic difference in summary
refactoring makes you smarter by giving you an XO brain it is preventive and
curative with respect to panic and if you optimize for developer happiness by
making your tests really fast you get loosely coupled code that adheres to the
single responsibility principle
thank you