Monday, October 17, 2016

GoRuCo 2014 - How to Debug Anything by James Golick

My name's James Golick. I'm jamesgolick everywhere online: Twitter, GitHub, Instagram, Freenode. My blog is jamesgolick.com. Very easy to find online, and I work 24/7, so you can always reach me there.
I work for a company called Computology. We make a product called packagecloud, where we do apt, yum, and RubyGem repositories as a service. If you have a need for public or private package repositories, or both, definitely check this out. Come talk to me if you want to talk about packaging. I would venture a guess that I'll probably get more excited than anyone else in this room about talking about packaging.
So, people say this, right? Programmers say this, right? And there's been a lot of talk on my Twitter recently, and on some blogs and stuff, about whether we should stop saying this, or suggesting that we should stop saying this. And I think it's important to distinguish between people who say this in sort of a moment of frustration, an outburst of personal frustration (what we do for a living can be very frustrating; there's pressure on you, you ship something, something's broken, you get frustrated, you tweet something, whatever), and people who actually, legitimately believe that everything is terrible, right?
I was gonna pull my iPhone out of my pocket right now (but I'm not allowed to have it on stage) and show you my iPhone and be like: I have a supercomputer in my pocket that can literally summon nearly any media that was ever created, via the air. That's not terrible, that's fucking awesome, right?
But as much as I think that you're wrong if you legitimately believe that everything is terrible, I think that you're probably at the very least naive, and possibly being a little bit disingenuous, if you don't admit that everything is broken. And I don't mean that nothing works. Obviously software works, right? But the reality is that software is buggy, it's flaky, and it's unreliable, despite our best efforts to make it better.
And this actually makes sense, right? We're innovating really fast. Software engineering is a relatively new field. We're still figuring out how to do it, and we really just haven't caught up with the pace of innovation and growth in our industry. You know, we're still working on it, right?
So it makes sense that everything's broken. And as a result of everything being broken, when you get engineers together in a room or online, one of the big topics of discussion is always: how do we write better code? How do we actually produce software that's more reliable, that's more correct, that works better, that handles edge cases better? There are all these different techniques for doing that, right? There's testing, obviously very popular in the Ruby world. There's static analysis. And then there's stuff like what we heard about this morning, where newer languages have more sophisticated type systems that are capable of more accurately expressing the constraints of different units inside of our programs.
But one thing that we don't talk about that much, or at least in my opinion that we don't talk about enough, is how do we cope? What are the strategies for dealing with our software when it doesn't work, whether it's our code or someone else's code? I said the other day on Twitter: if you want to deploy high-quality software that performs, you should expect to fix bugs at every level. There's a very simple reason why: bugs exist at every level. And so, given enough time and given enough complexity, you're going to run into those bugs, and either you're going to fix them or they're going to remain broken.
Over the last few years, I've fixed a bunch of bugs in a bunch of different layers of the stack. Obviously lots of bugs in my own code, but also bugs in the Ruby VM, bugs in memory allocators, MySQL, all kinds of places. I always get this question. People ask me, like, how did you find that bug? How do you go about finding a bug in a code base you're unfamiliar with? How do you go about finding a bug in a language that you don't know very well?
And I realized over the years that the methodology I use for debugging is always the same. It doesn't matter where in the stack I'm looking, it doesn't matter what the language is, it doesn't matter whether I even know the language, really. It's always the same methodology for debugging, and it's very, very simple. So that's what this talk is about.
Every good debugging session starts with this quote. This is a mantra among programmers. So someone, maybe it's your boss, maybe it's a user, maybe it's a friend or your CEO, reports a defect in something. And you pull up the code that you think is the offender, and you stare at it, and you're like: how is this possible? This can't be possible. This can't be happening. You reread the code, and you read it over and over again, trying to understand how it's possible. You keep saying this, right?
So, by the way, there's no Ruby in this talk. Sorry, but not sorry.
So this is a true story about a debugging session that I engaged in a few years ago. I'm from Toronto, Canada. A friend from my hometown was running a PHP site, and he called me up one day. He's not happy at all. He had a staff of people who were working with him; I don't know where they were at this time. But he called me up and he's like, "Hey, my site is down." And I'm like, "OK, so why don't you just get your team to fix it?" He's like, "Well, they're not here because, reasons." So he's like, "You know, can you fix it?"
So I didn't have the source code. I didn't know anything about the system. I'd never seen the source code, never even really talked to any of his developers. It was written in PHP, and I had written PHP maybe five years before, so I had very little familiarity with the language. I did happen to have SSH access to his servers, because I had diagnosed some other, unrelated thing for him at some point. So he's like, "Can you fix it?" And I'm like, "I don't know, I guess I can take a look."
So I SSH into one of the servers, and I'm like, OK, well, it's probably running PHP under Apache, so I'll take a look in the Apache error logs. That's where I figured the PHP error logs would be. And of course there's nothing in there, right?
And it's funny, because you might think this is a worst-case scenario: the site is down and there's nothing in the logs. But the fact of the matter is that there's never anything in the logs. And even if there is something in the logs, you'd be really lucky if it's useful. I mean, if the program knew how it was broken, it probably wouldn't be fucking broken, right?
So, cool, what now? This is what I did. I knew the PHP code was probably executing in one of these Apache processes, so I found a PID for one of those Apache processes. Then I ran a program called strace to attach to that running program and give me some debugging output.
If you're not familiar with strace, strace is a program that will give you a trace of all the system calls that get executed by the program you attach it to. If you don't know what a system call is: system calls provide the interface between the kernel and programs in userland, which is the kind of programs that probably most of us in here write most of the time, on any operating system. System calls are used for all kinds of things, like writing to files or to sockets, or allocating memory: all kinds of services, essentially, that the operating system provides to userland.
strace output looks like this. Basically, what strace does (it's a very simple little program) is capture the system call information using a kernel API and then reconstruct, in ASCII text, those system calls to look like C function calls. So you have the name of the function, the arguments in parentheses, an equals sign, and then whatever that system call happens to return. In this case we're writing to file descriptor 1, which is standard output, from a buffer, which in this case is a C string that says "hi" and has a newline character at the end of it. The other argument to write is the number of bytes to write from the buffer to the file descriptor. And then the return value is the number of bytes that got written successfully to that file descriptor.
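The slide is not captured in this transcript, but from the description the line on screen would have looked something like write(1, "hi\n", 3) = 3. As a rough sketch of that "name, arguments, equals sign, return value" shape, the Python below splits one such line into its parts. The names parse_strace_line, Syscall and failed are made up for this illustration, and the pattern assumes plain single-process output with no [pid N] prefixes or timestamps.

```python
import re
from typing import NamedTuple, Optional


class Syscall(NamedTuple):
    name: str    # e.g. "write"
    args: str    # raw text between the parentheses
    result: str  # everything after " = ", e.g. "3" or "-1 ENOENT (...)"


# One system call per line, printed to look like a C function call.
_LINE = re.compile(r'^(?P<name>\w+)\((?P<args>.*)\)\s+=\s+(?P<result>.+)$')


def parse_strace_line(line: str) -> Optional[Syscall]:
    """Split a strace line into name, arguments and result; None if it is not a call."""
    match = _LINE.match(line.strip())
    if match is None:
        return None  # signal notices, exit messages, unfinished calls, etc.
    return Syscall(match.group("name"), match.group("args"), match.group("result"))


def failed(call: Syscall) -> bool:
    """System calls conventionally report failure by returning -1."""
    return call.result.startswith("-1")


print(parse_strace_line('write(1, "hi\\n", 3) = 3'))
```

The result field is kept as text on purpose: for a failing call strace appends the error name and description after the -1, and that text is usually the interesting part.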
Most of the system calls that are probably going to provide useful information to you when you're debugging, like, a Ruby program or something like that are going to have really simple names like write or open or read. But if you get confused about what a system call is, or you can't tell what it does from its name (some of them have really, really obscure names that probably won't mean anything to you if you've never done this kind of programming before), they're all documented in section 2 of the manual. So it's really, really easy to find them: man, space, 2, and then the name of the system call (for example, man 2 write). And most of them are documented pretty well.
I've taught a lot of people how to use strace, and this is usually where it kind of falls down. So you attach strace to some process and you're like, "All right, I'm gonna find the bug." And then you get, like, this. And this is a really small amount of strace output; I've seen many megabytes of strace output for a small number of requests on a web server. And you look at this and you're like, "Well, what do I do now?" Right?
But it turns out there's actually a really, really straightforward methodology for finding the causes of problems in strace output. First of all, you always have to work backwards, so work from the bottom and go up. The first step is to try to find where the failure is actually being reported.
So in this example, this is Apache writing a 500 HTTP response back to a socket, socket number 12 presumably. And so once you find where it's being reported, probably everything beneath that is not very interesting, because obviously the cause of the failure happened before the failure got reported, right?
And then work back up. Usually, if you're going to find the cause of your bug in the strace output, it's going to be relatively close to where the error actually gets reported to the client, or to wherever it's being reported to. So in this case we work back up, and we find this failing call to open: there's a file called /var/www/db.in.php that's missing, and then we get a 500 error right afterwards.
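The two steps just described (find the line where the failure is reported, then walk upwards looking for the nearest failed call) can be sketched in a few lines of Python. This is only an illustration under assumptions: find_suspect is a made-up name, the trace lines are hypothetical ones shaped like the session in the story, and the file name included.php is a placeholder rather than the real one.

```python
import re
from typing import List, Optional, Tuple

# strace prints a failed system call with a return value of -1.
FAILED = re.compile(r'^\w+\(.*\)\s+=\s+-1\b')


def find_suspect(lines: List[str], report_marker: str) -> Optional[Tuple[int, str]]:
    """Return (index, line) of the closest failed call above the failure report."""
    # Step 1: scan from the bottom for the line where the failure is reported.
    report_at = None
    for i in range(len(lines) - 1, -1, -1):
        if report_marker in lines[i]:
            report_at = i
            break
    if report_at is None:
        return None
    # Step 2: keep walking up; the cause happened before the report.
    for i in range(report_at - 1, -1, -1):
        if FAILED.match(lines[i]):
            return i, lines[i]
    return None


# Hypothetical trace; paths and descriptor numbers are placeholders.
trace = [
    'open("/var/www/index.php", O_RDONLY) = 13',
    'open("/var/www/included.php", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'write(12, "HTTP/1.1 500 Internal Server Error\\r\\n", 36) = 36',
    'close(12) = 0',
]
print(find_suspect(trace, "HTTP/1.1 500"))
```

In a real trace the nearest failed call is only a candidate. Plenty of failures are harmless (programs routinely probe for files that do not exist), so the result is a starting hypothesis to check, not a verdict.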
So start to form a hypothesis. Like, maybe someone typo'd that filename and they deployed bad code, or they edited the code right on the server. I don't really know how that works. But once there's a hypothesis formed, you can slowly work backwards from there in the output until you find something that looks like it may be the offender, right?
And so just above that (it ought to be in green), just before that failing open call, there's a successful open call to /var/www/index.php. And so you can imagine: you hit the index of the site, Apache attempts to load a file called index.php, which has a typo in an include, and that's causing the 500. That's a sensible hypothesis for what's causing this outage.
Then try to prove your hypothesis. So look in the index.php file: did the first test of our hypothesis prove true? Yes, we are attempting to include something called db.in.php. Then look to see if the file's there. Is the file actually there? Maybe the permissions were wrong. Turns out that file's not there, but there is a file called db.inc.php. So that makes sense: there's bad code on the server, and that's what's causing the 500 error. Next step is to fix the bug, and then, you know, feel good about yourself.
So I think the total time that it took me to fix this outage, from the time that my friend from Toronto called me until the time his site was back up, was like three minutes. And I felt pretty good about myself, and he was really impressed. He was like, "Wow, how did you do that?" Right? He didn't send me, you know, any flowers or anything, but he seemed appreciative. I would like some flowers.
But, you know, later that night I was reflecting on that debugging session, and I was like, why was that so effective? I never find bugs in my own code that quickly, right? When my code has something stupid like that in it, it takes me an hour to find, or longer. I'm searching through files, trying to see the error. And I realized it's because when you come into a debugging session with all these assumptions, they usually lead you astray. If your assumptions were right, you probably would have written the right code in the first place, and then there wouldn't be a bug.
So that's rule zero of my formula for how to debug anything, which is: forget everything you think you know, because it's all wrong.
And the first rule follows from that, which is to get a third-party opinion. So if you don't know anything, if you're blind, then you need to ask someone for help. You need to ask someone for some information about what's actually happening, as opposed to what you think is happening, because if what you thought was happening was right, then there would be no bug.
So in this example, and mostly in this talk, I'm talking about the tool called strace. It's very, very useful, extremely useful, for debugging programs on Linux. It's such a great first thing to look at. But there's a whole bunch of other ways to get third-party opinions. This is a great diagram. It's a little bit intimidating, but a lot of these tools, depending on what kind of thing you're trying to debug, can be very, very useful, and many of them are worth learning, if not all of them, depending on what kind of software you actually write and try to debug. I'm gonna put the slides up online, so you can take a good look at this diagram if you want to.
There are also other ways of getting third-party opinions. For example, if you suspect that the bug might be in the operating system, finding another program that does what your program is supposed to do, and running it to see if you get the same behavior, might be a way of making a first step toward confirming that the bug is in the operating system, or in another layer, rather than in your program. That can be a very useful technique as well. I'm sure there are other ones that I haven't thought of or that I've never used before.
So next up I want to talk about something where it's kind of debatable whether it was a bug in packagecloud or a bug in apt, the Debian package manager. This is an interesting debugging session that lasted way too long.
It started when we had a customer who tried to install a packagecloud repository on the latest Ubuntu. I don't know if you can see this output, but beside many of those URLs are the three letters "Ign", which means ignore, which means that apt couldn't find anything there, the request failed, something like that, so it's ignoring those files. Many of these files are critical files, so the package index wasn't working, right? But the package index was working on every other version of Ubuntu and every other version of Debian. We hadn't gotten around to testing on the latest Ubuntu yet, so we were pretty confused.
So, OK, after staring at our own code completely uselessly for a while, I pulled out strace. And this is a different way of invoking strace from the one I showed in the last example. There I was attaching to an existing process. This way, you can actually start a process underneath strace and get all of the output from that process. The output from this was really long, like a couple of megabytes, so it was a lot to decipher.
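For reference, here is a small sketch of the two invocation styles mentioned so far: attaching to a running process, and launching a process underneath strace. It only builds the argument lists and does not run anything. The -p, -f and -o options are standard strace options; the helper name strace_argv, the PID 1234, the output path and the choice of "apt-get update" as the traced command are assumptions for illustration, since the transcript does not say exactly what was run.

```python
from typing import List, Optional, Sequence


def strace_argv(pid: Optional[int] = None,
                command: Optional[Sequence[str]] = None,
                output: Optional[str] = None) -> List[str]:
    """Build an strace command line for exactly one of: attach to pid, or launch command."""
    if (pid is None) == (command is None):
        raise ValueError("pass exactly one of pid or command")
    argv = ["strace", "-f"]        # -f: also trace child processes
    if output is not None:
        argv += ["-o", output]     # -o FILE: write the trace to FILE
    if pid is not None:
        argv += ["-p", str(pid)]   # -p PID: attach to a process that is already running
    else:
        argv += list(command)      # otherwise strace starts the command itself
    return argv


print(strace_argv(pid=1234))
print(strace_argv(command=["apt-get", "update"], output="/tmp/apt.trace"))
```

Writing the trace to a file with -o is worth the extra flag when the output runs to megabytes, because the file can then be searched from the bottom up.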
But using our trusty methodology, I worked backwards to find the failure. So this is apt writing to standard output what we saw: "Ign", and then the URL. This is a local copy of packagecloud, so it's a local VM IP address, port 3000, Rails app. Then "trusty", which is the distribution, and then the name of the file that apt was failing to find.
So we started working back up from there, and we found this line, which, if you can't read it, says read 6, and then a long string that says "400 URI Failure", and then the URI is a signed S3 URL that we redirect to in packagecloud. And then after that a newline, and it says "Message: Bad header line".
OK, so it seems like S3 is returning a 400. That's kind of odd; I wonder what's up there. Try to confirm the hypothesis: make a curl request to that same exact signed S3 URL, and we get back a 200. Cool.
So here we have a case where what apt is reporting seems to disagree with what's actually happening. I didn't have space for it in the slides, but we actually ran tcpdump, which dumps out all the network traffic, on that request to confirm that apt was in fact receiving a 200 from S3.
So what now? Well, this is where things start to get a little bit more real, and you have to actually download the source for whatever you're trying to debug, and try to figure out how that thing works and where the problem is.
Now, this step is actually a lot harder than it might otherwise sound. Knowing where to find the source for packages that are installed on your system, depending on what flavor of Linux or whatever operating system you're using, might be a little bit trickier than it sounds. I've had a lot of late nights and early mornings that were caused by me thinking that I was debugging the version of the source code that was running on my computer, when it turned out to not be even a remotely similar version.
Distributions like, say, Red Hat, for example, heavily patch source code, so the version number that is on the package could be completely different from the actual source code that they compiled to build that package. So if you run "openssl version" against the package that gets installed from the Red Hat vendor package repos, it will look vulnerable right now, but it's actually not, because they maintain their own set of patches.
If you're on an apt-based distribution, "apt-get source" and then the name of the package will unpack the actual source that was used to create that exact package on your system into your current directory. This is really, really useful, and it's worth knowing for whatever platform you deploy on.
Now, apt is a lot of lines of C++. It's not conceivable that we could just enter the source directory, even if you're the best C++ programmer in the world, and just find the bug. You need to find a starting point in that code, especially if you're unfamiliar with the code base, and especially if you're unfamiliar with the language.
So the key here is to locate some kind of hook: some string or sequence of bytes that you think might be hardcoded into the source code. In my case, I was kind of guessing that this error message, "Bad header line", was probably somewhere in the apt source code, and that proved to be true. It's contained in a bunch of these translation files that perform internationalization for apt, so it's translated into a bunch of different languages, and then it's also in this C++ file.
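Searching a source tree for a hook like this is usually a one-line grep, but the idea is simple enough to sketch in Python as well. The function below walks a directory and reports every line containing the hook string. find_hook is a made-up name, and the demo tree (po/fr.po, src/client.cc and their contents) is a hypothetical stand-in, not the layout of the real apt source.

```python
import os
import tempfile
from typing import Iterator, Tuple


def find_hook(root: str, needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line under root that contains needle."""
    target = needle.encode("utf-8")
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                # Read as bytes so binary files and odd encodings do not raise.
                with open(path, "rb") as handle:
                    for number, raw in enumerate(handle, start=1):
                        if target in raw:
                            yield path, number, raw.decode("utf-8", "replace").rstrip()
            except OSError:
                continue  # unreadable file; skip it


# Demo on a throwaway tree; the file names and contents are hypothetical.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "po"))
    os.makedirs(os.path.join(root, "src"))
    with open(os.path.join(root, "po", "fr.po"), "w") as handle:
        handle.write('msgid "Bad header line"\n')
    with open(os.path.join(root, "src", "client.cc"), "w") as handle:
        handle.write('// parser\nreturn Error("Bad header line");\n')
    for path, number, line in find_hook(root, "Bad header line"):
        print(f"{os.path.relpath(path, root)}:{number}: {line}")
```

As in the talk, expect hits in translation catalogs as well as in code. The hit in an actual source file is the starting point.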
So this part is a little tricky, especially if you don't know C++. But the fact of the matter is, if you can read really any programming language, I would venture a guess that with a small amount of effort you can understand a few C++ methods. If you stare at this for long enough (and it did take us a while), you'll see that it's looking for malformed headers, or what it believes are malformed headers.
Now, taking one step back from there: when you first read this code, you're like, "Wait, apt implements its own HTTP client? That's kind of weird." Turns out it does. And it turns out that the way it processes headers is a lot stricter than most other HTTP clients, probably because a lot of other HTTP clients have been made a lot more relaxed over the years, with the protocol being so popular and lots of misbehaving HTTP servers in the wild. But apt is so specific that its HTTP parser is just way stricter than everyone else's.
So basically, what this is looking for: HTTP headers are the name of the header, a colon, the value of the header, and then a newline, right? And this is looking for headers that have no value: so name, colon, and a newline. Turns out, if we look back at our curl output (maybe you can see it there), there's an empty header, the Content-Type header, that was causing this thing to trip up and report this error.
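To make the "name, colon, newline" case concrete, here is a small Python sketch that flags headers with an empty value in a raw response head. It is not apt's parsing code, only an illustration of the shape of input that a strict parser might reject. The function name empty_value_headers and the sample response are assumptions.

```python
from typing import List


def empty_value_headers(raw_head: str) -> List[str]:
    """Return the names of headers whose value is empty ("Name:" and nothing else)."""
    names = []
    for line in raw_head.split("\r\n"):
        if ":" not in line:
            continue  # status line or the blank line that ends the head
        name, _, value = line.partition(":")
        if value.strip() == "":
            names.append(name)
    return names


# Hypothetical response head with an empty Content-Type, as in the story.
response_head = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Length: 1024\r\n"
    "Content-Type:\r\n"
    "\r\n"
)
print(empty_value_headers(response_head))
```

A check like this is handy on the server side too: running it over a captured response shows which headers a stricter client may choke on, even when curl and browsers accept them.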
We fixed it by actually setting Content-Type on all of our S3 resources, which is probably a good idea anyway, obviously. Cool.
So, rule two: locate the correct source code. Easier said than done. And, you know, for whatever platform you're on, learn how to do that. It won't take that long to figure out, and knowing how to find the correct source code will save your ass, I guarantee.
Identify a hardcoded string, or some hook that you can use to find a starting point in the source code from which to work. This is really important, because, like, I remember when I was debugging in MySQL: it's like two million lines of code or something like that. If you have no starting point, you're not finding anything, period.
If debugging is a funnel, if we think about it like a customer acquisition funnel, I think step four is where most people probably drop out of the funnel. You know, they come to some code that's written in some language they're unfamiliar with. They're a Ruby programmer and they come upon some C or some C++. But the reality is that with a little effort you can learn this stuff, and it's really not that much effort. A lot of these code bases, especially if you follow a good methodology for reading through them, are not that hard to understand, particularly if you're trying to find and understand some small defect in them.
I would encourage everyone to not fall out of the funnel here, and to dive into code bases in languages that they don't know, because that's how you learn. A lot of people ask me where to start learning C, or where to start learning about systems programming, stuff like that. And the answer is that debugging is a really, really great way to get your feet wet in that stuff.
Hopefully you get to the last step. That's a good time to have a beer. And fix whatever is broken.
Those are my steps for how to debug anything: forget everything you know; get a third-party opinion; locate the correct source code; identify a hook; stare at the code until it makes sense, maybe learning a new language in the process; and then fix whatever is broken.
Questions?