Tuesday, September 27, 2016

Debugging with the Scientific Method - Stuart Halloway

All right, testing two different mics here. We are good.

All right, so I guess I should make a couple of clarifications on the
introduction before we begin. I'm pretty sure that Alex just established a
chain of causality whereby I can say that I'm responsible for Strange Loop, so
I'm pretty excited to have discovered that and can't wait to see my checks in
the mail. The second thing that he implied a little bit was that there's a lot
of opportunity to watch debugging happening on the product team if you work at
Cognitect. Actually, you know, Datomic has really no bugs at all, so I don't
know what he's talking about there either.

But seriously, in a Clojure audience I really want to begin this talk with a
rationale. This is something that Rich has instilled as a value in this
community, and so I put a lot of effort into having a good rationale, or trying
to, and I actually have a multi-part rationale for this talk.

The first one is debugging: why should we care about debugging? I think there's
an obvious reason: because bugs, right? We have bugs. There are some estimates
that have it that 50% or more of the cost of software development is actually
tracking down the bugs later. Some of us work in environments where we get to
run away and somebody else has to do that, so depending on what you do you
might not always feel that, but if you own something for a while you come to
realize that that's pretty obvious.

The second thing is that debugging is really quite a straightforward activity.
Alex used some sort of positive adjectives to describe my amazing awesomeness
at this, or whatever it was that he said, and the fact is this is not hard to
do, but people think it's hard to do, and so you can look disproportionately
smart by doing it. And I am continually intimidated by the intellect and the
achievements of the people in this room.
Every time I talk to somebody, they have some new area of knowledge that I
didn't even know existed, about which they are expert. And so to be able to
have something where, without the expenditure of neurons, you can appear to be
very bright is a valuable thing to have.

The final thing, and this is quite subtle and I won't call this out much in the
talk (I mean, I really probably have about eight hours' worth of material
corked up right now), is that there's a follow-on strand which might lead to an
interesting conversation, which is that understanding the way that I, and by
proxy Rich, think about debugging might help you understand some of the
aesthetic that goes into the design of Clojure and the process in the evolution
of Clojure. It has come to my attention that sometimes people out in the world
have more than one opinion about how to do things in the Clojure community:
more than one opinion about how to do the getting-started experience, more than
one opinion about how to do documentation, more than one opinion about how to
handle exceptions. We have lots of different ways of doing these things, and
I'm not going to lay out a "here's one way to do it." Right? It's fantastic
that there are multiple different ways. But I think that the scientific
approach to debugging, when you adopt it, influences what kinds of tools and
practices you're going to reach for in other areas, and so that's a very
worthwhile follow-on conversation tonight for those who continue on into office
hours. Office hours will be held at, you know, whatever local bar stays open
the latest.
Second thing as a rationale: why have me talk about this? A couple of reasons.
I'm a Clojure screener, and Clojure has a process which is not identical to
every other GitHub project out there, as you may have noticed, and so I in fact
push all of the commits and have for many releases now. I have read every line
of code that has gone into Clojure since the 1.0 release, and so I've seen a
lot of things. I've seen many more things that didn't go into Clojure for
various reasons, and I've seen things that were bugs and things that were
thought to be bugs and were not bugs.

I also typically am the place of last resort in Datomic support. That doesn't
mean that you're on fire if you call and you actually get me on the phone,
right? We're just rotating around. But I'm often the backstop on the hard
problems, and so that has given me an opportunity to own a system for a long
period of time and to observe bugs that we've had, and bugs that other people
have found that we had, and bugs in their own systems that we've ended up
helping them out with because of the support relationship.

Also, I'm going to make a little bit of an appeal to authority here: I earned
this gray hair. I've been doing this for a long time, and my hair was not gray
at all when I first started debugging, so I'm pretty sure that most of this is
directly related somehow. The alternate theory is that it has something to do
with children.

And finally, I am really lazy. Not the most terribly lazy person in the world,
because you'd have to be competitive, right, to be that lazy. But I really
don't want to work hard, and so I want to have an approach to solving problems
that allows me on most days to close up my laptop at five o'clock and spend
time with my children and have a tasty dinner and a glass of wine, watch
whatever HBO's most violent show currently is on TV, and go to bed. So I'm
lazy, and you can benefit from that. You know, programmers, we're all about
laziness; we're all about finding tricks that will let us do things in easier
ways. And so I think these all contribute. Then the question is, OK,
we talked about why debugging, and why maybe I should talk about it; why should
we talk about it in the Clojure context? Well, Carin Meier, one of my
co-workers, pointed me at this thread on Reddit. I believe the text of this
doesn't really matter, but you can read it as, you know, blah blah blah blah,
bad error messages, blah blah blah blah blah, and it's basically the sort of
beginner experience of Clojure. A lot of people, when they are new to Clojure,
particularly struggle with debugging problems, figuring out what's wrong when
something doesn't work. And so there's this long thread on Reddit where people
talk about the problem. This was actually the post that launched it, and it
sort of captures some of the tone. A lot of times people say things like,
"Well, I had a problem and I got a cryptic error message," or "I was
interacting with a dependency or a Leiningen plugin or something, and I didn't
understand even where to start looking for my problem," or "I had an (air
quotes) type error in my system that I would have readily found if I was
working in Java, and I now feel flummoxed." And so there are a lot of people
whose experience of debugging Clojure is fraught and unpleasant. My experience
has not been like that, and I'm not saying that they are wrong and I'm right,
but I'm saying that I might have some ideas that could make it pleasant for
others, and so that's kind of an objective here.

Now, before I go further,
how many people in here, show of hands, have helped to develop tooling in the
Clojure ecosystem, worked on any kind of tool? So, a lot of interesting tools
in the Clojure ecosystem. There are a lot of interesting tools about editing,
about debugging, about visualization of data, about visualization of flow.
People have got a lot of different things, and this talk is not about that. I
want to make it clear up front that this talk is not against that; I am a huge
proponent of using tools. That being said, if you do a Google search for
debugging, and I did this, the vast majority of the hits are about tools.
Overwhelmingly, well more than half of the Google hits, all the resources out
there, are about tools. And so I don't think you'd be getting your sort of
keynote value-add if I told you how to use tools, because there's a lot of good
advice out there.

And the thing about tools, even interesting tools, is that we are not directed
by tools, right? There is no TDD out there that stands for tool-directed
development. We wield the tools; they don't wield us. And so when I think about
debugging I want to have a set of ideas in my head that give me direction. Or,
as that great philosopher of computer science Yogi Berra said, "If you don't
know where you're going, you might end up somewhere else."
This is one of the biggest problems when people start to debug a system, right?
From my support perspective, you know what you discover? The first thing that
happens when you start trying to help somebody with a bug, the first thing you
find yourself wanting to do, is like, "I really want to go back in time to when
you first thought there was something wrong, before you started doing anything.
I want to go back to that point where you started acting before you had a plan,
before you knew where you wanted to go." It's so often the case. And I'm an
advocate of being a self-starter, right? So by all means try things, but have a
plan.

And then finally, why would we talk about the scientific method? I had
considered for a while that the approach that I used for debugging was a
scientific method, but I had not read the literature on the scientific method,
past having had a science degree in college, and I had not read what people had
to say specifically about the scientific method. So of course I turned to
Google to find out why I should care about the scientific method, and what I
discovered is that the scientific method apparently is "because rainbows,"
which, I have no idea why. I'm pretty sure that all the Edward Tufte people out
there are just now pulling their hair out: what do these colors even mean?
And so rather than use one of the pre-built graphics with meaningless colors, I
have made my own scientific-method-for-debugging graphic. It's not that
different from any of these, except it's got all the colors removed, because
that's also sort of the limit of my ability to manipulate OmniGraffle.

So what is debugging all about? Well, debugging typically starts with a
failure. And by the way, in introducing all the terms I'm introducing, I'm
going to give you something from the etymology dictionary on them. Advice to
future proposal submitters: if you use two or three words that are slightly
unfamiliar and make reference to an etymology in your abstract, clearly that's
something that is valued in this space.

So it starts with a failure. A failure is a lack of success, or better, it's an
omission of expected action: I expected something and I got something else.

Now, as a result of this failure you form a hypothesis. A hypothesis is an
explanation that's made without complete evidence, right, without necessarily
enough evidence to answer the question. Actually, maybe you don't even know
whether you have enough evidence to answer the question; maybe you do, but you
haven't sort of worked through the process enough. It acts as a starting point
for further investigation.

Then, given that hypothesis, you perform an experiment, which is a test or a
trial, and after that experiment you make some observations. An observation is
the active acquisition of information from a primary source.

Now, in a good experiment one of two equally good things can happen. You can
discover that the observation falsified your hypothesis. Falsification is a
deductive process using modus tollens. You say, "My hypothesis implies there's
something I should not see; there's some observation O that I shouldn't see." I
run the experiment and I see O. Well, if my thinking has been rigorous, and the
hypothesis says "if this is true, then not-O should happen," and I see O, my
hypothesis is now dead. And that's great. I know it's disappointing if you're
trying to get grant money in the kind of regular, everyday science, but in
debugging this is good news, right? We've now killed a possibility. That's
good.
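
The falsification step just described can be sketched in a few lines. This is
an illustration in Python, not something shown in the talk; the `Hypothesis`
class, the claim, and the observation strings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """A candidate explanation plus the observations it rules out."""
    claim: str
    forbidden: frozenset  # observations that must NOT occur if the claim holds


def is_falsified(hypothesis, observations):
    """Modus tollens: the claim implies not-O; we saw O; so the claim is dead."""
    return any(obs in hypothesis.forbidden for obs in observations)


# Hypothetical claim and observations, for illustration only.
h = Hypothesis(
    claim="the cache is never consulted on this code path",
    forbidden=frozenset({"cache hit logged"}),
)

consistent_run = ["request served", "request served", "request served"]
contradicting_run = consistent_run + ["cache hit logged"]

print(is_falsified(h, consistent_run))     # False: the hypothesis survives
print(is_falsified(h, contradicting_run))  # True: one forbidden observation kills it
```

Note that a run that does not falsify the claim does not prove it either; it
only leaves the hypothesis standing for the next experiment.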
The other possibility is that the hypothesis is supported by the experiment,
but maybe you still don't have an answer that's sufficient to let you say,
"I've isolated it," at which point you need to do refinement. Refinement is
removing impurities or unwanted elements, right? It's subtractive. So I have
some sort of story here. Now, my story may have to get bigger to remove
elements. I'm not actually saying the number of words in your hypothesis is
going to go up or down, but the conceptual size of it is going to be focusing
in on whatever the actual problem is.

And then eventually (and I'm not going to promise this happens after two or
five iterations), eventually you have a theory. A theory is a hypothesis that
has been validated by predictions. Now, this does not mean a theory is
guaranteed to be true. It does mean, though, that it is true given all of the
information we have so far, right? So this is another place where people go
wrong, and I see this all the time.

Right, you know, "I have ninety-nine pieces of evidence in favor of 'this is
where the bug is,' and I have one little piece of evidence that flat-out
contradicts it, and I don't understand that piece of evidence, and so, eh, 99,
that's pretty good, I'm just going to keep going." Right? If you have that one
little piece of evidence that says you're dead, you're dead. Right? You can't.
And we get attached to our theories, get emotional about them, like, "You know
what, I really like that theory, and look at the 99 good pieces of evidence in
favor of it." Can't do that.

Now it
turns out (and I sort of got dragged into the history of this, researching what
other people have said about this for this talk), it turns out that there are
all sorts of objections to the notion that the scientific method is how science
actually gets done. Some objections are superficial, or idiotic perhaps, but
there are more serious concerns as well. One of the things that I studied in
graduate school is Thomas Kuhn's The Structure of Scientific Revolutions, and
there are all kinds of pithy ways I could unfairly summarize this, but it
essentially says that science is a little bit better than politics or religion
in that you wait for people whose ideas are going to die out to die, instead of
actively going and killing them. Right? But it basically paints a pretty
unpleasant picture: that science is an extremely social process. Right, how
could it not be? It's performed by people.

And then once you have this notion that science is a social process, it's
opened up to "what are the social priorities behind science?", and you get all
these moral challenges to science. These hot-button things change from decade
to decade. One of the hot ones when I was in academia was The Bell Curve,
right, this notion that we're going to use science (in air quotes) to sort of
split the human race up into different groups and say these groups are more
intelligent and these groups are less intelligent. That's not the case,
certainly not justified by the evidence, but these challenges are real.

And here's the funny thing, right, here's the good news. There's all kinds of
stuff that I just said that we could have very heated debates about and get
angry about. In the context of debugging we don't have to worry about any of
that stuff. It turns out that, if the scientific method is the measure of how
science is done, debugging is actually more like science than science.
Science is hard. It's difficult to imagine how you can take that little process
I just showed, turn a crank on the side of it, and come out with a theory of
gravity or the theory of evolution, right, or any of the other most important
theories that people have ever thunk. But that's not what we're doing. Right?
That is not what we're doing. We're doing something that is far more
constrained. Debugging is deductive, not inductive. We're not trying to come up
with a grand theory of everything. We have a grand theory of everything: it's
called the program, the system we're running. We're trying to use that to prove
some very concrete thing, right, why this thing happened.

Also, there's not a big political problem. Right? I mean, in my experience
debugging in and of itself has not been politically fraught. There are people
who are experiencing bugs and they want to see them fixed, and as the
developers we want to see them fixed, so we're not having these kinds of
arguments. And there's not generally a lot of moral outrage, right? And there's
not any kind of, like, academic left saying, "You know, there is no real
reality, so you can't actually prove that there was a bug here." Right?
There's none of that. So, I mean, seriously, this is really good news.
Unfortunately the scientific method may not be all that great for science, but
it absolutely rocks for debugging.

Now, having said that, we can
drill in a little bit, and we can start to use some more specific terms that
are more about troubleshooting and debugging. I'm going to replace the word
hypothesis with cause. We're looking for a cause, and the definition of a cause
is an event preceding an effect, without which that effect would not have
occurred. That sounds awesome, right? If I knew the cause of the problem, then I
could eliminate the effect. So I caused Strange Loop, right, as we said
earlier.

The problem with causes is that that's not sufficient. Right, a cause: the
universe exists. If the universe didn't exist, this wouldn't happen. That's a
cause, but not a very interesting cause, and it's not one that, in software,
we're in a position to do anything about. So you want to get from a cause to an
actual cause. An actual cause is the difference between the actual world and
the closest possible world in which the effect does not occur. And, I mean, we
could go down the rathole of getting rigorous about closeness. I'm not going to
do that in this talk, but we can be intuitive about closeness. Right? If you
have, you know, the conglomeration of five things that all contribute, and with
four of them it doesn't happen and with one of them it does, then that one is
closer to being the actual cause. That's what the paring-down process does. We
want to start with an idea about a cause, and we want to narrow down to an
actual cause.
And then, a great word in this context: a fix is just the last experiment.
Right? The first experiment is reproducing the bug, and then there are a bunch
of experiments that are about hypotheses, and a fix is the last experiment.
This captures a really important idea, which is: if you haven't fixed it and
run an experiment that shows it's fixed, you haven't really identified the bug
yet. You may be strongly suspicious that you've identified the bug, but you
haven't actually identified the bug until you've fixed it.

So I'm going to take this scientific method and I'm going to apply it in a
really slapdash way to a problem, and I'm going to start with a problem that I
grabbed off of Stack Overflow. This is a very tiny debugging problem; you can
probably do it in your heads. So here is a question on Stack Overflow: "Why is
this partial not working?" I define partial-join as the partial of
clojure.string/join with comma, then I call partial-join on foo and bar, and I
get back the scary error message: ClassCastException, cannot cast
java.lang.String to clojure.lang.IFn. And it's got a scary number in it. So
there's no way we can possibly figure out what the problem is with this
terrible error message.
So what do we do? Well, we have a lot of choices here. We could make error
messages better, right? We could have an error message that says, "I see you
were trying to make a partial with string join," one that has this exact
scenario. We could anticipate every possible scenario anybody could ever get
into. And, not to be unfair, we could go a long way toward having better error
messages short of that. Error messages can be made better; that sounds like a
good plan to me. You could get better docs; maybe the problem is the
documentation. We could use a debugger. In fact, in this case it might even be
that syntax highlighting, paren highlighting, would have tipped you off as to
where the problem was; I didn't give you that. Maybe static typing would have
helped. You can imagine a scenario where static typing would solve this
problem. Using some sort of schema, schema validation, would have helped. And
in fact this one is so easy that you might have stared at it and just known
what the problem was.

So all of those things are useful, and all of those things should be part of
your toolkit, except for maybe staring at it; that one is actually really quite
weak, and we can come back to that. But the important thing to realize here is
that science is more general-purpose and requires less on hand to do than any
of these other things.
I'll say that again because it's so important: science is more general. The
scientific method here, not science. The scientific method, or we can even be
academics: hypothetico-deductivism. Let's just stick with scientific method.
The scientific method is better sauce than these other things because it's more
general, because it doesn't require anything to exist in the world except your
brain.

So let's try that. When you go down this road (and I said in a really slapdash
way), my hypothesis is going to be: well, look, there are only three things
here. Why is this partial not working? Well, join doesn't do what I expected (a
very fuzzy hypothesis), or partial doesn't do what I expected, or def doesn't
do what I expected. Right? That's it. It has to be one of those three things;
those are the only three things there. It's a very small problem to check.

And so, the experimental approach to this: I will propose a heuristic for small
problems like this, which is that you should do a bottom-up check from the
REPL. Right? Pick the form at the bottom, the one that's on the inside, and
check that. In this case we're done, right? I checked the very bottom form
here. The bottom form was clojure.string/join of comma, which returns comma.
That seems almost certainly wrong, because if it was just going to return comma
I could have just passed in comma to begin with. And then when I look at the
next one, the partial of join, if we substitute the result of that into the
original form, we're now partial-ing over comma, which is using comma as a
function. Now, I got lucky this time. I might have had to do three little
things at the REPL to figure this out, and I got lucky and I only had to do
one.

So weak science is stronger than strong tools in this case. I had a poor
problem statement; I really never got clarity on my problem statement. I had
really poor hypotheses: my hypotheses were "that one's off, that one's off,
that one's off," and those are not very specific. The experiments didn't even
really have stated goals, right? I just sort of took this heuristic approach. I
didn't have much domain knowledge, right?
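
The bottom-up check can be replayed outside Clojure. The Python sketch below is
a model, not the Stack Overflow code: `join` and `clj_partial` are hypothetical
stand-ins that imitate, as an assumption, how `clojure.string/join` treats a
single argument as the collection and how Clojure's `partial` hands its
argument back unchanged when there is nothing to bind.

```python
def join(*args):
    """Model of clojure.string/join: (join coll) or (join separator coll)."""
    if len(args) == 1:
        (coll,) = args
        return "".join(str(x) for x in coll)  # a lone argument is the collection
    separator, coll = args
    return separator.join(str(x) for x in coll)


def clj_partial(f, *bound):
    """Model of Clojure's partial: with nothing to bind, f comes back as-is."""
    if not bound:
        return f
    return lambda *rest: f(*bound, *rest)


# The buggy definition: join is CALLED with "," instead of being passed along.
partial_join = clj_partial(join(","))

# Bottom-up check: evaluate the innermost form first.
print(repr(join(",")))         # ',' is a string, not a function
print(callable(partial_join))  # False, so calling it has to fail

try:
    partial_join(["foo", "bar"])
except TypeError as error:
    print("TypeError:", error)  # Python's analogue of the ClassCastException

# The presumably intended definition passes the function and separator separately.
fixed_join = clj_partial(join, ",")
print(fixed_join(["foo", "bar"]))  # foo,bar
```

The first print is the whole experiment: one evaluation of the innermost form
already shows a string where a function was expected.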
I hypothesize that a beginning Clojure programmer doing this didn't have to
have much domain knowledge to figure it out. So of course at this point I have
stacked the deck in favor of the scientific method by picking a trivial
problem. If your problem is trivial, no matter what you do, any approach will
work, so it hardly matters.

So now we need to talk about hard problems and what it would be like to apply
the scientific method. Well, so what do we need? We need to do its steps way
better than we just did. We need clear problem statements, we need efficient
hypotheses, we need good experiments and useful observations, and we need to
write things down.

I'll start with problem statements. Right, the antithesis of a good problem
statement: "It didn't work." Right? "It didn't work" is hiding everything
behind pronouns. Let's get it right. What you want is the steps you took, what
you expected, and what actually happened. Just that, right there, that's 100
bucks. But this is not a hard thing to do. This is not harder than "it didn't
work," and in fact sometimes saying it actually causes you to realize what the
problem was, from the exercise of verbalizing what didn't work, from the
exercise of verbalizing, you know, "I stepped away from the car and it started
to roll; I didn't put it...," whatever. So, problem statements: as I said, I
have eight hours of material, and there's a lot we could say about problem
statements. This is all you're going to get right now, but this is an order of
magnitude better than "it didn't work."
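
As one way to make this concrete, a problem statement can be captured as a
small record with exactly those three parts. This Python sketch is an
illustration only; the `ProblemStatement` name and the sample text (which
paraphrases the earlier `partial` question) are assumptions, not something
shown in the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProblemStatement:
    steps_taken: List[str]  # what you did, in order
    expected: str           # what you thought would happen
    actual: str             # what happened instead

    def render(self):
        """Format the statement so it can be pasted into a bug report."""
        lines = ["Steps taken:"]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps_taken, 1)]
        lines += [f"Expected: {self.expected}", f"Actual: {self.actual}"]
        return "\n".join(lines)


statement = ProblemStatement(
    steps_taken=[
        "Defined partial-join as the partial of a call to join with a comma",
        "Called partial-join on a collection of two strings",
    ],
    expected="The two strings joined with a comma",
    actual="A ClassCastException saying a String cannot be cast to a function",
)
print(statement.render())
```

Nothing here is clever; the point of the structure is only that none of the
three fields can be filled in with "it".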
So start with this. The next thing you want to do is to... so, I mean, it's
Casey Affleck, but it's the Malloy twins in Ocean's Eleven, and there's this
great scene where they're playing 20 questions, and he's like, "Am I alive?"
"Yes." "Am I a person?" "Yes." "Evel Knievel." And it's a joke on an idea that
we all know; it's like the opposite of what you want to do. That's what my
five-year-old does. We're playing 20 questions, he's like, "OK, we're going to
play 20 questions," and he looks at me: "Are you an oak tree?"
We all know what we need to do here. When we're forming hypotheses, you want to
form a hypothesis that ideally carves the world in half. It turns out that the
naming around this is contested. Most of the time you hear people say this is
divide and conquer, but if you ask an algorithms person, that's actually not
it, because in divide and conquer you go down both branches. And so they want
to call it decrease and conquer, which I think kind of sucks, because it
doesn't capture it: decrease and conquer could be a linear-time thing, right,
decrease by one every time. The important characteristic here is proportional
reduction. I want to take the space of where the problem is and I want to
reduce it by, ideally, half. But as you know if you've done any algorithm
analysis, if I can get rid of 10% of the possibilities on every step, I'm going
to have the answer really quickly; that's only going to differ from getting rid
of half the possibilities by a constant factor in the number of steps that I
have to do. So the question is then: how do you take the space of "I don't know
what was wrong" and turn it into something that cuts the world in half?
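
The arithmetic behind proportional reduction can be checked directly. This is a
sketch in Python under a simplifying assumption that is not from the talk:
every experiment removes the same fixed fraction of the remaining candidates.
The function name `steps_to_isolate` is hypothetical.

```python
import math


def steps_to_isolate(possibilities, fraction_removed):
    """Experiments needed to shrink `possibilities` candidates down to one,
    if each experiment eliminates `fraction_removed` of what remains."""
    if not 0 < fraction_removed < 1:
        raise ValueError("fraction_removed must be strictly between 0 and 1")
    remaining_per_step = 1 - fraction_removed
    return math.ceil(math.log(possibilities) / -math.log(remaining_per_step))


for n in (1_000, 1_000_000, 1_000_000_000):
    halving = steps_to_isolate(n, 0.5)
    tenth = steps_to_isolate(n, 0.1)
    one_at_a_time = n - 1  # "decrease by one every time"
    print(f"{n:>13,} candidates: halve={halving}, "
          f"drop 10%={tenth}, one at a time={one_at_a_time:,}")
```

Both proportional strategies grow with the logarithm of the number of
candidates, and the ratio between them stays near log 2 / log(10/9), about 6.6,
however large the space gets. Removing one candidate per step grows linearly,
which is why "decrease by one" does not capture the idea.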
Well, this is going to require domain knowledge; there's no two ways about it.
Having said that, it does imply that you should be super cautious on your
initial step if you don't have much domain knowledge. If you don't have much
domain knowledge, it's possible that your initial step leaves out, of the
entire universe of possibilities, what the actual problem was.

And having said that, everybody knows where to look for their bugs. Like, if I
showed you your application stack and something just stopped working, how many
of you guys are going to guess... how often is it, right, how often is it
physics? It's pretty rare. Sometimes it is, and that's exciting. And I have to
say that on every open source project I've ever worked on, you get bug reports,
and some of them are bugs and you fix them, and some of them are not bugs, and
some of them are bugs in something underneath you. Right? We've never had a
Clojure bug report that we had to forward on to physics, right? There haven't
been any of those. Very few even have to do with, you know, the JVM, much less
the OS. And you can make other stacks like this. There's a whole conversation
we could have about sort of developing a notion of what the possible space is
and where you're going to look, but my point is you don't have to be that good
at it anyway. As long as you can get rid of some proportion of the possible
causes with your experiment, you're going to quickly find the answer.

Now, what is a good experiment? A good experiment is reproducible:
you start by reproducing the bug. It's driven by a hypothesis. People say
they're experimenting when they're just trying shit, right? That's not a
hypothesis. You have an idea that this is the case, and then you have an
experiment that provides more information to help you refine that idea. Also,
experiments are small. And when you're making changes to the system, change one
thing at a time, because if you change two things then you have to go back to
figure out what the impact of each was. When something changes, you haven't
actually gained information.

So I'll give you a quiz on this. If you have a bug, let's say you want to
report a bug in your own app, and you're handing it off to another member of
your team to help you look at it, which of the following things should not be
in your report: clojure.test, Cursive, Prismatic Schema, Potemkin, Leiningen,
Leiningen plugins, core.typed, test.generative, and so on? Unless you actually
think the bug is in one of these things, which one of these really stands out
as, "Well, you really wouldn't have that in your repro case"? And the answer
is: it's a trick question. You don't want any of these in your repro case,
unless your theory is that, you know, "the shopping cart on my system doesn't
work when I'm developing inside Cursive and using clojure.test." Your repro
case shouldn't have anything to say about Cursive and clojure.test, and Colin,
no doubt, will thank you for that, about that kind of thing, either.

So it's incredibly important to remove things that do not contribute to your
hypothesis statement, and it's a freebie. When I was saying earlier that you
have to have this, like, mental model of the universe to allow you to narrow
down the things that aren't it, right, this is a freebie. In your bug, these
things are not it. So start by taking them out; make a really tiny thing that
shows the problem.

Now, when you're making
observations, what's that all about? One thing you need to do is understand all
the outputs of your system. If it has four or five outputs that you understand,
and then it has one that's unrelated to your current problem and you don't
understand it: brakes screech. If you don't understand it, how do you know it's
not related to your problem? So you need to understand the outputs of your
system.

And you need to be suspicious of correlations. Where's the bug? The bug's in
the last five lines of code you wrote, quite often. And so if anything
correlates with the failure appearing, then you want to suspect that.

In order to make observations you need good tools. You need debuggers, you need
logging, metrics, all those kinds of things. Basically, all those things give
you more outputs; they turn things that are black box into things that are
white box. And it's amazing that we have this inversion, that when you do a
Google search for debugging, more than fifty percent of it is about just this
one bullet point. It's important, right? You need to have tools, and in fact
there are more things tools can do than just this, and we can talk about those
later.

So I'll
give you another example. While I was writing this talk, several things
happened to me in the course of the week, so this is what happened last week. I
was working on a Clojure app and I got this error message in the log that was
completely unrelated to my problem. My problem had nothing to do with Logback.
The file logback.xml is the configuration file for Logback, which is one of the
Java logging things, which we're all so happy that Java did such a good job
with. So I get this message, and I'm having this cryptic problem in a subsystem
that I'm working on, and I know this can't be it, because it just has to do with
logging. Except that actually this was it. After, you know, two minutes looking
at the problem I said: you know what, before I think about this problem, let me
go run down what this Logback thing is about, because I want this to be clean.
Well, logback.xml occurs multiple times on the classpath because some library,
because of a build problem, had been copied into a lib directory twice at two
different version points. So I had foo.bar version 2.1 and foo.bar version 2.2.
Now what the JVM helpfully does, without further configuration, is this: if
those things have overlapping but not identical sets of named things, you can
get some of them from one and some from the other, and they're not compatible
with each other, and it can lead to all manner of errors that are absolutely
cryptic, which is exactly what happened. But by tracking this down I saved
myself all that time. Even using the scientific method and trying to sort of
bisect towards a problem, it would have taken a while to get from the symptom to
a better cause to investigate than the one that was standing in front of my face.
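The duplicated-library situation above is mechanical enough to check for. As a
minimal sketch, assuming the jars in a lib directory follow a hypothetical
name-version.jar naming convention (the function names and the regular
expression are illustrative, not something from the talk), a few lines of Python
can flag any artifact that is present at more than one version:

```python
import os
import re
from collections import defaultdict

# Assumed convention: "<artifact>-<version>.jar", where the version starts with a digit.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_duplicate_artifacts(jar_names):
    """Return {artifact: sorted versions} for artifacts seen at more than one version."""
    versions = defaultdict(set)
    for jar in jar_names:
        match = JAR_PATTERN.match(jar)
        if match:  # files that do not follow the assumed convention are skipped
            versions[match.group("name")].add(match.group("version"))
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


def scan_lib_dir(path):
    """Apply the same check to the file names found in a directory."""
    return find_duplicate_artifacts(os.listdir(path))
```

Called on the file names from the story, find_duplicate_artifacts(["foo.bar-2.1.jar",
"foo.bar-2.2.jar"]) reports foo.bar with both versions. It only looks at file
names, so it would not catch the same classes packaged under different names.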
Write things down. This is the single most important piece of advice I'm going
to give in this talk: write things down. Write the problem statement down. Don't
say it, write it down. Write every hypothesis down. Write what the experiment
would show. Write why the experiment makes sense: write a justification for the
experiment before you run it. And write down your observations. And this is
something that we all know intuitively. Consider the game Mastermind. So in the
game Mastermind, the colored pegs down at the bottom represent a code that the
player is trying to guess, the colored pegs across the top represent past
guesses, and the little red and white pegs represent the feedback that you've
gotten on your past guesses. So if you view this as the scientific method, you
have a series of hypotheses about what the colors are, and you're getting
feedback from an experiment, which is the human player scoring your guesses. We
don't have to talk about the exact scoring; if you can't remember how Mastermind
works, that doesn't matter. The thing that's germane here is that doing your
experiments without writing things down is like playing Mastermind like this:
you're saying that you're going to keep in your head the entire state space of
all the previous things you've tried, and even the one try you're currently on
you're going to hold in your head. I've barely seen one of those in my career.
And the thing is
that not writing it down, you might think that of the seven deadly sins that's
sloth, but actually no, it's hubris. It's amazingly arrogant not to write things
down. Basically what you're saying is: I'm such a badass at solving problems
that I'm going to solve the problem of keeping track of everything in my head,
just to show off, when that's actually probably a harder problem than the
debugging. And people do this all the time.
It's staggering. OK, everything I've said so far could more or less apply to
the use of the scientific method in a lot of different domains, and because we
are here at a software conference, a Clojure conference, I'll give some
software-specific advice. And the first piece of software-specific advice
is something I was reminded of as I was reading through the literature on
debugging and what programmers have said about debugging in the past. And I
usually don't say mean things like this, but this is going to come up, I mean, I
can't help it: don't use C. The history of debugging is just riddled with, oh my
god, this thing has so many pitfalls and traps in it. And it's not that C is
bad; we can have a separate conversation about that. It is that this is not a
level of abstraction you need to be working at. I mean, sometimes it is, right?
If you have a domain where it would be impossible to do this in Java or Clojure
or Python or whatever, then you can use C, but you shouldn't reach for that as
the default in 2015, in most cases. Second piece of software-specific advice:
the failure is not the defect. The failure is not the defect. And the way this
comes up most often in software is assuming that the exception has anything to
do with the actual problem. It does have something to do with it, that's how you
know there is a problem, right? But the mistake is assuming that it's in some
way directly translatable into the problem. So I'll give you an example: a
system that had a large query, very high CPU utilization, and then an
IllegalStateException from HornetQ. So your first guess is: HornetQ exception,
HornetQ must be broken.
Well, let me just tell you, HornetQ is not broken. HornetQ is a pretty awesome
piece of software. Datomic has had it from the zero-point-zero days. I've had
fantastic interactions with the HornetQ team, and the only time working with
those people that we ever got to the point where we tracked a bug down to the
point where it was underneath Datomic, in HornetQ, it actually was underneath
that as well, which is pretty unusual. By the way, you don't find those often.
That one was a show-stopping, SSL-doesn't-work-in-this-scenario bug, so pretty
darn unusual. So this was not that bug, and I'll just tell you, to give you a
little bit of unfair information: the answer was not "it's HornetQ." And
there's another important philosopher
of debugging who can really help us out here in a couple of steps, and that is
House. House says it's never lupus. Well, in Clojure programming on the JVM we
have a kind of anti-lupus: we have the thing that it always is, but it doesn't
look like it at first. And that thing that it always is is garbage collection.
The actual bug is always garbage collection. Just as on House it's never lupus,
in Clojure and Java it's always garbage collection.
And there are several reasons for that. One is that most applications are not
designed to deliberately induce out-of-memory errors, and because they're not
designed to deliberately induce those, those code paths don't get checked very
much, so you're in kind of uncharted territory. The second one is that out of
memory can happen anywhere, so there's no line of code where you can go look.
You can't say "there's the critical section where memory could possibly run
out"; you can't do that. Also, an out-of-memory can appear as almost any other
exception, because once something failed to allocate over here, you can get a
cascading series of reactions where the actual exception reported back is
radically different. Finally, finally, finally, when you get close to out of
memory you get a radical change in thread scheduling, as the GC thread is
running all the time, and all of a sudden things are happening in orders that
they never happened in in everyday life. And so all those race conditions that
would normally take your system two thousand years to expose by accident, all of
a sudden it takes two thousand seconds to expose by accident. So lucky you:
thanks, garbage collection, for helping us find those race conditions. And
finally, out-of-memory-related problems tend to cascade. So always be suspicious
of memory when you're having problems in a garbage-collected environment. Now
another piece of software-specific
advice is to read the entire fracking manual. The thing is that a bug almost by
definition starts out as an unknown unknown. If you knew more about it, you'd
already be well on the way to fixing it. You don't know what's wrong, so you
don't know which part of the manual you need to read. And this means that if you
want a set of docs that are good for debuggers, that set of docs ideally is
short and specification-like. Now this goes against other objectives, right?
Short docs may be very difficult to consume; they may not be very narrative,
they may not be very tutorial. But they are a good place to say: you know what,
I have to read the whole thing, so I would like for it to be fifty pages long
and not repeat itself. If I want to learn in a more tutorial kind of
environment, I'd like it to be a thousand pages long and repeat itself in
various ways and anticipate problems that people have every day. So good docs
for debuggers are specs. And we'll go back to the partial problem again. So if
instead of doing experiments we had chosen to read the docs here, we could have
pulled up the docstring for join, and we'd have seen that join has two arities,
and the first arity takes a collection and returns a string of all the elements
in that collection. So one step plus an experiment would identify this problem;
one step plus reading the docs would identify this problem. It would be the
first thing you found when you went to look for it. So let's take some of these ideas
back to this more tricky debugging problem that we encountered, and that is
this large Datomic query, high CPU utilization, IllegalStateException. Now to
give you one more really interesting piece of information: it happens on
Cassandra in production but not on H2 in development. So what should we do right
now? If we're going to apply the method, what we need to do right now is look
very suspiciously at that last sentence. Is that last sentence an observation or
a hypothesis? Well, it's actually not really anything, because it doesn't have a
subject; it's a sentence without a subject in it. So there's some sort of
amorphous "it" that happens on Cassandra and not on H2. So let's try to define
that "it" a little bit. That certainly sounds suspicious, so let's make the
smallest possible thing
that could show us that. So we'll create a test environment with the same data
that production had, and then we will run a little file that we're going to
write, which is going to be 10 or 15 lines long, just Java with the Datomic
peer, and that's just going to come up and perform the problem query in a loop.
And if Cassandra or Datomic, or Cassandra plus Datomic, is truly broken in some
way, we're going to see that there. And then we can run the same test again
against dev and not see that there. And now we have eliminated an entire
universe of complexity: all of the application code that was in play, all the
tools, all the third-party dependencies, all that other stuff has been
mechanically eliminated. Well, so we did this, and what did we learn? The problem happens on
Cassandra (that's our statement, "happens" again) but not on H2. So we create
the trivial repro, and it happens with Cassandra, and guess what: it's lupus. It
turns out that this query was just using more memory than the JVM had, and it
had no chance of winning. And that's pretty suspicious, because it's not at all
obvious why Cassandra would be such a memory hog and H2 wouldn't; that would
seem to mean that, you know, their driver would have to be messed up or
something like that. Well, you know, stop hypothesizing: let's go test H2, the
dev storage. And guess what: the trivial repro happens there as well. And that
leads me to the other important observation that we get from House, which
is: everybody lies. It turns out that it was not the case that the Cassandra and
H2 behaviors were different. It turns out that there was a miscommunication
about what had been tried. And so, by the way, never attribute to malice what
you can attribute to accident or miscommunication. This is not lying in the
sense of covering up your drug use, which is what it always is on House, right?
This is lying as in making a statement which turns out not to be true. I'm
having trouble teaching my kids not to call it a lie any time a statement turns
out not to be true: something later turns out to be slightly mistaken and it's
"you lied!" Actually, we tend to use that word to imply intent. So this is the,
you know, go back and double-check statements, and again, go through the
exercise of reproducing it.
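The reproduction exercise described above, running the problem query in a loop
against each storage and recording what actually happens, can be sketched
generically. This is a Python illustration under stated assumptions, not the
Datomic test from the talk: run_repro, compare_storages, and the stand-in query
always_out_of_memory are hypothetical names, and a real repro would call the
actual query against each real storage.

```python
from collections import Counter


def run_repro(problem_query, iterations=100):
    """Run the problem query repeatedly and tally what was actually observed."""
    outcomes = Counter()
    for _ in range(iterations):
        try:
            problem_query()
            outcomes["ok"] += 1
        except Exception as error:  # record the failure type instead of stopping
            outcomes[type(error).__name__] += 1
    return dict(outcomes)


def compare_storages(queries_by_storage, iterations=100):
    """Run the same repro against every storage so the results can be compared."""
    return {
        storage: run_repro(query, iterations)
        for storage, query in queries_by_storage.items()
    }


def always_out_of_memory():
    """Stand-in for the real query; simulates an allocation failure."""
    raise MemoryError("simulated: query needs more memory than is available")
```

The point of the design is that the written-down tallies for each storage
replace the secondhand statement "it happens on one but not the other" with an
observation anyone can rerun.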
Imagine how much fun we would have had if instead of doing that we had yelled
across a conference table, and the Datomic team had pointed at the Cassandra
team, and the Cassandra team had pointed at the Datomic team and said, you know,
you guys messed up somewhere, because this works in other contexts. But we
didn't even have that problem, because we did that. Now the final piece of advice I'll give you as software
developers is that debugging is fundamentally a search operation. We should
ponder that a little bit, because this is something we actually know how to do,
right? We actually know how to write algorithms that do it. So instead of
limiting our thinking about debugging to the ability to see things or stop
things, why don't we write programs that do this job that I just described?
Implement the scientific method in code: make something that automatically
generates hypotheses and then automatically runs an experiment. Some of you may
have done that in a small way using git bisect. Git bisect is actually a partial
automation of the scientific method, right? You write a little program that's
going to test for something, and then git will bounce back and forth, cutting
the world in half every time, until it finds the boundary where your test
changes. That's automating the experiment part as well, and we have a set of
possible world states already given to us by git. Well, this idea is well
suited to be taken further.
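The halving search that git bisect performs can be sketched in a few lines.
This is a minimal illustration in Python, not git's implementation: it assumes
a linear list of states where the first is known good, the last is known bad,
and the test result flips exactly once. The names find_first_bad and is_bad are
hypothetical.

```python
def find_first_bad(states, is_bad):
    """Return the first state for which is_bad is true, testing O(log n) states.

    Assumes states[0] is good, states[-1] is bad, and every state after the
    first bad one is also bad.
    """
    low, high = 0, len(states) - 1
    if is_bad(states[low]) or not is_bad(states[high]):
        raise ValueError("need a known-good first state and a known-bad last state")
    # Invariant: states[low] is good and states[high] is bad.
    while high - low > 1:
        mid = (low + high) // 2
        if is_bad(states[mid]):
            high = mid  # the boundary is at mid or earlier
        else:
            low = mid   # the boundary is after mid
    return states[high]
```

Each call to is_bad is one experiment, and each experiment cuts the remaining
world in half, which is why a hundred candidate states need only about seven
tests.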
So I'm going to give you your first, maybe, Clojure/conj call to action this
year, which is: go read this book, Why Programs Fail by Andreas Zeller. There
are a lot of ideas in this book that have now become standard practice, so there
are chapters you can skip, but in particular read chapters five to seven and
chapters 11 through 14, which actually lay out algorithms for doing automatic
debugging of programs. What they're doing is modeling the entire state space of
a broken program and the entire state space of similar programs that don't
exhibit the failure, and then shrinking the difference between those state
spaces. Do we know how to do shrinking in the Clojure world? We're pretty good
at that, right? We have test.check and its algorithms for this. So I would love
to see, maybe at Clojure/West next year, somebody giving a talk about taking the
ideas in this book and realizing them in Clojure. Most of the ideas in the book
are demonstrated with Python code, and I don't think we should let them have all
the fun, right?
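To give a flavor of the kind of algorithm the book describes, here is a
simplified sketch of delta-debugging-style minimization of a failing input,
written in Python since that is what the book's examples use. It is not the
book's code: it only tries removing chunks of the input at increasing
granularity, and minimize_failure and fails are hypothetical names.

```python
def minimize_failure(items, fails):
    """Shrink a failing input to a smaller list for which fails() is still true.

    The result is 1-minimal: removing any single remaining element makes the
    failure go away.
    """
    items = list(items)
    if not fails(items):
        raise ValueError("the starting input must reproduce the failure")
    granularity = 2  # how many chunks the input is split into
    while len(items) >= 2:
        chunk_size = max(len(items) // granularity, 1)
        reduced = False
        for start in range(0, len(items), chunk_size):
            # Experiment: does the failure survive without this chunk?
            candidate = items[:start] + items[start + chunk_size:]
            if fails(candidate):
                items = candidate
                granularity = max(granularity - 1, 2)
                reduced = True
                break
        if not reduced:
            if granularity >= len(items):
                break  # every single-element removal was tried and passed
            granularity = min(granularity * 2, len(items))
    return items
```

For example, if a failure needs both elements 3 and 7 to be present, calling
minimize_failure(range(10), lambda xs: 3 in xs and 7 in xs) narrows the ten
elements down to just those two.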
This is a problem which is well suited for Clojure. As you might imagine, one of
the places where things get complicated is that I can just glibly say "the
entire state space of a succeeding application and the entire state space of a
failing one", but how do you actually capture that? Well, that's a hard problem,
but it's easier in a language that doesn't have very much state, right? So to
the extent that it's possible at all in a Python program, it ought to be a lot
easier to do in a Clojure program. And by the way, having a language that models
code as data and has everything built around that, we are uniquely in a good
position to implement these kinds of algorithms. So
that's the software specifics: work at a high level of abstraction; remember the
failure is not the defect; remember that GC did it; read the manual, twice;
don't trust people, reproduce; and let's go out and automate some things. And,
you know, I've been told that six bullets is too many to remember at the end of
a long day, so let me make that even simpler and say: let's back up and talk
about science a little bit. It's really easy. Know where you're going, right?
Remember, if you're heading in a particular direction you have a much better
chance of getting there. And then make well-founded choices. This is where
developers, both beginner and expert, make the most frequent mistakes. They're
under pressure and something's not working, and you say, and we've all done it,
you say: you know what, I'm going to try this. Stop and ask yourself: why should
I try that? Stop and ask. If you have to, talk to another person. You can say: I
had this idea about what's going wrong and why my Keynote presentation didn't do
what it's supposed to do during the middle of my talk; maybe you and I can sit
down afterwards and I'll talk you through it and give you a hypothesis. Make the
effort of doing that. And writing it down:
that's the final thing. Once you've decided you're going to take any kind of
action, justify that in writing first. Those two steps, making good choices and
writing steps down, are going to turn a haphazard random walk around the problem
into a directed, focused cruise to an easy solution and an early night relaxing
at the bar with your friends.
Thank you very much.
