The semicircle (episode 8 -- The Fifth Floor)

On 19/01/2018 by Christophe Thibaut
Tags: Software Engineering

The treatment of error as a source of valuable information is precisely what distinguishes the feedback (error-controlled) system from its less capable predecessors. Jerry Weinberg

_@OlegTxl Direct Message

Hi Oleg, can you spare me an hour of your time?

ok, around 6pm? ping me (...)

ping

pong

Thanks Oleg. I would like some advice about mob programming for my project at work

OK

I was given three months to turn the situation around on the app I'm working on

situation ==?

we have to deliver a major release in 3 months, but we're having a lot of problems related to quality. My mgr wants me to improve quality.

what kind of problems?

a bit of everything, but mostly design and regressions. Not everything is a bug, though: I think users are trying to slip new stories in while we're not looking.

how many people are working on this app?

4. My mgr is also Product Owner. Do you think we need to hire another dev?

if you want to spend the next 3 months training a new recruit, do that.

Well, yeah, I'm with you on that, it's futile. I want to propose mob prog, but what if it doesn't work?

what options are you allowed to try?

it's up to me to define - within the bounds of reasonableness. What I would like is to find a way to have fewer bugs

can we talk to each other live? Or by tel.?

I'll call you

_

"What's your definition of 'bug' exactly?" "Good question! Some unexpected task that just slips into your work..." "That's not clear..." "Let's say we have incidents in production, linked to errors. Is that less vague?" "So an 'incident' is a problem that could be about the application or the user or the system, or possibly the documentation?" "What you mean is the lack of documentation. But yeah, that's pretty much it." "And what is an error?" "An error is when there is a defect in the program, or the data. But sometimes we are sent defects that are not really defects."

"And what difference does that make?" "I don't understand your question." "Okay, no worries. Can you tell incidents from defects in your follow-up analysis?" "No, we have everything in the same tool. We only create incidents. Why do you ask?" "So I can understand what you do with the information." "What information?" "The information embedded in the incidents and defects." "I don't follow you." "Incidents and defects are a source of information about your product and your process, right?" "I agree, but exactly what are you proposing we do with the incident info?" "I would say there are 3 things you gotta do with the incidents. First: manage the incident and find the defect that is 'likely' the cause of the incident (duh)." "Of course." "Second: create a system to better detect defects, for example, tests." "For example." "And third: create a system to catch the defects, which really means to prevent them. Because the best defect management strategy is to not create them in the first place." "Good on paper, maybe." "I know." "Are you saying that mob programming is a defect prevention practice?" "Why not? We could think of it that way. Why would we get in groups to produce code that we could program on our own, if it wasn't to find defects more quickly?" "Can you help me do this now?" "Dude, I don't have time!" "Too bad." "Yes." "Ok, what would be your best idea?" "I already gave you my best idea." "Huh?" "1: handle the incidents, 2: detect the defects, 3: prevent the defects." "Hey, but that's a bit abstract for guidance." "Where are you with these three activities at work?" "I guess we do know how to manage the incidents. Tests: we could improve a lot. As for prevention, we have no strategy. There's work to be done."

"I suggest we do a little role-play. In this role-play, imagine that you are a very young developer just out of school." "OK." "You've just been hired into a company that seems to be working well. You have joined a team of three, and you have been entrusted with the maintenance of an existing program that does complicated calculations for one of the business areas." "OK. Sweet."

"You are on the first floor. Floor 1." "Floor 1? Okay." "Sylvia, a user of said program, just had an ugly surprise: she gets an incorrect result on one of the calculations: 70,000 instead of 140,000. Off by a factor of two. What are you going to do about it?" "I make a copy of the data and I try to reproduce the defect in my environment."

"Bravo. You are now on floor 2." "That was easy." "You found the origin of the problem: it's a typo in a variable name. The author of the code had written 'prefit' instead of 'profit'. As it turns out, the program is written in Awk, a language in which variables don't need to be declared. When Awk finds a variable it doesn't know, it creates it and initializes it to zero. It's convenient because it allows you to write very concise programs. But it's also a problem, because in the case of a typo, your program continues with a zero value without reporting any error." "I see." "What are you going to do about it?" "I will rewrite this in a better language!" "You don't have weeks. We must act immediately. What are you going to do about it?" "I am going to correct the problem, and deliver a new version. It was just a typo."
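To make the floor 2 defect concrete, here is a minimal sketch of the same failure mode, written in Python rather than Awk. It only imitates Awk's behavior: a `defaultdict` stands in for Awk's rule that an unknown variable is created and starts at zero. The calculation itself (`base`, `profit` and the two amounts of 70,000) is invented for illustration; the story only tells us that the result was 70,000 instead of 140,000 and that 'profit' was misspelled as 'prefit'.

```python
from collections import defaultdict


def total_with_typo(variables):
    """Hypothetical calculation that contains the 'prefit' typo."""
    variables["base"] = 70_000
    variables["profit"] = 70_000
    # Typo: 'prefit' instead of 'profit'.
    return variables["base"] + variables["prefit"]


# Awk-like lookup: an unknown name is created on first read and starts at 0,
# so the typo silently drops half of the total.
lenient = total_with_typo(defaultdict(int))

# Strict lookup: an unknown name is an error instead of a silent zero.
try:
    total_with_typo({})
    strict_outcome = "no error"
except KeyError as missing:
    strict_outcome = f"unknown variable: {missing.args[0]}"

print(lenient)         # 70000
print(strict_outcome)  # unknown variable: prefit
```

With the lenient lookup the program happily reports 70000; with the strict one the typo is caught on the first run.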

"Good. You are now on floor 3." "Perfect." "Perfect, not so much: Sylvia comes to ask you if there is a way to avoid these miscalculations in the future, because it makes her look bad." "Ouch." "What are you going to do about it?" "I will do some testing of the app before each new delivery?"

"Bravo. Here you are on the fourth floor! Now you are running systematic tests on your program before delivering it. You're still far from testing it thoroughly, and anyway you know it's not possible, but you are discovering some interesting things:
- One of the results was not displayed right: the standard format for displaying thousands was not used.
- Another miscalculation, related to another typo: you confused two variables (those names were quite close, mind you).
- A logic error: in a somewhat special case, the algorithm doesn't complete the calculation, because it goes into an infinite loop.
- And finally, by running your program on a very large file, you saw that the execution time goes from 4.32 seconds for 100 lines, to 1 hour 20 minutes for 10,000 lines."

"Wow." "What are you going to do about it?" "Uh. I'll look for another job?" "Really?" "I will correct all these problems, and redeploy." "Ok good. You're still on floor 4. Sylvia reported these problems to Harold, her manager. This is normal, since Harold's results depend on your program, and he's accountable for that issue." "Argh..." "So you're invited (or rather summoned) by Harold. Harold asks you: so far we are quite satisfied, but in the future would there be a way to respond a bit faster to our requests? A week of acceptance testing for a program a few lines long is hard to swallow." "Hmmm." "What are you going to do about it?" "I'll say to him: Walk a mile in my worn shoes!" "Seriously, what are you going to do about it?" "I'll ask my teammates if they can help me by reviewing the code with me."
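The two timings quoted on floor 4 are worth a quick back-of-the-envelope check. The sketch below assumes, purely for illustration, that run time grows as a power of the number of lines; with only two measurements this is a rough guess, not a profile of the real program.

```python
import math

# The two measurements from the story.
small_lines, small_seconds = 100, 4.32
large_lines, large_seconds = 10_000, 1 * 3600 + 20 * 60  # 1 h 20 min

size_ratio = large_lines / small_lines      # input grew 100x
time_ratio = large_seconds / small_seconds  # run time grew about 1111x

# Assumption: time ~ lines ** exponent. Two points give one estimate.
exponent = math.log(time_ratio) / math.log(size_ratio)

# What 10,000 lines would cost if the program scaled linearly.
linear_estimate = small_seconds * size_ratio

print(f"estimated exponent: {exponent:.2f}")                 # 1.52
print(f"linear estimate: {linear_estimate / 60:.1f} minutes")  # 7.2 minutes
```

Under that assumption the program behaves clearly worse than linear: a linear program would have finished the large file in about seven minutes instead of eighty.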

"Very good. Now you are on floor 5. You're organizing systematic reviews of your program. It takes a little time, but there are almost no defects in production. Plus there are some collateral benefits:
- Your team now knows enough about the code to help you change it; as it is a complex domain, this helps you a lot;
- you have a coding standard, which is improving with each review;
- out of a total of 5 reviews you have already found:
  - 2 other typos in variable names (yep);
  - 3 rather subtle logical errors;
  - a dozen improvements to the code formatting;
  - a new way to automatically generate test data."

"Life is beautiful!" "Yes. Harold summons you again." "Oh?" "He says to you: 'Now that everything works like clockwork, we wondered if you could free up some of your time for another small program, but it'll have to be quick, well executed, with a process a little more lightweight than your usual process, ok?'" "Errr, uh..." "What are you going to do about it?" "I'll tell him no." "You can't really say no to Harold. What are you going to do about it?" "I'll try to show him that it's better to follow the new process." "Very good. How are you going to do that?" "I imagine that it'll be enough to compare the results I had on the 1st floor with those I get on the 5th floor." "What results? What interests Harold is the numbers." "I would say, for each floor, I show:
- the number of defects found in production;
- the time spent to prevent defects;
- the time spent correcting defects.
And after comparing the floors, he decides to let me apply my process rather than his own."
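The three numbers per floor fit in a very small record. The sketch below is one possible way to lay them out for Harold. The class name `FloorStats`, the `report` helper and above all the figures are hypothetical placeholders: the story gives no actual measurements, so real values would have to come from your own incident tracker and time sheets.

```python
from dataclasses import dataclass


@dataclass
class FloorStats:
    floor: int
    defects_in_production: int
    hours_preventing: float  # tests, reviews
    hours_correcting: float  # incident handling, fixes, redeliveries

    @property
    def total_hours(self) -> float:
        return self.hours_preventing + self.hours_correcting


def report(stats):
    """Return one summary line per floor, lowest floor first."""
    return [
        f"floor {s.floor}: {s.defects_in_production} defects in production, "
        f"{s.hours_preventing:g} h preventing, "
        f"{s.hours_correcting:g} h correcting, "
        f"{s.total_hours:g} h total"
        for s in sorted(stats, key=lambda s: s.floor)
    ]


# Placeholder figures, invented for illustration only.
stats = [
    FloorStats(floor=5, defects_in_production=1, hours_preventing=12, hours_correcting=4),
    FloorStats(floor=1, defects_in_production=10, hours_preventing=0, hours_correcting=40),
]

for line in report(stats):
    print(line)
```

Keeping prevention and correction time in separate columns is the point: it lets Harold see what the "lightweight" process would actually trade away.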

"Right. And you stay on the fifth instead of coming back to the first floor." "I see." "The real question is: how fast do you want to go from the first floor to the fifth floor?" "Exactly, is it even feasible in my situation? In three months? I have my doubts." "You asked me for my best idea; that's my best idea." "I want to invite you for a beer as a way to thank you." "Thanks, maybe some other day, I don't really have time." "By the way: what's happening on the 6th floor?" "On the 6th floor, you become manager." "And what does a manager do?" "You help your teams to climb up, floor by floor."

(to be continued)

Previous episodes:
1 -- If the code could speak
2 -- See/Advance
3 -- Communication Breakdown
4 -- Driver/Navigator
5 -- Brown Bag Lunch
6 -- Takeaway Tips
7 -- Crisis/Opportunity