Jan 10 2010

Competing on the Basis of Speed

For those not familiar with the ideas behind Lean software (and even for those who are!), please check out Competing on the Basis of Speed, a talk given to Google by Mary Poppendieck in 2006.

One of my work goals for 2010 is to compete on the basis of speed. Specifically, I want to help my team:

  • identify tech debt, defects, or process problems that are increasing lead time
  • minimize or eliminate the creation of new defects and tech debt
  • convince stakeholders of the value of delivering fast to get customer feedback

I think I’m already off to a good start. The aim of the project I’m currently working on is to add functionality to our product that makes it easier to integrate into customers’ existing infrastructure. The technical requirements are known to us, and we’ve finished implementing the functionality. However, we’re at a point where we’ve hit a bit of a wall – no one on the team has any experience as a consumer of this functionality (that is to say, none of us are IT administrators), and it has been difficult to get focused feedback from others within our organization who have such experience. What it comes down to is that we’ve got a feature that is technically correct (follows RFC specifications, has been load tested, etc.) but has not been tuned to customer environments.

This is not an uncommon situation in software – in fact, it’s part of the reason why “Have an embedded customer representative on the team” is a practice in Extreme Programming. However, it’s not always possible to find internal “customers” (we call them Product Managers) with extensive experience with every particular area of the field. For larger projects it may make sense to train the Product Manager in the specifics of the new functionality by having them consult heavily with paying customers, but for smaller projects this is not always feasible. My current project is less than a month old and we’re code complete, and much of the time was over the Christmas holidays with our Product Manager (and most of our customers) on vacation, so consultation wasn’t much of an option.

So, here we stand – a mostly finished project that can be release-ready within 2 weeks but that has not been fine-tuned to meet all customer requirements (as such requirements are unknown). What to do? The traditional approach within the company has been to do a beta, but a beta doesn’t necessarily solve all problems. Our betas are opt-in, and often contain very few members (fewer than 0.5% of our customer base). Feedback can be hard to gather as beta systems are often put into non-production situations. There is also quite a bit of overhead involved in coordinating and communicating with all of the customers involved.

Instead, I’m pushing towards getting this thing released to all customers as early as possible. The more people playing with it the better. The code is not buggy (we hope!), it just may be lacking some specific features or compatibility. Rather than wait around for 2 or 3 months as we do research and try to model customer scenarios with complete accuracy (a process that’s inevitably difficult and fraught with errors), we’ll get the code out into the field.

The best case scenario is that everything we’ve done so far is adequate for the market, and there are no future requirements. This means we’ve started recognizing value 2-3 months earlier than if we had waited and done more market research, and we haven’t sat around gold plating the project for a quarter. The most likely scenario is that our code is adequate for some, but others will need some enhancements before it will work for them. We can then prioritize these enhancements based on some sort of financial metric (renewal date of customers who need the feature, likelihood that the enhancement will bring new customers, etc.) and deliver them over the next little while.

The worst case scenario is by far the least likely to happen – that would be where customers get the new feature, find that it’s not quite up to snuff, and as a result decide to overhaul their IT infrastructure and rip out all of our company’s products. That’s so unlikely that it’s barely worth mentioning. Something like this is more likely in the case where we’re changing an existing feature instead of adding a new one, but even then it’s a very slim chance, and would be the result of a decision on the customer’s part based on emotion rather than reason.

If you presented a customer with the choice between the following two options:

  • a rudimentary version of Feature X now, with improvements to come soon afterward
  • a “complete” version of Feature X several months from now

… I’m willing to bet that most customers would pick the first option. The choice would be even easier once the customer realized that the “complete” version from the second option would likely have to be followed up by a release or two afterward containing improvements that the developers / Product Management failed to identify in the first go-round.

I’m excited that there seems to be some buy-in to this approach so far – hopefully it pays off for us!


Nov 14 2009

Limited WIP – Project Portfolio

All of this Kanban reading I’ve been doing has been great, and as I’ve mentioned before I’ve started implementing some of the techniques and metric tracking. However, after meeting with my manager (himself an experienced Agile thinker), I’ve realized that in some ways I’m trying to solve problems I don’t have, while neglecting some of my bigger issues.

Limited WIP is of course one of the key techniques / philosophies of Kanban (and lean in general). It’s not important in and of itself, but instead because out of it comes increased collaboration, reduced cycle time, identification of bottlenecks, and all of that other great stuff. We haven’t really reaped any benefit from this (yet) besides limiting the backlog, which has fostered cross-business-unit collaboration, but it’s only been 3 weeks or so and we were a fairly disciplined team before that. Some of the metrics I’ve been tracking (such as how long bugs have been in the system before we ship them, and how long items are blocked for) will definitely come in handy to measure the team’s effectiveness and customer responsiveness.

That said, we’ve never really had much of a problem with Work in Progress, at least at the task level. After talking with my manager, we realized that we had another problem, one that was discussed by Johanna Rothman at the recent Agile Vancouver conference – far too many items in our project portfolio! We have 8 people on the team, and had limited our active WIP to 12 (to allow for some blockages), but it turns out that we were working on 7 different, mostly independent projects.  Most of these projects were rather small, and the diffusion of resources was mostly due to the fact that there was no obvious parallelization for most of the projects, but still – that’s almost 1 independent project per person!

Due to the nature of my team (maintaining 3 products, and integrating these products once monthly with software from elsewhere in the company), it’s unlikely that we’ll ever get down to a project WIP limit of one. After we clean up the current mess, we’re going to try 1+1 – a maximum of one project on the go at any given time, with the exception of these small monthly integrations. With the slack time, we can clean up some of our (ample) technical debt. One of the metrics I’ve been tracking is the percentage of our work spent on “failure load”, a combination of technical debt and missed requirements.

This will likely start in earnest in the new year, but I’m already excited about the results. I’m actually surprised that I let it get this bad – we’d been effectively doing a 1+1 project WIP limit for most of the past year and a half – but I think by formalizing the process a bit more (not making it heavier though!) we can keep ourselves to good habits.


Nov 12 2009

Introducing Kanban (part 3)

(This post is a continuation. Please read part 1 and part 2 first.)

A year or so had passed since I first started implementing the tracking system I’ve been discussing. However, in August I started work on a larger project that had to go through the company’s formal project tracking system. The project took about 3 months (we’re just finishing it off), and we have done no releases since – that’s an eternity for a team like mine that’s used to shipping production code every two weeks. With the project coming to a close, I was looking forward to moving back to a process that encouraged more continuous flow.

I had first thought about Kanban after hearing a talk at the Agile Vancouver mini-conference, Lean Development for Lean Times. I’d also recently finished reading David Anderson’s excellent book Agile Management for Software Engineering, which emphasized the importance of short cycle times and delivering value to the customer quickly. After corresponding with David about this book, I mentioned Kanban to him and was delighted to find that he was working on a book on the subject. I was able to obtain a draft of this book to review, and the ideas presented within further cemented my ambition to try Kanban with the team.

Another thing worth noting is that my team had grown, from 5 engineers under me to 8. We’d also taken on maintenance duties for yet another product, bringing the total up to 4. This increase in responsibility caused me to spend more time thinking about our development process, since more was at stake.

Limiting Work In Progress

I’ve mentioned before that the only real difference between what I was doing before and Kanban is the presence of formal Work In Progress (WIP) limits. What makes WIP limits so important? As it turns out, this minor change can have a major effect on the effectiveness and responsiveness of the team.

Little’s Law, a result from queuing theory, shows that for a given throughput the average cycle time of an item in a system is directly proportional to the average number of items in the system. Thus, to reduce cycle time, it is necessary to reduce WIP. This is intuitively obvious – I can finish reading a book faster if I read it all the way through than if I read some, then read the paper, then read another book, then come back to the first book – but by limiting WIP both stakeholders and developers are made explicitly aware of the implications of trying to do too much at once.
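To make the relationship concrete, here is a minimal sketch of Little’s Law as arithmetic: average cycle time equals average WIP divided by average throughput. The function name and the numbers are mine, chosen purely for illustration, and the law only holds for long-run averages in a reasonably stable system.

```python
def average_cycle_time(average_wip: float, throughput: float) -> float:
    """Little's Law: average cycle time = average WIP / average throughput.

    average_wip -- average number of items in the system
    throughput  -- average number of items completed per unit of time
    """
    if throughput <= 0:
        raise ValueError("throughput must be positive")
    return average_wip / throughput


# Illustrative numbers only: a team finishing 4 items per week.
print(average_cycle_time(12, 4))  # 12 items in progress -> 3.0 weeks each
print(average_cycle_time(6, 4))   # halve the WIP        -> 1.5 weeks each
```

With throughput held constant, halving the WIP halves the average cycle time, which is the whole argument for a WIP limit.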

The benefits of limiting the size of the backlog have already been discussed in part, but by having a hard cap on the size of the backlog, stakeholders can clearly see that by choosing a certain work item other work items are not being addressed. This leads to increased collaboration between different business units, and forces everyone to “see the whole” (to use lean terminology) – individuals no longer argue for local optimizations (i.e. pushing their agenda through to the detriment of others), but instead come to an agreement about what would be best for the business as a whole.

Limiting the size of the backlog also helps stabilize lead time. Expediting items (“drop what you’re doing and work on this!”) is not conducive to flow, since it almost always involves context switching. Being able to provide stakeholders with the average cycle time of items is of great benefit – it can help make scheduling decisions, decide on relative priorities, and give visibility to other departments.

Limiting WIP in the “active” states (analysis, design, testing, customer acceptance) can help identify and deal with bottlenecks. For instance, if test is a bottleneck, which will be visible when test starts to run up against its WIP limit, we can use techniques from the Five Focusing Steps to alleviate the problem, such as ensuring that test is never idle by having a buffer of work always ready for them. Since the rate of flow through your system is limited by the bottleneck, improving the performance of the bottleneck will directly lead to increased throughput and reduced average cycle time.
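As a rough illustration of why the bottleneck sets the pace, the sketch below treats each state as having a maximum rate and takes the slowest one as the throughput of the whole system. The stage rates are invented for the example and are not measurements from my team.

```python
def bottleneck(rates: dict[str, float]) -> tuple[str, float]:
    """Return the slowest stage and its rate; the system cannot flow faster."""
    stage = min(rates, key=rates.get)
    return stage, rates[stage]


# Hypothetical capacities, in items per week.
rates = {"analysis": 6, "development": 5, "test": 3, "customer acceptance": 4}
print(bottleneck(rates))            # ('test', 3)

# Improving a non-bottleneck stage changes nothing...
print(bottleneck(dict(rates, development=8)))  # ('test', 3)

# ...while improving the bottleneck raises throughput until the next constraint.
print(bottleneck(dict(rates, test=5)))         # ('customer acceptance', 4)
```

Once test is no longer the constraint, the bottleneck simply moves, which is why the Five Focusing Steps are meant to be repeated.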

I don’t yet have any numbers or metrics for my team showing improvement – I only implemented Kanban a week ago. As soon as I have a large enough set of data to draw some conclusions from, I will post my findings.

Any questions so far? I’m understanding all of this, but I haven’t had a chance to explain it to anyone, and I always find that explaining / teaching really helps me solidify my knowledge.


Oct 31 2009

Introducing Kanban (part 1)

Last week at work I transitioned my team to a Kanban system (warning, pdf). I’ve been thinking of doing this for a while, and in fact a lot of the process that I had already put in place was very similar to Kanban, but a couple of recent developments made the time seem right to take the plunge and fully implement the system.

First of all, what is Kanban? Kanban is a term that comes from lean manufacturing that means “visual card”. The basic idea of Kanban is to have a system that facilitates flow and pull-based scheduling by limiting Work In Progress (WIP). In software engineering, this manifests as a visual process control system such as a whiteboard / cork board with story cards on it, or an electronic version of such a board. This board shows what state these cards are in, such as “backlog”, “in development”, “in test”, “customer acceptance”, and “waiting to ship”. So far, this sounds like pretty basic fare for an agile team.

The real key with Kanban is the limited WIP. All states of the board should have a WIP limit. For active states such as development or test, this intuitively makes sense, as it’s generally agreed that multitasking is not optimal. For a state such as “waiting to ship”, limiting the work in progress ensures that code is always shipped to customers in a timely manner. For a state such as “backlog”, limiting WIP (as opposed to considering the backlog to be the entire collection of defects, suggestions, and new features for the product) makes it possible to stabilize average cycle time (the average time it takes for an item to move from “backlog” to “complete”) and greatly eases prioritization.
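For readers who think better in code, here is a minimal sketch of a board where every state carries a WIP limit and a card can only be pulled into a state that has room. The class, the exception, and the limits are all hypothetical; this is not the tool my team uses, just the rule written down.

```python
class WipLimitReached(Exception):
    """Raised when a card would push a state past its WIP limit."""


class Board:
    def __init__(self, limits: dict[str, int]):
        # limits maps each state name to the maximum number of cards it may hold
        self.limits = dict(limits)
        self.columns = {state: [] for state in self.limits}

    def has_room(self, state: str) -> bool:
        return len(self.columns[state]) < self.limits[state]

    def add(self, card: str, state: str = "backlog") -> None:
        if not self.has_room(state):
            raise WipLimitReached(f"{state} is full ({self.limits[state]})")
        self.columns[state].append(card)

    def pull(self, card: str, src: str, dst: str) -> None:
        # Check the destination first so a refused pull leaves the board unchanged.
        if not self.has_room(dst):
            raise WipLimitReached(f"{dst} is full ({self.limits[dst]})")
        self.columns[src].remove(card)
        self.columns[dst].append(card)


# Hypothetical limits for a small team.
board = Board({"backlog": 3, "in development": 2, "in test": 1})
for card in ("bug-101", "story-7", "story-8"):
    board.add(card)
board.pull("bug-101", "backlog", "in development")
board.pull("bug-101", "in development", "in test")
board.pull("story-7", "backlog", "in development")
# "in test" is now full, so story-7 has to wait until bug-101 moves on.
```

The refusal is the useful part: a full “in test” column makes the blockage visible instead of letting work pile up quietly.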

As I said before, this is very similar to a system that I had set up for the team over the past year or so.  I would informally limit work in progress (by gathering a list of relatively high priority items in the backlog, encouraging team members to finish off in-progress stories before starting new ones, and shipping frequently). I encouraged a “pull” system instead of a “command and control” style approach, where team members would have a clear idea of relative prioritization and thus be able to choose the next task to work on when they’ve finished their own. I had a card wall that showed the various states that items were in.

What led me to create a system that was similar to Kanban before I had heard about Kanban? I’ll talk about that in my next post in the series.


Sep 15 2009

Performance Loss due to Project Switching

This post is a slightly edited version of an email I sent today in response to the question “Do you know any publications on performance loss due to switching from project to project?” I thought I gave a good answer, and I’ve wanted to write more about software engineering recently, so here it is.

It really depends on what is meant by “performance loss” and what the nature of the “switching” is.

The worst case scenario (while still remaining plausible) would be switching back and forth between 2 (or more!) projects in some sort of phased SDLC – that is, doing all of the analysis for project 1, then project 2, followed by all of the design for project 1, then 2, then coding 1 & 2 and testing 1 & 2 and so on until you’ve shipped both projects. This may seem absurd to some people, and would definitely be a major process failure for an independent team, but in some cases this might be necessary; say, if specialist resources like designers and testers are a constraint, or if there are external dependencies. The obvious problem here is the greatly increased lead time for both of the projects. If the projects are of roughly the same size, the lead time has roughly doubled (and that’s discounting any cost at all for context switching). This is a “Bad Thing” (there should be ample references in the lean literature about the benefit of decreased lead time), even if the scenario is not as blatantly bad as the one I just laid out.
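The arithmetic behind that lead-time claim can be sketched as follows, under simplifying assumptions of my own: two projects, four phases each, every phase taking exactly one unit of time, one phase worked at a time, and zero cost for context switching.

```python
PHASES = ["analysis", "design", "coding", "testing"]


def lead_times(schedule):
    """Given (project, phase) pairs in execution order, one time unit each,
    return the elapsed time from each project's first phase to its last."""
    first, last = {}, {}
    for t, (project, _phase) in enumerate(schedule):
        first.setdefault(project, t)
        last[project] = t + 1
    return {project: last[project] - first[project] for project in first}


# Finish project 1 completely, then do project 2.
sequential = [(project, phase) for project in (1, 2) for phase in PHASES]

# Alternate: analysis 1, analysis 2, design 1, design 2, ...
interleaved = [(project, phase) for phase in PHASES for project in (1, 2)]

print(lead_times(sequential))   # {1: 4, 2: 4}
print(lead_times(interleaved))  # {1: 7, 2: 7}
```

Under these assumptions each project takes 7 units end to end when interleaved versus 4 when run back to back, close to the doubling described above, and project 1 ships at time 7 instead of time 4. Any real switching cost only makes the interleaved numbers worse.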

For “performance loss”, it really depends on what’s being measured. If you’re thinking about lead time (which is directly related to throughput [1], and thus revenue – in other words, you should always be thinking about lead time), switching (or expediting, a different manifestation of switching) will negatively affect performance. Tom DeMarco argues in Slack that the real penalties for task / project switching are often hidden because of how people measure efficiency (number of hours worked / busy people as opposed to value delivered to customers, i.e. throughput).

As for actual loss due to context switching, which was possibly the original intent of the question, I think this may have been partially covered by Brooks in The Mythical Man-Month, and according to a blog post by Jeff Atwood of Coding Horror this topic is discussed in Gerald Weinberg’s Quality Software Management: Systems Thinking. I haven’t read this book, but the title is evocative of Peter Senge’s The Fifth Discipline: The Art and Practice of the Learning Organization (which helped pioneer systems thinking). I haven’t had a chance to read that either (it’s on my to-do list!), but it’s pretty well regarded from what I can tell.

So, does anyone out there have some other references to exactly how switching from project to project can cause trouble? I’m looking for papers, books, studies and the like, not just restating of lean or agile principles, so you can’t just say “context switching is bad because it slows people down” – show me the study!

[1] Agile Management for Software Engineering, David J. Anderson [2003]


May 6 2009

GMake it Happen: Build Improvements and Parallelization

Finally, a post on something that’s potentially interesting to software folk! After attending Eric Ries’ talk on the Lean Startup, I started thinking about how to work towards continuous deployment within Sophos. Note that I say “work towards” and not “achieve” – for my product lines, at least, achieving continuous deployment would involve a very large and fundamental re-architecture, so that’s not in my plans at the minute. However, I believe that in working towards continuous deployment it is possible to obtain some very real benefits, so I decided to take some first steps.

We’ve already made great improvements since the early days of our projects. One such change was the componentization of our builds. Rather than have to rebuild absolutely everything whenever anything in the entire product changes (leading to statements like “I changed the wording in the help file, so now it’s going to take two hours to rebuild the operating system”), we’ve broken things out into logical components. In the Sophos Email Appliance, for example, these components include:

* os (our custom hardened version of FreeBSD)
* sophox (core system tools that are separate from the os)
* pmx (the appliance version of PureMessage, the core mail filtering software)
* apps (third-party things such as the database, MTA, CPAN modules, etc.)
* ui (all code related to the browser-based GUI of the product)

Almost everything has to rebuild if you change the OS, but hardly anything builds if you only change the UI. This didn’t do anything for our worst-case build time, of course, but it’s certainly cut down our average build time quite a bit. Starting with version 5.5, PureMessage for Unix has adopted this componentized build system as well (much to the team’s relief).
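The idea behind a componentized build can be sketched as a dependency graph: a change to one component dirties only the components that depend on it, directly or indirectly. The component names come from the list above, but the dependency edges below are invented for illustration and are not our actual build graph.

```python
# ASSUMPTION: these edges are hypothetical, made up to illustrate the idea.
DEPENDS_ON = {
    "os": [],
    "apps": [],
    "sophox": ["os"],
    "pmx": ["os", "sophox"],
    "ui": ["pmx", "apps"],
}


def needs_rebuild(changed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return the changed component plus everything downstream of it."""
    dirty = {changed}
    grew = True
    while grew:  # keep sweeping until no new component is marked dirty
        grew = False
        for component, deps in depends_on.items():
            if component not in dirty and dirty.intersection(deps):
                dirty.add(component)
                grew = True
    return dirty


print(sorted(needs_rebuild("os", DEPENDS_ON)))  # ['os', 'pmx', 'sophox', 'ui']
print(sorted(needs_rebuild("ui", DEPENDS_ON)))  # ['ui']
```

The worst case (touching something near the bottom of the graph) is no cheaper than before, but most day-to-day changes land near the top, which is where the average build time improves.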

To repeatedly get software out quickly, you need to be confident in your code base and automated tests. If every change requires a week-long manual test pass, you’re never going to be releasing every 3 days – the numbers just don’t add up. So, we want a lot of automated tests, but this can get to be slow as well (our current nightly regression suite takes about 10 hours, mostly due to having to set up and tear down browsers). Unit testing things can help mitigate this, as often unit tests are much speedier than full end-to-end UI tests due to their limited scope, and we have several thousand unit tests in our product. But now, since we run unit tests (with coverage!) on every build, it can take up to several hours to build the entire system. So, are we stuck with a time vs. quality trade off?

As it turns out, there’s still a lot of room for improvement. In the last week, we’ve made a few relatively easy changes that have cut the build time on several key components by over 50%. There was no new hardware purchased, and nothing was rewritten. What we did was take advantage of our existing hardware and the lack of interdependence among most of our packages by adding some parallelism to the build. This started out with me toying around with the ‘-j’ option for Test::Harness, which runs all unit tests in parallel. This mostly worked, except for a few straggling .t files that didn’t use process-specific temporary directories. Over the weekend, a fellow development manager took these ideas, fixed up all the tests, and actually got parallel tests in the build – a big win! However, he wasn’t done. After realizing that a lot of the build time is spent running “make” in all of our sub-directories (as every “make test” first runs “make”), he changed our builds to take advantage of yet another ‘-j’ option, this time for gmake. Once this was in the build, all packages were built simultaneously, and all tests within a package were run simultaneously. This really makes our build boxes work – all 4 cores are now being used – but it cuts build times in half.
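The temporary-directory problem is worth a small illustration, since it is the usual reason tests that pass serially fail in parallel. The sketch below is in Python rather than Perl, uses threads instead of separate processes for brevity, and the test names are made up; the point is only that each test gets a scratch directory of its own, so concurrent runs cannot overwrite each other’s files.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def run_test(name: str) -> str:
    """Stand-in for one test file: write some state to disk and read it back."""
    # A private scratch directory per test, removed automatically afterwards.
    with tempfile.TemporaryDirectory(prefix=f"{name}-") as scratch:
        state_file = Path(scratch) / "state.txt"
        state_file.write_text(name)
        return state_file.read_text()


def run_all(names: list[str], jobs: int = 4) -> list[str]:
    """Run the tests with up to `jobs` workers; results keep the input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(run_test, names))


# Hypothetical test names.
results = run_all(["login.t", "quarantine.t", "reports.t", "policy.t"])
print(results)
```

Had every test written to one shared, fixed path instead, the files would be overwritten by whichever test ran last, and the failures would come and go depending on timing.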

Continuous deployment is still miles away, but we’ve halved the amount of time that it takes to get a change validated in a real build. It’s a start!


Apr 22 2009

Lean Development for Lean Times

Yesterday I was able to attend a mini-conference entitled “Lean Development for Lean Times”, put on by Agile Vancouver. The conference consisted of 3 speakers in two rooms, each speaking twice over the course of three time slots, allowing everyone to see every speaker (or one speaker twice, if they so desired). As you know, my company has been investigating lean development practices, and my manager was one of the conference organizers, so there ended up being about 10 people from Sophos there. Thankfully, there was a mix of software developers, managers, product management, and even the VP of development, which I believe increases our chances of adopting lean.

The first speaker I took in was Silicon Valley veteran Eric Ries, who gave a rapid-fire talk on the Lean Startup. The talk was quite informative, my favourite of the three, even though I have no desire to start or join a startup – the ideas and methods he gave are applicable to startup-like teams within large organizations as well. I took extensive notes, and I’ll likely end up doing a separate post just on this talk (unless my writing duplicates Eric’s website, which I have yet to read in-depth).

The second talk, entitled “Scrumban: Lean Thinking for Agile Process Evolution”, was given by Corey Ladas. Scrumban is a portmanteau of Scrum, an implementation of the agile software methodology, and kanban, a Japanese word translated roughly as “sign board”. Scrumban is the idea of taking kanban cards, an idea borrowed directly from lean manufacturing, and integrating them into the standard agile task board as a way of implementing pull scheduling. This is an idea that I’m likely going to try out with my team, so I’ll talk more about the concept once I’ve had a bit of experience with it.

The last talk of the day was “An Introduction to Lean Product Development” by Katherine Radeka. True to its name, it was an introductory talk on lean development principles, how lean development differs from lean manufacturing, and the nature of waste. Having read a few books on lean recently, there was nothing too groundbreaking here for me, although I did enjoy the Q&A and participatory aspects of the talk.

I quite enjoyed the conference, and it was interesting to see the reactions of people to some of the claims being put forth by the presenters (as well as from others in the audience). Hopefully I’ll get to try out some of these techniques at work, and I’ll definitely post about any successes or failures that result.


Apr 18 2009

First Steps With Lean Software Development

The latest buzzword right now in software development (as well as many other industries) is “Lean”. My company has recently started adopting some lean principles, and a lot of my reading in the past few weeks is either directly or indirectly related to lean, so I thought it would be good to write a bit about what I’ve learned so far.

First of all, a little history to know what we’re dealing with. Lean software development is primarily an application of lean manufacturing principles to the software industry. In the manufacturing world, lean is starting to be widely accepted as a Really Good Way to Do Things. Toyota was the first company to really embrace lean (in fact, they helped develop it), and although they’ve been hit by this recent economic storm, they’re doing worlds better than the Detroit Three, thanks in no small part to lean principles.

I’ve mentioned “lean principles” a few times, so I should at least list them. The seven lean principles are:

Eliminate Waste
Build Quality In
Create Knowledge
Defer Commitment
Deliver Fast
Respect People
Optimize the Whole

I plan on doing a post on each of these, with a short definition followed by how I’m trying to adhere to these principles with my team. I don’t claim to be an expert (or even proficient) in lean development, but I find that writing helps me solidify thoughts, and I’d love to hear any questions anyone has about lean. Stay tuned for a discussion on the first lean principle: Eliminate Waste. In the meantime, if you’re interested in learning more, check out the Wikipedia page on Lean software development or pick up a few lean books.


Mar 21 2009

Managing a Software Team: Prelude

Scott suggested that I write about my transition from being a software developer to managing a team of software developers. In this entry, I’ll talk about my history at Sophos and what led me to apply for a managerial position.

I started at Sophos in 2004, just a few days after writing the last exam of my degree. After a referral from Luke (thanks, Luke!), I was hired as an intern to develop software in Perl on Linux. Needless to say I was quite excited, but also a little worried – I didn’t know any Perl, and I only knew the basics of Linux / Unix. I was hired because I “showed promise” (thanks, Cliff!), and I immediately set myself upon the path of Learning Perl.

After spending 8 months developing internal test tools, I was recruited onto a product team. The goal of this team was to take PureMessage, the company’s flagship anti-spam product, and turn it into an appliance form factor. Most of the grunt work had already been done by some senior developers, but I was brought in as part of a three-person team tasked with creating an award-winning UI for the product.

The next few years were a bit of a blur, work-wise: we got a new VP Engineering, who introduced our office to Extreme Programming. The Sophos Email Appliance launched and has done well, the team grew and thrived, and I found myself a bit more drawn to the organizational / process aspects of the company. I was still writing unit tests, writing product code, running test passes, and all the other things a developer should do, but I was also acting a bit like an XP coach (sort of like a party whip for XP practices).

Around two years ago, I began taking more of an interest in long-term planning as it relates to software. Neil, my manager at the time, gave me the responsibility of doing the tracking for our project – finding out how many hours were spent on certain tasks, helping track project progress, and attending release planning and project status meetings. During my review that spring, I briefly mentioned that at some point in my career I could see myself getting into management, but I didn’t expect anything to come of it in the near future. The next week, I met with our HR department and Neil and learned that there was an internal management training course being offered later that summer. I was excited about the quick turnaround time, but also a bit nervous – I had mentioned that I could see myself in management at some point, but I didn’t necessarily want it to be this soon. The course came and went, and I found it very interesting – I’ve never taken any psychology or cognitive science, so it was interesting to see techniques from these fields being applied to the software realm.

There was an opening for a managerial position on the maintenance team soon after that, but I decided not to apply for it. I’m not really sure why I didn’t go for it – probably a mix of lack of self-confidence and comfort in my existing position. I was one of the more senior people on the team by that time (in terms of length of time working on the project), and it was nice to be able to provide answers when people asked. The management spot went to a person who was already on that team, and I pretty much stopped thinking about advancement for the short term.

Fast forward to about 7 months ago. The manager who took the role I mentioned above decided that management was not for him, so he was stepping down. That left an opening for a manager, and with Neil’s encouragement I went for it. I was especially interested in the position since it was going to be for a brand new team – the old maintenance team, with only 3 people (including the manager) was being upgraded to a 6 person team and being given a broader mandate. There was a transitionary period where I worked with the existing manager to get up to speed, since I was now going to be spending part of my time with a product that I didn’t know much about, and there was a bit of time when I was “unofficially” in charge (due to delays in the promotion process), but in mid-November I officially became the manager of the Email Gateway FastTrack team.

In the next entry, I’ll talk about what my team does and relate my first impressions of management.


Jan 14 2009

Leader no more

Well, that’s that – I’m no longer the leader of Van.pm, the Vancouver Perl users group.

A few years back, I was interested in hooking up with some local Perl people for networking and learning opportunities. I had attended one meeting of Van.pm before its then-leader moved to Seattle, and there hadn’t been a meeting since. I found the mailing list and asked about a meetup. The answer was something along the lines of “there hasn’t been a meeting in years because no one has organized one; if you want one, plan one”. It sounded like a fun thing to get involved with, so I went for it.

Fast forward to now. I think there’s been a grand total of 2 or 3 technical meetings, and 1 or 2 social meetings. I’ve been thinking about why this is the case. At first, I believed it was because I had trouble getting ahold of speakers, which is true – I had sent solicitations out to the list without response, and I just didn’t really know many people in the community at large. Over time, I began to realize that I just wasn’t that passionate about Perl. This is why I didn’t schedule more meetings – I didn’t want a meeting without scheduled speakers, where people just sat around and talked about Perl.

Don’t get me wrong – Perl is still my language of choice, the language I work most with professionally, and the language I’ve worked with the longest. If I had to choose a new language for a project, I would likely choose Perl. That said, I just don’t find myself getting excited about it. I don’t contribute to CPAN, I haven’t tried out Perl 6, and I don’t go on the Perl IRC channels. If there is an interesting advancement in Perl 6, or a particularly nice module on CPAN, I’ll have a look and read about it, but I’ll almost never test it out. I like consuming content about Perl, but not necessarily talking about it.

So, over the Christmas break, my friend Scott suggested that he take over leadership. Scott is pretty much the opposite of me when it comes to Perl – he’s got a few modules going, he’s always on one Perl IRC channel or another, he’s contributed tests to Perl 6, and he’s always one to try out new perlish things. Based on this, I thought it would be a great idea for Scott to take over. He’s already proposed having more regular meetings, scheduled speakers or not, and I believe this will cause more people to come out. He’s a much better fit for the role than I am.

Oh well, it was fun while it lasted. Good luck, Scott :)