_This is the story of a project, neither more complex nor simpler than others: an application that communicates with a database and two other systems. Quite mainstream from a technical and architectural point of view, and standard from the management side: everything is due yesterday and there is a lot to do… In short, “it’s gonna be hard”, as developers often say, but nobody says it too loudly. So we build the team. Forty people are staffed, and people are specialized. The teams are organized in pools, so that a kind of contract is set up between the different pools. Each pool is responsible for handling a certain kind of request. A flow of requests appears. Some pools come under pressure and become the bottleneck: a stock of requests builds up upstream while the downstream pools wait… For the pools under pressure, important things become urgent things. Choices must be made among the urgent things to handle the most immediate ones. Task switching becomes the way of working and, in the end, the flow slows down.
Then the “go live” deadline looms: it is in two months. The user acceptance tests are only just starting, having been delayed by the tedious and painful integration of the different components. Maybe the contracts built between the teams complicated the integration: some mandatory parameters are missing, dates do not follow the expected format, error codes are only partially interpreted… In any case, the user acceptance tests detect more bugs than the development team can fix, and not everything has been tested yet.
So we add more manpower: one team to fix the bugs, another, at a different site, to finish the development, and a third to integrate the different components. But all these teams share the same code base, and the changes made by some impact the fixes made by others. In short, you know what follows (the developers worked nights and weekends) and how it ends (the deadline was postponed and the initial scope modified). And thanks to the miracle of computer science, the application was finally delivered and running!_
This was a couple of years ago. I thought this was “the way to go” and that all projects were managed the same way. Since then, I have read a lot and talked with a lot of people, and I now see this story with a different state of mind.
On this subject, I will quote Tom DeMarco:
You say to your team leads, for example, “I have a finish date in mind, and I’m not even going to share it with you. When I come in one day and tell you the project will end in one week, you have to be ready to package up and deliver what you’ve got as the final product. Your job is to go about the project incrementally, adding pieces to the whole in the order of their relative value, and doing integration and documentation and acceptance testing incrementally as you go.”
In short: always work so as to be able to go live tomorrow
Beyond the real organizational and technical issues, you will need to be willing to build the software incrementally. Contractual responsibilities between the teams would be limited, so that teams organize themselves not around technologies or tasks but around business features. From the technical point of view, each team is then responsible for the whole feature working properly. From the management point of view, the managers and business people will have to make choices: what is THE absolutely needed feature? From my own experience, the more you work in environments where you have to meet the deadline, the more this kind of feature-team organization will help you.
In Slack, Tom DeMarco describes some characteristics of a culture of fear:
…among the characteristics of the culture of fear organisation are these:
- it is not safe to say certain things (e.g. I have serious doubts that this quota can be met)…
- goals are set so aggressively that there is virtually no chance of achieving them
- Power is allowed to trump common sense…
When I think of management by fear, I imagine a physically imposing despot who shouts at his staff from his desk, banging his fist on it… a nice cartoon, in brief. The reality is a little more insidious, but we have to admit there are contexts where people are under such pressure, where it is difficult to raise an alert, where deadlines are set without any consideration for the team’s capacity to deliver and where, in the end, teams are under the pressure of commitments made by their managers to their own hierarchy.
Understanding the problem is certainly the first step. But how can we solve it? What can we do when a manager does not understand the risks and refuses to accept the unavoidable: you need to choose, prioritize and negotiate the scope to keep the deadline, or else push the deadline back? This is far from easy, and the best answer I have so far is the “backlog” coupled with a “burndown chart”. In my opinion, they bring several benefits in these kinds of situations:
1/ Bring together all the tasks (technical tasks, functional tasks…). These tasks can, of course, be organized or consolidated by use case or feature.
2/ Share all the tasks with all the project participants. To put it differently, make visible the sheer size of what must be done.
3/ Show a credible and realistic deadline, thus enabling the managers to prioritize efficiently between the tasks.
4/ Show all the added tasks that will necessarily push back the initial deadline.
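To make points 3/ and 4/ concrete, here is a minimal sketch of the arithmetic behind a burndown chart. The function name, the unit (“points”) and all the numbers are assumptions chosen for illustration; a real chart would be drawn from the team’s measured velocity, not from constants.

```python
def project_burndown(backlog, velocity, added_per_iteration=0):
    """Project the remaining work at the end of each iteration.

    backlog: work remaining today, in points (hypothetical unit)
    velocity: points the team completes per iteration
    added_per_iteration: new scope added during each iteration
    """
    if velocity <= added_per_iteration:
        # Scope grows at least as fast as the team delivers: no end date exists.
        raise ValueError("the backlog never burns down with these figures")
    remaining = backlog
    chart = [remaining]
    while remaining > 0:
        remaining = max(0, remaining + added_per_iteration - velocity)
        chart.append(remaining)
    return chart


# Illustrative figures only: 120 points of backlog, 20 points per iteration.
fixed_scope = project_burndown(120, 20)
growing_scope = project_burndown(120, 20, added_per_iteration=5)
print(len(fixed_scope) - 1, "iterations with a fixed scope")
print(len(growing_scope) - 1, "iterations when 5 points are added each iteration")
```

With these made-up figures, a quarter of the velocity absorbed by added tasks moves the projected end from 6 to 8 iterations, which is the kind of evidence that helps a manager choose between scope and date.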
Brooks’s law was formulated in 1975 (I was not yet born) and states:
Adding manpower to a late software project makes it later
We have all experienced it. But we have to admit that we still tend to add manpower to meet the deadline instead of changing the initially defined scope and keeping an optimal, well-adapted team size. Brooks explains his law with two major points. The first concerns new team members, who have to be trained and thus consume the productive time of people already in place. The second is the myth that development tasks can be partitioned at will, which ignores the intellectual part of the work and the communication needed between all developers. We can add to this the difficulties of organizing the development and of sharing the same code between all developers. So many details that make the teams’ productivity decrease.
On this subject, still in Slack, Tom DeMarco discusses what he calls “overstaffing” and states:
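The communication cost can be illustrated with the pairwise model Brooks himself uses: if everyone has to coordinate with everyone else, the number of communication paths is n(n-1)/2. The sketch below is only that worst-case model; the function name and the team sizes are assumptions for illustration (40 is the team size from the story above).

```python
def communication_paths(team_size: int) -> int:
    """Number of distinct pairs of people in a team of the given size."""
    if team_size < 0:
        raise ValueError("team_size must not be negative")
    return team_size * (team_size - 1) // 2


for size in (5, 10, 20, 40):
    print(f"{size} people -> {communication_paths(size)} possible pairs")
```

Doubling a team from 20 to 40 people roughly quadruples the number of pairs (190 to 780), while the hands available only double.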
Meeting the deadline is not what this is all about. What this is about is looking like you are trying your damnedest to meet the deadline. In this age of “lean and mean” it is positively unsafe for you to run the project with a lean (optimal) staff.
An interesting point of view…
As usual, criticism is easy and art is difficult. We keep seeing the same errors occur again and again, whereas alternatives exist (which, rest assured, have their own limitations) but are rarely tried. “Risk Management is a discipline of planning for failure” (Slack, Tom DeMarco), and this is maybe what we are not good at. “Everything fails all the time”, states Werner Vogels during the last 3 minutes of this video. Yes, there is advertising for EC2 behind that statement. Yes, the architect is pessimistic by nature. But too often we do not think about potential failures. Tom DeMarco teaches us that managing risks first requires identifying them, monitoring them, and setting indicators that alert us when failure is approaching. Sometimes alternatives will have to be found. Some people will have to be trained. Parallel versions of the software will be developed in order to choose, at the very last moment for the decision, the best-suited solution (Lean Management calls this the “set-based design” principle).
From the architectural point of view, this approach to risk management implies designing our systems, in close collaboration with people from the business, to handle and embrace all these errors. In other words, anticipating in our architecture as many of the likely error cases as possible:
- How do we run in degraded mode when the system becomes unavailable?
- What are the procedures (manual ones if needed) to complete a business process in case of error?
- What is the mandatory information needed to finish the business process?
- What are the alerting mechanisms that let us be proactive towards the end user, informing them that an error has occurred and helping them properly complete their work in progress?
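As an illustration of the first and last questions, here is a minimal sketch of a call that falls back to a degraded mode and raises an alert. Everything in it is hypothetical: the function names, the pricing example and the wording of the message are assumptions, not a description of any particular system.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Result:
    value: Any
    degraded: bool      # True when the fallback answered instead of the primary
    message: str = ""   # what the end user should be told, if anything


def call_with_degraded_mode(
    primary: Callable[[], Any],
    fallback: Callable[[], Any],
    alert: Callable[[str], None],
) -> Result:
    """Try the primary system; on failure, alert operations and use the fallback."""
    try:
        return Result(primary(), degraded=False)
    except Exception as error:  # deliberately broad for the sketch
        alert(f"primary system failed: {error}")
        return Result(
            fallback(),
            degraded=True,
            message="Some information may be out of date; your work has not been lost.",
        )


# Hypothetical usage: a pricing system that is down, with a last known value.
alerts = []

def quote_from_pricing_system():
    raise ConnectionError("pricing system unavailable")

def last_known_quote():
    return 42.0

result = call_with_degraded_mode(quote_from_pricing_system, last_known_quote, alerts.append)
print(result.degraded, result.value, alerts)
```

The point of the design is that the degraded state is explicit in the result, so the user interface can tell the end user what happened instead of failing silently.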
On an existing system, changes will have to be made to ensure that the error cases already detected are fixed for good.
But let’s be honest. If you are cost-driven, you will find all the non-business requirements useless. But in the end, isn’t our trust in an application based more on its ability to handle errors (resilience and reliability) than on any other criterion?