Software Procurement Disasters
An essay on the ongoing debacle of public sector software procurement[1], or perhaps more generally: Failure and How To Deal With It
There is virtually no aspect of life today that is not touched in some way by software. The vast majority of interactions between a citizen and their government are now mediated or supported in some way by software. The rise of the Internet and its increasing ubiquity in all aspects of life has created a vast increase in the amount of software that has to be purchased, developed and deployed by various parts of the government.
This has not been a universally positive experience for those tasked with procuring such software, for those in charge of their budgets, or indeed for those tasked with supplying the software itself. Very frequently software procurement, even when it is of supposedly minor modifications to existing systems, goes wildly over budget, is a management nightmare of recriminations and bad feeling, is late (sometimes monumentally so) and occasionally fails to complete at all. This can be very, very expensive. The NHS had to write off at least £9.8 billion of public funds on the failed Connecting For Health software project in 2013. That is not a trivial amount of public funds to have been wasted and, apart from the unusually large financial cost attached to this particular example, it is far from an unusual occurrence. It has become distressingly routine to hear of over-budget, late or completely failed public software projects.
When these projects go wrong there is always a process of trying to understand why things have gone wrong. This often has quite negative effects for the people involved in the process, both on the procurement side and the fulfilment side. This is true in both public and private sector software development but is particularly acute in public sector projects because the notion that the public must be accounted to for financial waste is deeply engrained in our collective political culture, to a far greater extent than accountability to shareholders is manifested in the private sector. This often leads to a painful process of trying to find who is to blame so that they can be appropriately chastised and the public might see that some action had been taken and that there is therefore some hope such a debacle might not happen again. For those caught up in the middle of such a process it is often deeply stressful, humiliating and perhaps worse, devoid of clarity and truth.
If one reads a few reports of failed software projects what seems surprising is that, for an engineer, there is no deep unifying theme. We could reasonably expect that the failure modes of a class of projects such as writing big pieces of software would have some common principle, one that could be elucidated and then improved upon. This turns out not to be the case. To be sure, there have been many claims over the years to have found such a unifying principle and to have invented the way to overcome it. Endless different project development methodologies and project management disciplines exist as a result of this erroneous belief that it is possible to somehow, with either technical or human process engineering, manage to overcome whatever defect it is that causes a £9.8Bn software project to just stall. Every single one of the methodologies fails in the end, despite huge amounts of human effort and vast amounts of money being spent on them. While they are being adopted they are full of promise and hope, the great new fashionable silver bullet that will make it all work. But after we have used them for a while we nearly always come to the sad conclusion that they just do not work. Their adherents, like religious acolytes, claim the failures are because they have not been followed correctly. As we shall see, this is not true, it is simply the case that they do not do what they purport to do. They usually do something and that something is often useful, humans usually do better with structured processes when they want to construct any engineered product, but they do not guarantee your software project will come in on time and budget, or even that it will complete at all.
To anyone involved in software procurement in any role or to any degree none of this is news. We all know how difficult it is to deliver software on time or to a budget. There are very few software developers with several years' experience who have not participated in a project disaster of some kind and the same is true for procurement and project management professionals. What is surprising is that very few people understand why the disasters happen. They do not understand because the reason is counterintuitive and, when understood properly, actually rather upsetting. Proper understanding of what underlies our collective ongoing failures implies that a wholesale change in the way we procure software is needed. It is a change that runs counter to many ideas that are currently prevalent in public sector procurement and in truth the changes needed run counter to some basic human ideas about how we should spend money. This makes for a very uncomfortable set of ideas, ideas that are a hard sell to the public and as a result of this an even harder sell to elected politicians. However, if we are to improve on the current track record of public software procurement then an effort must be made to change.
So why do so many software projects fail, either in part by just being late and over budget or so completely they are abandoned entirely?
The usual suspects always involve human error and incompetence, human idleness, greed or politics. The requirements were poorly defined. Or wrong. The vendor chosen was not good enough. Or too greedy, they charged too much for changes. Or the lead developer was idle. The project manager was an idiot. The software team leader was run over by a bus. They caught rabies. The schedule was ludicrously optimistic (which in a failed project is of course axiomatic), the management team incompetent, the legal contract flawed. The reasons go on and very plausible they usually are. There might even be some truth in them from time to time (just often enough for them to remain credible, unfortunately), but none of them actually state the underlying problem.
The truth is stranger and more challenging to contemplate.
There is a field of theoretical study known as Algorithmic Complexity which was developed from the 1960s onwards by Kolmogorov, Chaitin and Solomonoff amongst others. 1997 and 2001 papers by J. P. Lewis[2] took some of the ideas from this field and applied them, quite straightforwardly, to the issues surrounding software development and came to some startling and unwelcome conclusions. I should stress that these conclusions are not qualitative, they are rooted in fundamental mathematics and logic. They represent conclusions about the nature of the universe, not about our abilities as a species, not about politics or what we would like to be true. They are conclusions about the structure of reality itself, they are not opinions which are subject to a political debate or the winds of fashion. As such we don't get to argue with them, all we can do is figure out how to live with them. This, as you will see, is quite uncomfortable. The conclusions in J. P. Lewis's paper, Formally Unsolvable Problems in Software Engineering, can be summarised as follows, and I quote from his paper, which I recommend you read.
1 - ‘Program size and complexity cannot be feasibly estimated a priori’
You cannot figure out how big or complicated your program will need to be to meet your requirements. All you can do is guess. At best your estimate might (emphasis on might) be nearby. This is not a trivial conclusion, it states that it is actually impossible to accurately determine how big your task is. Theoretically impossible. As in, it cannot be done, no matter how smart or experienced you are. I’ll repeat that, because it took me a while to take it in myself. You cannot make an accurate prediction of how big or complicated the program you’ll need for your requirements must be. Anyone that tells you they can accurately make such a prediction is delusional, it is theoretically impossible in the same way that perpetual motion is impossible.
2 - ‘Development time cannot be objectively predicted’
There is no way to know how long it will take to implement your requirements. This follows from the previous conclusion. If you don’t know how big or complicated your program might need to be you cannot work out how long it might take to write.
3 - ‘Absolute productivity cannot be objectively determined’
You cannot figure out how productive your programmers will be in any absolute sense. You cannot know exactly how many developers you will really need and there is no theoretical way of going reliably from requirements to man hours of effort.
4 - ‘Program correctness cannot be objectively determined’
You will never know if your program is correct, which means you can never know if it really works properly and has no bugs. This is provably unknowable.
These are startling conclusions for those of us who do not live in a rarified world of theoretical computer science and these results have some deeply unsettling implications for any of us who are asked to commission a piece of software. They are not as absolute as they seem at first glance. If they were then there would be very little software ever written, and the world is now awash with software. It is plainly possible to make estimates of software development time that ‘most of the time’ turn out to be reasonably correct, to some degree or other. It is possible to show that a program is in some sense ‘good enough’ even if it is not provably correct. However, the devil for all of us is in those caveats, the ‘good enough’ and ‘most of the time’. It is not the times when we are fortunate that really count, it is those outliers that sink £9.8Bn projects that we need to understand how to manage, and our current approach to software procurement actually does the reverse, it makes encountering one of those more likely, not less likely. Worse than this, it means that when we do encounter such projects we fail to attribute the problems to their correct cause, we nearly always embark on a mistaken process of looking for people or institutions to blame.
In order to begin to think clearly about software procurement and how it might be improved, the first thing we must do is clearly acknowledge and internalise some facts that we now know to be true. These facts run quite counter to both how software engineering likes to see itself and how those who purchase the services of software engineers like to see those engineers and their companies. This is a perfect storm of mutual delusional thinking and to break free of it requires all parties to accept reality before figuring out how to deal with it.
The first sacred delusion that has to go is that the writing of software, the planning of software projects and as a consequence the budgeting of software projects has much in common with the kind of engineering that builds bridges. In fact the theory, as we have seen, tells us that it is not just ‘not quite like’ that kind of engineering, it is extremely unlike it. Assumptions made that procuring a software system is somehow similar to purchasing a new road, a batch of tanks or a fleet of refuse lorries are just wrong. Really wrong, the kind of wrong that is going to get everyone in trouble. The Universe just does not work that way. Estimating the likely time and cost to construct a bridge is a well honed and nearly completely reliable science these days. Estimating the likely time and cost to write a recruitment subsystem for a personnel department has far more in common with predicting the weather than it has with procuring that bridge. The best you can hope for is an expert forecaster (and you will never know until it is too late if they are any good) backed by clever estimation processes and as much historical data as they can find on your class of problem. And what you should get from them is a statistical probability, just like weather forecasting, not a hard prediction. Once you have this forecast you can only treat this with the same respect as we treat our current weather forecasts. It's about that reliable and anyone who implies they can do better than that is just selling snake oil.
The second sacred delusion that must be eliminated is that the people tasked with the implementation of your program have as much control over that as they think, and as a consequence you think, they do. As we have seen, the difficulty in implementing a program is fundamentally unknowable before you start. Project managers, programmers, technical architects, none of them can escape this fundamental truth. Even if you have one of the top 1% of elite programmers working on a task, and that task was estimated as taking a week, they can still be working on it a year later and it will not be their fault that this is the case. Sometimes you are just going to hit one of those problems that is really hard or even impossible to solve. It happens, and when it does it is no one's fault, it is just a consequence of point one in J. P. Lewis's paper. You have just run into a basic fact of writing software, theoretically unavoidable, the pure impossibility of reliably and accurately figuring out program size and complexity in advance.
The third delusion that we must remove is that of thinking we understand the consequences of our specifications when we try to turn them into software, particularly the effects of small requirements or changes to program complexity. Just because a requirement seems small, or a change nearly insignificant doesn't mean it will be so. In fact anyone who has ever been responsible for a large software project has encountered such minor tasks turning into giant, difficult, time consuming monsters. It is not just a theoretical issue, it happens all the time that a minor specification change suddenly implies the wholesale re-architecting of a massive piece of code to accommodate it. These exploding code instances are living exemplars of point one. Point one is not an abstract, easy to ignore fact. It happens all the time in real projects.
The final and most fatal delusion, the one that perhaps is the most damaging, is that there exists some method, some process or management magic, that can eliminate or make inconsequential the first three delusions. That somehow, if we had the right people and the latest ‘ftang ftang boondoffle mk2™’ project management system and if no-one actively screwed up and we set out with realistic expectations then all of these fundamental theoretical problems can be made to go away. They can't. There is no magic bullet. These things are going to happen to you whatever you do, sooner or later, probably sooner. All we can do is to think about how we are going to live with them and how we can do so without making all our lives miserable and learning to hate one another.
So let us briefly imagine that we live in a perfect world and everyone can come to recognise these delusions for what they are. How might we go about purchasing the next big software system so that it has the best possible chance of some kind of success? And how might we deal with the budgetary and political consequences of recognising our present delusions about software procurement? And how do we deal with it all going wrong, as it certainly will from time to time?
Fortunately these are not difficult questions to answer once one has recognised that things don't work the way we'd like. If we can see that it is both no one's fault that this is the case and that it is also fundamentally unfixable as a problem then we can begin to reason more clearly. The keys to thinking about realistic software procurement are on the one hand trivially simple and on the other hand very hard to accept, because they fundamentally require three things that we as human beings are very bad at and in the sphere of government particularly so. The first is the acceptance and proper management of uncertainty, the second is the ability to properly recognise when our project has failed, which requires dealing well with the fallacy of sunk costs (a fallacy that afflicts virtually every public spending debacle) and the third, our greatest potential saviour and probably the most difficult to contemplate, is to extend a high degree of trust to our fellow humans, sometimes with scant reason to do so.
Expanding on the first point, we must acknowledge that all software development is intrinsically a probabilistic process with a lot of chaotic (in the mathematical sense) behaviour. We must begin to use and apply the vocabulary and thinking that is more at home in weather forecasting to the process, and learn to live with outcomes we did not expect, timescales that we did not want and costs that we cannot afford. When we have an estimate for the shipping of some system it should be couched in terms of what percentage of the system and what probability it might be done by a certain time, and to what level of reliability. We must recognise that sometimes a beloved feature has just turned out to be too difficult or costly to implement and we must be prepared to abandon it despite significant sunk costs in our attempt to get it working and sometimes significant ongoing business costs implied by that failure. We must recognise that ending up with 80% of our specified system is actually an OK outcome and we must design our business processes around this probable outcome before we embark on the project. If you were planning on firing your telephone helpdesk because your new all singing all dancing website would do it all, think again. It will probably never do it all, so plan on that outcome because that is what is likely going to happen, whatever the grand plan says.
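To make the weather forecast idea a little more concrete, here is a minimal sketch of what an estimate couched in probabilities might look like. Everything in it is an assumption made for illustration: the feature names, the three-point estimates in weeks and the choice of a triangular distribution are mine, not something taken from Lewis's papers or from any real project. It simply simulates the project many times and reports how often everything is finished by a given date.

```python
import random

# Hypothetical features with (optimistic, likely, pessimistic) estimates in weeks.
# These numbers are invented purely for illustration.
FEATURES = {
    "login": (1, 2, 6),
    "search": (2, 4, 15),
    "reporting": (3, 6, 30),
}


def simulate_total_weeks(features, rng):
    """Draw one possible total duration, sampling each feature independently."""
    # random.triangular takes its arguments in the order (low, high, mode).
    return sum(
        rng.triangular(low, high, mode)
        for (low, mode, high) in features.values()
    )


def probability_done_by(features, deadline_weeks, trials=10_000, seed=1):
    """Estimate the chance that every feature is finished by the deadline."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = sum(
        simulate_total_weeks(features, rng) <= deadline_weeks
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for deadline in (12, 20, 30, 51):
        chance = probability_done_by(FEATURES, deadline)
        print(f"{deadline} weeks: {chance:.0%} chance all features are done")
```

The output is a table of dates against probabilities rather than a single delivery date, which is the only honest form such an estimate can take. Treating the features as independent is itself a simplification, and a statistician would probably want something better.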
This is the best that can be done. I'll repeat that, because people have trouble believing it: This is the best that can be done, in a fundamental sense, and anyone who tells you otherwise is just selling you magic beans. If your procurement process cannot accommodate probabilities, in time, specification and budget, then it is, simply put, broken. It will never be successful at procuring software. And by ‘never’ I mean it will endlessly go wrong, and you will never understand why. It will cause untold stress to you, to your providers, to your staff and eventually to your customers and clients, whoever they are. And if your business processes cannot withstand running on say 80% of your requirements in software (and not necessarily the 80% you'd really like either) then redesign your business processes to deal with it, do not assume that by sheer good fortune you will be one of the tiny number of organisations that gets 100% of their requirements implemented on time and in budget.
The second issue, dealing with failure, is tough. The theory tells us plainly that sometimes our projects will fail because we have not estimated the size, complexity and cost of our program anywhere near to the reality. There is no escaping this outcome from time to time, it has nothing to do with the quality of your staff, the good intentions of all or the need to complete the project. It is just how it is, you have to acknowledge that the estimation process is at best a good guess and that is the best anyone can do. So the questions are, how do you recognise that your project is failing and what do you do when you see this?
If you do not already know about the Fallacy Of Sunk Costs I strongly recommend that you read up on it. It is a pernicious and eventually very expensive failure in logic that all humans are deeply prone to, and it is your enemy in dealing with this issue. Setting a clear numerical way to decide on the failure of a task is not so hard, the key is to set it before you start. You can have a date, or a cost, or a combination of both. It's not very hard to do this when you plan your project. When you meet it, you must just abandon the task, unfinished, with no further discussion. There is no other way but when you get to that point, believe me, you will not want to do it. But you must, because you are simply falling into the fallacy otherwise and your life is about to get much worse as a result. This is particularly an issue in public spending, where the concept of ‘wasted public funds’ carries with it a strong opprobrium. This is something that only our communications professionals can help us with, because in order to save a lot of money in the future we must recognise that failure will happen and it will happen frequently in software development simply because that is the way the Universe works. Cleanly writing off chunks of work, funds and effort without blame or stigma must become just part of the process and we somehow have to normalise that so that we can begin to deal with it properly.
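A numerical failure rule of the kind described above can be very small indeed. The sketch below is only an illustration under my own assumptions: the names KillCriteria and should_abandon are hypothetical, and the limits are whatever you agreed at planning time. The point it tries to show is that the decision looks only at the limits set in advance and deliberately takes no input about how close to finished the task feels.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: the limits cannot be quietly moved later
class KillCriteria:
    """Limits agreed before work starts (hypothetical structure)."""
    max_cost: float
    deadline: date


def should_abandon(criteria, spent_so_far, today):
    """Return True once either pre-agreed limit has been exceeded.

    There is intentionally no parameter for 'percentage complete' or
    'money already invested': admitting those is the sunk cost fallacy.
    """
    return spent_so_far > criteria.max_cost or today > criteria.deadline


if __name__ == "__main__":
    limits = KillCriteria(max_cost=50_000, deadline=date(2019, 3, 31))
    print(should_abandon(limits, spent_so_far=40_000, today=date(2019, 1, 15)))
    print(should_abandon(limits, spent_so_far=52_000, today=date(2019, 1, 15)))
```

The hard part is of course not the code, it is doing what the function says when it returns True.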
And now I come to the last issue, which is to me the potential cure that starts to make all of this uncertainty tractable. This technique is personal to me simply because I used it to great effect on a number of fast moving, often complicated and purely commercial software projects in the late 90's. It was an incredibly hard sell to our clients and I now recognise that the only reason we managed it was because we were in huge demand, often turning away unattractive work. The clients accepted it because we insisted that was the way we worked, not because they were happy about it in their procurement processes, which often had to be mangled to accommodate us. It was however hugely successful and enabled us to build a successful software business in just seven years from scratch (with no capital) that we sold in 2000 for a substantial sum, enabling me to retire at the age of 35. It works. It works very well.
The basic underlying principle we used in that company was simple. It was ‘trust us’. This sounds weird and it is a very hard principle to sell, as you might expect. For a procurer it is very hard to accept, because the first question that must be asked is ‘why should I?’
There are two answers to that. The first is, you really have no choice. As we have seen, the fundamental theory now tells us that any promises I make to you about performance are going to just be fantasy, effectively lies, however nicely they are dressed up. I might even believe them, you might believe them, but that is just our mutual delusion setting in. If we write those performance promises down in a contract, for example saying how much we will charge you for writing some piece of specified functionality, or how long it will take, all we are doing is making the fantasy the subject of the law, thereby almost guaranteeing ill feeling and some time with all our solicitors. That is simply a definition of stupidity. So if any performance promise I make to you about delivering working programs can only be fantasy, what choice do you have? You cannot buy a working program from me, because that concept is not mine to sell. Efforts to write contracts like that are doomed to failure and they are not doomed by human wishes, they are doomed by Algorithmic Complexity Theory. So what can you buy from a software supplier? Pretty much what you can buy is time, expertise and good intentions. You have to trust them.
The second answer to ‘why’ is both simple and difficult. It is reputation, it is the ‘good intentions’ and ‘good personnel’ part of the ‘trust us’ equation. By the late 90's our company had a lot of this, we had a great track record which helped us sell the ‘trust us’ approach. But this is a difficult question in public software procurement. For starters, there are not many companies that can attempt to deliver large infrastructure software projects. Every one of the existing candidates has a list of failures and disasters to their name that does not look great for their reputation. But we now know that failure and disaster are inevitable from time to time, so the outcome of the project should no longer have such great effect on reputation. To be sure, a large percentage of failures must damage a reputation but there are going to be some programs that fail to get written however good the company that attempts them is. What is needed here is something more subtle, a review and monitoring process for software vendors that combines the percentage of success (in the same probabilistic manner we must learn to manage projects with) with a measure of public review of how much effort the software supplier puts in. If you are supplied a giant team of dunderhead programmers (it happens) it is reasonable to have a way to express your dissatisfaction so that the next public project does not favour that vendor. Systems to enable this kind of thing are pretty standard these days in the commercial world, Trustpilot, eBay and Amazon reviews generally do a fine job of enabling quick assessment of online vendors in many areas. It is time the government had this for its IT suppliers, but before we subject the software supply world to such a thing it is important to take on board the rest of this essay. It is no good badly reviewing a software supplier because they made an impossible commitment to you and failed to deliver it, that is as much your fault as theirs.
In conclusion I'd like to offer up some simple rules for procuring software. They are not infallible because infallible turns out to be impossible, but they enable failure to be accommodated with less stress, less damage to reputations, less damage to careers and less public expense and waste. There are Government rules for procurement. It is time they were updated to accommodate our current theoretical understanding of the impossibility of predicting the cost of software development.
John's Software Procurement Guidelines
- Failure will happen. Get over it and plan for it from day one in engineering terms, business terms and in communication with all stakeholders.
- Because failure is often an outcome, keep your ambitions down. Plan a £10Bn software project in one lump and you risk a very real £10Bn failure. Don't do that, there is never a need for something that monolithic, break it into many smaller bits, each useful in itself, that can fail or succeed by themselves.
- Failure is usually not someone's fault. It's in the nature of software. Cease the blame game, it has a very high chance of being plain wrong.
- Plan your project and its interactions with your business processes to accommodate significant specification failure, say 20-30%.
- Plan your project using probabilities. Don’t like the answer you get when you add them all up properly (you are encouraged to get a statistician to help you here, it's easy to mess this up)? Don't start, you are not yet ready for failure.
- Have an acceptable development cost/time for each feature. When it exceeds this kill it without hesitation, or it might kill your whole project. Deal with the result of this in your business processes elsewhere.
- Plan your project to start modestly with a barest, barest minimum of features. The smaller you can make this the better (and you still need to accommodate 20% failure in this). Plan to deploy with that first. This minimises the chances of large scale failure.
- There is now a whole software development methodology based on this, a family of them in fact, but ‘early and often’ is one of the best ways to recognise things that are failing and kill them off before they kill your project. Use this approach if it is remotely feasible, it is no silver bullet but it really helps.
- Do not pick a vendor based on how cheaply or quickly they tell you they can do it. That is a promise that they cannot keep, it is theoretically impossible, provably a lie. Pick a vendor based on the quality and quantity of the personnel they promise to keep on your project (you can contract this, it's something they can actually keep to) and how much they will charge you for that quality. Examine their reputation carefully but use caution as current expectations of performance in procurement are largely fantasy and reputations are often damaged when these unachievable expectations are not met.
- If your vendor cannot manage the project using probabilities or has a bad reputation when they do this (too much optimism is amazingly common, too much pessimism very rare), don’t hire them.
- Once you have chosen them, trust them to do their best. If they don't, the project is doomed whatever you do, so you may as well. Never moan at them about something taking too long, it just makes you look stupid, remember that no-one can know how long something is going to take, it is provably unknowable. Keep reminding yourself of this. If something takes too long just kill it, don't hesitate.
- You and your software vendor are a team trying to solve a problem. Bits or all of that problem may be unsolvable, at least in your timescale and budget and perhaps ever. Try to find that out as you go. If you allow an adversarial relationship to develop between you and the vendor, you're doomed, it is the worst kind of outcome and massively raises the chance of total failure as they just won't bother to tell you when they think something is undoable, or simply proving too difficult. They'll just keep persuading you to spend more money, which is usually not a good idea.
- Your whole project might completely fail. Plan for this, in expectation management of and communications with stakeholders. Allow for the possibility in your budgets. It can just happen through no fault of anyone, it is the nature of software development. If it does, learn what you can, kill it and move on. Think of the number of commercial startup failures every week in IT. Many of them fail simply because the software they tried to develop was not achievable in the time and budget they had. You've just had one of those in the public sector, it's not the end of the world.
And finally;
When you have decided to do the project, estimated your cost, added on plenty of contingency, appointed a contractor and are ready to go, stop and turn your thinking around. When you define the system, you have a wish list that you'd like to buy. It turns out to be a very bad idea to imagine that this is what you are actually buying when you kick off your project, because some of that might turn out to be impossible or just too expensive. So take a moment and realign your thinking. Now what you have is a pile of money, a development team and a list of wishes. No more. Start at one end and see how many wishes that pile of money you have can buy you. When you have run out of money, just stop. If it's not enough to have a functional system, go home, it's failed. Really, give up. That is the best you can do, and if you think of the project like that then you might actually end up with a far better result than trying to have everything from day one. You will certainly live longer and be happier, not to mention that the rest of us are less likely to read about you in Private Eye.
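The ‘pile of money and a list of wishes’ way of thinking can be sketched in a few lines. This is a toy illustration under my own assumptions: the function name fund_wishes, the wish names and the costs are all invented, and in real life the true cost of a wish is only discovered by attempting it, which is why it is passed in here as a lookup rather than known up front.

```python
def fund_wishes(wishes, budget, actual_cost_of, essential):
    """Work through wishes in priority order until the money runs out.

    wishes         -- wish names, most important first
    budget         -- the pile of money available
    actual_cost_of -- function giving the real cost of a wish; in practice
                      this is only learned by doing the work
    essential      -- the wishes without which the system is not functional
    """
    delivered = []
    remaining = budget
    for wish in wishes:
        cost = actual_cost_of(wish)
        if cost > remaining:
            break  # out of money: just stop, do not go looking for more
        remaining -= cost
        delivered.append(wish)
    functional = set(essential) <= set(delivered)
    return delivered, remaining, functional


if __name__ == "__main__":
    # Hypothetical costs that turned out rather differently from the plan.
    costs = {"accounts": 30, "payments": 50, "reports": 40, "themes": 10}
    result = fund_wishes(
        wishes=["accounts", "payments", "reports", "themes"],
        budget=100,
        actual_cost_of=costs.get,
        essential={"accounts", "payments"},
    )
    print(result)
```

Note that the loop stops at the first wish it cannot afford rather than hunting further down the list for something cheaper. That is a design choice in this sketch, following the ‘when you have run out of money, just stop’ rule as literally as possible.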
John Lambert, August 2018
1 – I wrote this essay for a friend of mine who worked in a senior communications role in the civil service for many years. Her department had commissioned a large and complicated piece of software that was vital to the smooth running of a new service and the project was not going well, it was late, it had problems and everyone was very annoyed. This was my effort to help explain why it wasn't going well.
2 – Large Limits To Software Estimation – J. P. Lewis 2001.