BA meltdown: crisis researcher caught in the chaos reports on a massive airline failure

Has someone tried switching it off and on again? Denis Fischbacher-Smith

For an academic who has spent more than 30 years researching organisational crises, it was something of an odd experience to be in the middle of the British Airways IT foul-up on Saturday May 27. And it provided a textbook example of how organisational systems need backup and effective communications if chaos is to be avoided when they fail.

I started my journey in the morning in Copenhagen, with no reported issues from the airline, and the first mention of a problem came close to the end of the flight, when the pilot announced that we were being delayed on the way into Heathrow. This was attributed to thunderstorms in the south-east of England.

After more than 20 minutes of circling over the North Sea, we were told the delays were a result of congestion due to the storm. On landing, however, passengers were told that there had been a lightning strike which had resulted in a catastrophic failure of the communications system at Heathrow. The pilot said he could not contact BA ground staff to find out which gate to head for.

So the aircraft sat on the taxiway until contact could be made. Some 40 minutes passed as the pilots seemed to try every possible means, including mobile phones and email, to establish contact. The pilot then announced that we had been given a gate, but that there would be a further delay because it wasn’t possible to contact the ground staff to make sure buses would be available to move passengers. This was the first indication passengers on the plane had that the problem was more than a temporary loss of communications.

By the time I had cleared customs and moved to the BA lounge to wait for my connecting flight to Glasgow, it was clear that Terminal 5 was in a state of chaos. Few BA flights had left as scheduled and the lounges and open access areas were teeming with passengers. BA staff in the lounges were not able to provide any further information, and rumours were flying around among passengers that, rather than a storm, the IT system may have been brought down by a ransomware attack – much like the one which had caused huge problems for the UK’s National Health Service earlier in May.

The company has blamed a “power supply issue”, but in truth, we are still waiting for the full picture to emerge.

Technical flaws

Either way, if either of these had been the root cause of the company’s IT outage, it would imply that the company’s surge or virus protection processes were inadequate to deal with the problem and that there was no effective backup. The speculation swept through the lounge, and the lack of communication with passengers was starting to show the company in a bad light.

At about 2.30pm, rumours spread that media outlets were reporting that no BA planes would leave Heathrow or Gatwick before 6pm that evening: cue a flood of passengers to the lounge desks trying to find out what was happening. BA staff were adamant that this wasn’t an official BA statement, and a public announcement was made in the lounge that it was a false news report. However, this narrative of events was not to last. Just 20 minutes later, it was announced that all BA flights before 6pm were cancelled. It looked as though reporters were getting the correct information before customers or even staff; British Airways’ internal communications were looking shaky. At this point some passengers, starved of information, resorted to asking the lounge catering staff what was going on. They didn’t work for BA, but they were the only people in the lounge at that point who weren’t passengers.

Staff absent

Eventually, passengers were told to collect their baggage and exit the airport. Again, an absence of staff left confused passengers with no way to seek clarification, and once more it was the catering staff whom passengers approached.

Almost inevitably, there were large queues for baggage collection, and there was still a lack of clear information from the small number of BA staff present. The most useful information came from two pilots who were also stranded and who were working tirelessly to help passengers. But where were the other staff? Nor did there appear to be any managers in attendance to deal with the ongoing issues.

After waiting in a queue for about 30 minutes, passengers were herded into the domestic baggage claim area where, despite previously being told that their bags would be available, they were not. Instead, passengers had to search out a form, fill it in, and get in line once again. The person giving out the forms was at the front of the queue, leaving those at the back in the dark. Some passengers were given different forms to complete, as the company seemed to have run out of the appropriate forms at the desk.

Going nowhere. Daniel Mennerich/Flickr, CC BY-NC-ND

After we had waited in line for another half hour, a member of BA staff said that passengers could fill in the form online and didn’t have to wait in the queue. Of course, the website was down at this point. When asked for clarification, the rather confused-looking members of staff were forced to say that they had no additional information. Passengers were seemingly left to their own devices to make their way home; the staff offered no advice other than to keep receipts for any costs incurred. In my case, this involved taking the Heathrow Express into London Paddington, the Underground to Kings Cross, the train back to Glasgow Central, and then a further train, an eight-hour journey that saw me arrive home at midnight.

These were some of the issues that I witnessed directly airside. Media reports have described a very similar experience for customers in the booking hall. And, of course, other passengers were stranded at airports across the world and faced many of the problems we had to cope with at Heathrow. Many passengers left Terminal 5 with no bags, no material support from the company, and no idea when their bags might be returned. It wasn’t long before some passengers were heard to rename the company “Bags Anywhere”.

By any measure, this incident has been a public relations fiasco for British Airways. The company’s apparent fall from grace – from that period where it could bullishly advertise itself as “the world’s favourite airline” to its performance on Saturday – is stark.

The managerial behaviour shown by BA on the day was typical of organisations in crisis: the lack of effective communication; the growth in conflicting information and rumour; an absence of any apparent contingency plan on the ground; and a sense of confusion among staff. These are all elements that are often displayed in the early stages of a crisis event. The fact that social media and other websites were the main source of information for passengers simply highlighted the poor communication practices on the ground. It was certainly not clear from a passenger’s perspective who had taken ownership of this event. There was no obvious central source of information, and passengers had to queue for what limited information was available.

Embedded errors

This was not, however, the first IT failure that the company had experienced. The media were quick to point to previous problems, hardening the perception that BA was a company in crisis. This particular failure, then, was not an isolated event. It served as a graphic representation of the potential for disruption that seems to have been embedded within the strategic decisions taken around the design, testing and implementation of the new system, the development and testing of contingency plans, and the provision of an effective communications policy for use in such events. Put another way, the costs of errors were embedded into the system, and they became all too apparent when the system failed.

The events of May 27 illustrated the problems that can occur in socio-technical systems that are optimised for just-in-time delivery. When such a system operates in a degraded mode, or fails catastrophically, it is often too difficult for staff to revert to a more manual way of operating, because they are so reliant on the technical elements of the system and have often not been trained in an alternative way of working.

What lessons does the BA incident offer other organisations? Firstly, if an operating system or process is central to performance and reputation, then the organisation needs to ensure that it has considered how it will function if that system fails catastrophically. Managers’ assumptions about the nature of risk play an important part in shaping their willingness to consider the worst-case scenario and to prepare effective contingency plans. Lightning strikes and computer hackers are not unknown, or even particularly surprising, dangers.

Not the first time BA has fumbled. Jorge Quinteros, CC BY-NC-ND

Secondly, it highlights the importance of having an effective communication strategy in which staff at the sharp end of the operation are kept fully informed about what the company is doing to contain the crisis. Organisations should not be providing information to the media while failing to inform local staff about the situation, or worse still, providing alternative accounts of the problem. There is also a need to ensure that sufficient information is provided to customers who are directly affected by the event.

Thirdly, additional resources need to be provided to deal with the demands of a crisis. Not only do staff need the right tools to respond effectively, but they themselves need to be robustly trained to cope when it comes to the crunch. Customers and service users need to be given accurate information and the documents needed to process their claims.

Learn from near misses

Finally, BA’s woes point to the importance of organisational learning from early warnings and near-miss events. This is a challenge for all organisations, as there is often a sense of denial that such catastrophic failures can happen on home turf. After all, staff at every level of an organisation often believe that theirs is a tight ship, well managed and reliant on well-designed technical systems. It is this process of denial that invariably prevents managers from reflecting on their own capabilities in a crisis and from questioning what they would do in similar circumstances.

The aviation industry has a well-established process for collecting information on near-miss events in relation to the performance of pilots and aircraft. It is so effective that it has been seen as the gold standard for other sectors, such as healthcare. Given the history of problems with BA’s IT systems, one might be forgiven for thinking that it has not been as diligent when considering its own managerial early warning processes for core business processes. Organisations need to overcome the barriers to learning that can damage the ability to cope with a crisis. A failure to do so can be hugely damaging, as BA is now discovering.