5 lessons to take away from TSB Bank’s operational resilience failings

Operational resilience is the ability of a business to keep functioning in the face of market adversity, unprecedented events, or technological malfunction. It requires firms to assess their tolerance for risks and ensure they have procedures in place to prevent operational disruptions.

On 20 December 2022, just as many of us were getting ready for the Christmas break, the Financial Conduct Authority (FCA) and the Prudential Regulation Authority (PRA) fined TSB Bank plc a total of £48,650,000 for operational risk management and governance failings. These included failings in the management of outsourcing risks relating to the bank’s IT upgrade programme.

In this blog, we will explore the key takeaways from TSB Bank’s failings and five ways firms can mitigate operational risks.

What went wrong with TSB Bank’s IT migration?

In April 2018, TSB Bank updated its IT systems and migrated the data for its corporate and customer services onto a new IT platform – the “Migration Programme.” Its core objective was to transfer customer data from the legacy IT platform that TSB inherited in 2013, which was part of the Lloyds Banking Group, to a single, modern platform.

This was a complex project. The planning and testing spanned over four years and the TSB Board understood that moving the entire bank from multiple, third-party legacy systems to a single new platform was a hefty but essential task.

The TSB Board decided to proceed with the migration and did not anticipate that customers would be affected in the process. It considered expert scrutiny, attestations and assurances from Executives and third-party suppliers, as well as feedback from third parties and regulatory bodies.

The migration was carried out in stages. First, systems such as ATMs, debit and credit card payments, mortgages, and the digital app were moved. Key internal systems followed and, finally, all customer data was migrated.

There were significant levels of disruption after the main migration of customer data.

Even though the data was migrated successfully, there were technical failures in TSB Bank’s IT system. This led to customers being unable to access banking services such as branch, telephone, online and mobile banking.

Mark Steward, FCA Executive Director of Enforcement and Market Oversight, said: “The failings, in this case, were widespread and serious which had a real impact on the day-to-day lives of a significant proportion of TSB’s customers, including those who were vulnerable. The firm failed to plan for the IT migration properly, the governance of the project was insufficiently robust, and the firm failed to take reasonable care to organise and control its affairs responsibly and effectively, with adequate risk management systems.”

What caused the migration disruptions?

The two data centres in place to support the “Migration Programme” were configured inconsistently, even though they were specified to be identical.

There were also issues around coding and capacity.

Technical issues also arose from the high volume of customer enquiries as the public became more concerned. Customers turned to TSB’s branches and telephone banking to access their accounts, but the resources available could not cope with the demand.

Despite testing, the configuration issue was not detected before the migration.
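To make the idea of a configuration parity check concrete, here is a minimal sketch in Python. It is not a description of TSB’s tooling or of what the regulators prescribed. The setting names and values are hypothetical, and a real check would read settings from each environment instead of hard-coded dictionaries.

```python
MISSING = "<missing>"  # marker for a setting that exists at only one site


def config_drift(site_a: dict, site_b: dict) -> dict:
    """Return every setting whose value differs between two sites."""
    drift = {}
    for key in sorted(site_a.keys() | site_b.keys()):
        a = site_a.get(key, MISSING)
        b = site_b.get(key, MISSING)
        if a != b:
            drift[key] = (a, b)
    return drift


# Hypothetical settings, for illustration only.
dc_primary = {"max_connections": 5000, "tls_version": "1.2", "session_timeout_s": 300}
dc_secondary = {"max_connections": 2000, "tls_version": "1.2"}

drift = config_drift(dc_primary, dc_secondary)
for key, (a, b) in drift.items():
    print(f"{key}: primary={a!r} secondary={b!r}")
```

Running a comparison like this as a gate before go-live is one way to catch sites that were meant to be identical but are not.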

In summary, TSB did not sufficiently consider the risks of such a large-scale migration, in particular the operational risks, or how these risks would be overcome. There were gaps in TSB Bank’s knowledge of the “Migration Programme” and its complexity.

The disruption affected all of TSB’s branches and a large number of its 5.2 million customers. Some customers continued to be affected, as TSB took until December 2018 to return to normal. TSB paid £32.7 million to the customers who were affected.

The FCA and the PRA found that TSB “failed to organise and control the IT migration programme adequately, and it failed to manage the operational risks arising from its IT outsourcing arrangements with its critical third-party supplier.”

TSB Bank’s failings demonstrate how operational risks can disrupt a firm and affect its customers. Thus, it is important for firms to invest in their operational resilience.

Sam Woods, Deputy Governor for Prudential Regulation and Chief Executive Officer of the PRA, said: “The PRA expects firms to manage their operational resilience as well as their financial resilience. The disruption to continuity of service experienced by TSB during its IT migration fell below the standard we expect banks to meet.”

5 lessons to take away from TSB Bank’s operational resilience failings

What can your firm learn from TSB Bank’s IT failure? Here are five ways to improve how your firm manages major technical change.

Ensure an appropriate governance plan is in place

Any one individual can make a mistake. In addition, a team that has been working on a technical change project can become blind to certain types of risk. A change management board can provide a certain level of governance, but it too can discount risks that have a low chance of occurring but a high impact.

For large, significant technical changes, it is imperative that the correct governance is in place. The FCA in the UK now requires firms to identify their important business services and to ensure that they operate within defined impact tolerances. These plans need to be designed, reviewed, and accepted at board level.

Adding board-level, compliance and internal audit scrutiny should help spot risks and provide the necessary critical thinking to prevent high-impact failures.

Take an agile approach

One of the primary reasons for the TSB Bank IT failure was the sheer scale of the migration that was attempted in a relatively brief time window. Large-scale changes of this type should be discouraged and avoided where at all possible. Taking a more agile approach, where the change is broken down into smaller pieces and delivered incrementally, significantly reduces the risk, helps planning and minimizes the impact if something goes wrong.

Despite the best planning and testing, large IT changes can result in side effects and knock-on impacts. Some scenarios can be very difficult to reproduce in a test environment and may only materialize in production – particularly those related to the volume of activity or very rare use cases. Incrementally introducing change allows operational teams to closely monitor activity and it can be easier to back out of changes if problems do occur.
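As an illustration of incremental delivery, the sketch below migrates one batch at a time, checks health after each step, and backs out the failing batch before stopping. It is a simplified sketch under stated assumptions: the batch names, the `migrate`, `healthy` and `rollback` callables, and the simulated failure are all hypothetical.

```python
from typing import Callable, Sequence


def staged_rollout(
    batches: Sequence[str],
    migrate: Callable[[str], object],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], object],
) -> list[str]:
    """Migrate batches in order; back out and stop at the first unhealthy one."""
    completed = []
    for batch in batches:
        migrate(batch)
        if not healthy(batch):
            rollback(batch)  # only the failing batch is backed out
            break            # later batches are never attempted
        completed.append(batch)
    return completed


# Hypothetical example: a set stands in for "what is live on the new platform".
live: set[str] = set()
batches = ["staff accounts", "pilot customers", "remaining customers"]

done = staged_rollout(
    batches,
    migrate=live.add,
    healthy=lambda b: b != "pilot customers",  # simulate a failed health check
    rollback=live.discard,
)
print(done)  # batches that went live and stayed live
```

The design choice here is that a failure limits the impact to one small batch, which is the property that makes incremental change easier to monitor and to reverse.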

Embed risk management into every function

Risk management should not exist in a silo, confined to a single team or function within an organization. Operational, regulatory, and reputational risks should be identified and understood within every team, particularly IT and change management groups. Understanding the principles of operational resilience and identifying in advance important business services and impact tolerances will equip change teams with the tools to identify unacceptable risks.

The FCA is keen for firms to understand the interconnectedness of systems, processes and services and to consider what happens if multiple components fail at the same time. Furthermore, the regulator is keen for firms to look at operational resilience through the eyes of the customer. What is the customer impact of a system failure? How is the customer kept informed? What alternative action should be taken?
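One simple way to reason about interconnectedness is to map each customer-facing service to the components it depends on, then ask which services are hit when several components fail together. The sketch below assumes a hypothetical dependency map and only considers direct dependencies. A real mapping would be larger and would also need to follow indirect dependencies.

```python
# Hypothetical map of customer-facing services to the components they rely on.
DEPENDS_ON = {
    "mobile banking": {"auth service", "core ledger", "data centre A"},
    "telephone banking": {"core ledger", "call centre telephony"},
    "card payments": {"payment gateway", "core ledger"},
}


def affected_services(failed: set[str], depends_on: dict = DEPENDS_ON) -> list[str]:
    """Return services that directly depend on at least one failed component."""
    return sorted(
        service for service, deps in depends_on.items() if deps & failed
    )


# Two components failing at the same time.
print(affected_services({"auth service", "call centre telephony"}))
# A shared component failing on its own.
print(affected_services({"core ledger"}))
```

Looking at the output from the customer’s point of view shows which alternative channels would still be available in each scenario and which would not.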

These questions will help firms provide suitable alternatives and communication methods should a problem occur.

Test operational resilience

Business continuity and operational resilience should not simply be documented plans that stay filed until the moment they are needed. They should be living documents and processes that are tested regularly and updated from test results and feedback.

Testing scenarios will improve the firm’s confidence that outages can be managed and show whether alternative ways for customers to perform key activities will work in practice.
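A scenario test can be as simple as comparing the outage each service would suffer in a simulated event against the impact tolerance the firm has set for it. The following sketch assumes tolerances are expressed as a maximum outage in minutes. The services, figures and scenario are hypothetical and are not taken from any regulatory text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpactTolerance:
    service: str
    max_outage_minutes: int


def breaches(tolerances: list[ImpactTolerance], scenario_outages: dict) -> list[str]:
    """Return services whose simulated outage exceeds their tolerance."""
    return [
        t.service
        for t in tolerances
        if scenario_outages.get(t.service, 0) > t.max_outage_minutes
    ]


# Hypothetical tolerances and a hypothetical test scenario.
tolerances = [
    ImpactTolerance("online banking", 120),
    ImpactTolerance("card payments", 30),
]
data_centre_loss = {"online banking": 90, "card payments": 45}

print(breaches(tolerances, data_centre_loss))
```

Any service returned by a check like this is a candidate for remediation, and the result can be fed back into the plan after each test.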

Evidence should also be requested and collected to demonstrate that any key third-party companies or systems are equally operationally resilient.

Monitor regulatory requirements and best practices

There have been multiple regulatory responses to the TSB Bank failure and similar major IT-driven outages. Firms should ensure that they fully understand their regulatory obligations and have a timetable in place to meet them effectively. Regulatory requirements are a minimum, and firms may choose to exceed them in areas of particular importance.

Obligations should be understood by everyone within the organization. Integrating the regulatory change program with GRC systems can help propagate new and updated obligations to the rest of the business. This will ensure that key policies, processes, tools, and documents are flagged for a review or update if the underlying regulation has changed.
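To show what flagging documents for review might look like, here is a minimal sketch that maps obligations to the documents that implement them. It does not represent any particular GRC product. The obligation keys and document names are hypothetical.

```python
# Hypothetical mapping from regulatory obligations to internal documents.
OBLIGATION_TO_DOCS = {
    "outsourcing": ["Third-party risk policy", "Supplier exit plan"],
    "impact-tolerances": ["Business continuity plan", "Third-party risk policy"],
}


def documents_to_review(changed: list[str], mapping: dict = OBLIGATION_TO_DOCS) -> list[str]:
    """Return the documents linked to any obligation that has changed."""
    docs = set()
    for obligation in changed:
        docs.update(mapping.get(obligation, []))
    return sorted(docs)


print(documents_to_review(["outsourcing"]))
```

Keeping the mapping in one place means a single regulatory update can be traced to every policy or process that needs attention.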

Horizon scanning can also help firms understand evolving risks, trends, and incidents that they can learn from and incorporate into their own operational resilience programs.

Keep ahead of emerging operational resilience regulations by speaking to CUBE.
