When Everything Goes Wrong: Leading Through a Critical System Recovery
As a software engineering manager, I’ve faced my share of technical challenges, but nothing quite prepared me for the week when our core loyalty points administration (LPA) system went completely dark.

The Perfect Storm
The system that orchestrates this entire operation, which I’ll call our loyalty platform, had lost connectivity to essential services. The potential customer impact could have affected our entire quarterly cycle.

The technical details were complex. Our replica server in the cloud couldn’t communicate with the master server on-premises, authentication services were down, and network connectivity had been severed by recent security changes. On top of all this, we were working with an organization already strained by recent hardening exercises. ...