A misconfiguration in Google Cloud's VMware Engine (GCVE) resulted in the accidental deletion of UniSuper's cloud infrastructure, impacting over 620,000 members of the Australian superannuation fund.
The incident, which occurred earlier this month, left UniSuper customers without access to their accounts for over a week.
Incident details
During the deployment of a GCVE Private Cloud for UniSuper in early 2023, a Google Cloud operator left a configuration parameter blank. The system then defaulted to a fixed one-year term for the Private Cloud and automatically deleted it when that term ended. Because the deletion was not the result of a customer-initiated request, it triggered no customer notification, so the problem was detected only after the deletion had occurred.
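The failure mode described here, a blank parameter silently falling back to a fixed term, can be illustrated with a minimal Python sketch. It is not Google's internal tool: the function name, the `DEFAULT_TERM_DAYS` constant, and the dates are hypothetical and chosen only to show how an omitted value can turn into a scheduled deletion.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical fallback, standing in for the one-year default described in the article.
DEFAULT_TERM_DAYS = 365


def scheduled_deletion_date(created: date, term_days: Optional[int]) -> date:
    """Return the date on which a deployment would be deleted.

    A blank term (None) is silently replaced by DEFAULT_TERM_DAYS.
    That silent fallback is the pitfall this sketch illustrates.
    """
    effective_term = DEFAULT_TERM_DAYS if term_days is None else term_days
    return created + timedelta(days=effective_term)


# Arbitrary illustrative creation date, not the real deployment date.
created = date(2021, 3, 1)

# The operator leaves the term blank; nothing fails and nobody is warned,
# yet a deletion date now exists for the deployment.
print(scheduled_deletion_date(created, None))
```

The call succeeds without any error or warning, which is the core of the problem: a missing value and a deliberate one-year term are indistinguishable to the rest of the system.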
The impact of this incident was significant, though it was limited to one of UniSuper's multiple GCVE Private Clouds across two zones in one cloud region. No other Google Cloud services or customers were affected, and UniSuper's data backups stored in Google Cloud Storage were not impacted.
In response to the incident, UniSuper and Google Cloud worked around the clock to restore the affected services. The recovery process involved restoring network and security configurations, reinstating applications, and recovering data to ensure full operational functionality.
Google Cloud says it has taken several steps to prevent such incidents in the future. The internal tool that led to the misconfiguration has been deprecated and replaced with a fully automated process that customers control through the user interface. Additionally, a thorough review of all GCVE deployments was conducted to ensure no other instances were at risk. The system behavior that allowed for automatic deletions due to blank parameters has also been corrected.
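Google Cloud has not published how the corrected behavior is implemented. As a general illustration of the principle, the Python sketch below shows one way a provisioning tool could refuse a blank term instead of assuming a default. Every name in it (`PrivateCloudRequest`, `term_days`, `no_expiry`, `resolve_expiry`) is hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class PrivateCloudRequest:
    """Hypothetical provisioning request; field names are assumptions."""
    name: str
    created: date
    term_days: Optional[int] = None  # None means the field was left blank
    no_expiry: bool = False          # open-ended deployments must say so explicitly


def resolve_expiry(req: PrivateCloudRequest) -> Optional[date]:
    """Return the expiry date, or None for an explicitly open-ended deployment.

    A blank term is rejected rather than replaced by a default.
    """
    if req.no_expiry:
        if req.term_days is not None:
            raise ValueError(f"{req.name}: term_days and no_expiry are mutually exclusive")
        return None
    if req.term_days is None:
        raise ValueError(f"{req.name}: term_days is blank; refusing to assume a default term")
    if req.term_days <= 0:
        raise ValueError(f"{req.name}: term_days must be positive")
    return req.created + timedelta(days=req.term_days)


# A blank term now fails loudly at deployment time instead of scheduling a deletion.
try:
    resolve_expiry(PrivateCloudRequest(name="example-cloud", created=date(2021, 3, 1)))
except ValueError as err:
    print(f"rejected: {err}")
```

The design choice is to make "no expiry" an explicit decision rather than the absence of a value, so an omission surfaces as an error when the deployment is created, not as a deletion a year later.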
In a joint statement, UniSuper CEO Peter Chun and Google Cloud CEO Thomas Kurian apologized for the disruption. They emphasized that the incident was a unique occurrence and said that measures have been implemented to prevent its recurrence.
This incident highlights the importance of robust risk management and redundancy in cloud services. That such a setback could hit a provider as established and trusted as Google Cloud is a reminder to customers of the critical role backups and redundancy play in managing potential outages.