On December 30, 2023, the RevolutionEHR Team carried out a planned software upgrade of our primary database system, MySQL. This upgrade was a crucial part of our ongoing commitment to providing a secure, high-performance, and industry-leading solution to meet your practice management needs. To prepare for this significant upgrade, our team comprehensively tested the new database and its configuration in our internal non-production environments and began our complete QA process.
On January 3, 2024, between 8:00 AM and approximately 10:50 AM CST, a misconfiguration in the upgraded database caused the server to exceed the maximum number of connections it could maintain. As a result, RevolutionEHR components began rebooting frequently, which led to degraded performance followed by a complete outage.
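For readers who want a concrete picture of connection exhaustion, the following is a minimal sketch of a headroom check, offered as an illustration only and not as the monitoring RevolutionEHR uses. It compares MySQL's `Threads_connected` status value against the `max_connections` setting. The function name, the 80% warning threshold, and the sample numbers are all assumptions made for this example.

```python
def connection_headroom(threads_connected, max_connections, warn_ratio=0.8):
    """Classify how close a server is to its connection limit.

    threads_connected: current open connections (MySQL's Threads_connected).
    max_connections:   configured limit (MySQL's max_connections).
    warn_ratio:        hypothetical threshold for an early warning.
    """
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    utilization = threads_connected / max_connections
    if threads_connected >= max_connections:
        status = "exhausted"   # new connections would be refused
    elif utilization >= warn_ratio:
        status = "warning"     # approaching the limit
    else:
        status = "ok"
    return status, utilization


# Sample values are illustrative, not figures from the incident.
for connected in (50, 160, 200):
    status, utilization = connection_headroom(connected, 200)
    print(f"{connected} connections: {status} ({utilization:.0%} of limit)")
```

A check like this only reports the symptom. It would not by itself identify which configuration parameter caused the growth in connections.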
To investigate the incident, the team temporarily turned off customer logins to the application. By examining the database logs and configuration files, the team identified a misconfigured parameter related to MySQL's thread pool as the root cause of the uncontrolled growth in database connections. Correcting this parameter enabled the MySQL thread pool, which stabilized database connections and decreased the load on both the database and the application. Once the environment was confirmed to be stable, customer logins were re-enabled and access to the application was restored.
Despite extensive testing before the upgrade, this issue was not identified because it could only be triggered by significant load. To prevent incidents like this in the future, our team will enhance our ability to simulate production-level customer activity in our non-production environments. With a closer simulation of that activity, we could have more readily identified the misconfigured parameter before it impacted our customers.
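To illustrate why an issue like this can stay hidden under light test traffic, here is a simplified model, not a description of our actual test tooling. It assumes each simulated client opens one connection at a fixed interval and holds it for a fixed time, then reports the peak number of connections open at once. The connection limit of 200, the client counts, and the timings are hypothetical values chosen for the example.

```python
def peak_concurrent_connections(num_clients, arrival_interval_ms, hold_time_ms):
    """Peak simultaneous connections in a simple deterministic model.

    Client i connects at i * arrival_interval_ms and holds one
    connection for hold_time_ms milliseconds.
    """
    events = []
    for i in range(num_clients):
        start = i * arrival_interval_ms
        events.append((start, 1))                  # connection opened
        events.append((start + hold_time_ms, -1))  # connection closed
    # Tuples sort by time first; at equal times, closes (-1) come before opens (+1).
    events.sort()
    open_now = peak = 0
    for _, delta in events:
        open_now += delta
        peak = max(peak, open_now)
    return peak


MAX_CONNECTIONS = 200  # hypothetical limit for this sketch

# Light, test-like traffic versus heavy, production-like traffic.
light = peak_concurrent_connections(50, arrival_interval_ms=100, hold_time_ms=500)
heavy = peak_concurrent_connections(2000, arrival_interval_ms=1, hold_time_ms=500)

print("light load exceeds limit:", light > MAX_CONNECTIONS)
print("heavy load exceeds limit:", heavy > MAX_CONNECTIONS)
```

In this model the light scenario never approaches the limit, while the heavy scenario exceeds it. That is the general reason a load-dependent misconfiguration can pass functional testing and still fail in production.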