System Outage
Incident Report for KornitX
Resolved
At approximately 16:00 UTC, our primary database server began showing extremely high CPU load and reached its maximum number of concurrent connections.

We immediately stopped all scheduled tasks to conserve connections and CPU time for the web tier. This was a sensible first step, but it produced only a small improvement because the vast majority of database traffic from scheduled tasks hits our database replicas rather than the primary.

Further investigation revealed that most of the load was being generated by a particular SQL query within our user authentication mechanism. An unusually large influx of concurrent users trying to log in to the system had caused this query to place significant pressure on the primary database server.

We then prevented any new user sessions from being initialised, and the CPU load quickly dropped back to acceptable levels.
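
For readers curious what this kind of mitigation can look like, the following is a minimal sketch of a gate that sheds new login attempts, assuming a Python application tier. It is an illustration only and not a description of our actual implementation: the `LoginGate` class, its method names, and the `authenticate` callback are all hypothetical.

```python
import threading


class LoginGate:
    """Hypothetical admission control for new session creation."""

    def __init__(self, max_concurrent_logins: int):
        # Caps how many authentication queries may run at the same time.
        self._slots = threading.BoundedSemaphore(max_concurrent_logins)
        # Operator-controlled switch; new sessions are allowed by default.
        self._accepting = threading.Event()
        self._accepting.set()

    def pause_new_sessions(self) -> None:
        """Reject every new login until resume_new_sessions() is called."""
        self._accepting.clear()

    def resume_new_sessions(self) -> None:
        self._accepting.set()

    def try_login(self, authenticate) -> bool:
        """Run authenticate() only if the gate is open and a slot is free.

        Returns False when the attempt is shed, so the caller can respond
        with a "please try again shortly" message instead of querying
        the database.
        """
        if not self._accepting.is_set():
            return False
        if not self._slots.acquire(blocking=False):
            return False
        try:
            authenticate()
            return True
        finally:
            self._slots.release()
```

In this sketch, existing sessions are unaffected because only the session-creation path passes through the gate. Calling `pause_new_sessions()` corresponds to the manual step described above, while the semaphore limit is an assumed extra safeguard that would shed excess logins automatically.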

We have now deployed a more efficient version of the SQL query in question.

Initial performance testing shows that the new query is several orders of magnitude faster, and we do not anticipate any further occurrences of this particular problem.
Posted Jun 18, 2020 - 16:00 UTC