Degraded performance of API and website
Incident Report for Teem
Postmortem

Several issues were addressed during the emergency downtime, all of which contributed to the performance issue:

  • Indexes on several heavily used tables were bloated or corrupted, leaving the primary database instance starved for processing resources. This was corrected, and additional capacity and alerting were added.
  • Various instance-level and network-level errors indicated potential issues with the hardware hosting the virtual database instance. The system was force-migrated to new hardware, and processes were cleared and restarted to ensure proper function and replication. Additionally, failover processes and triggers have been reviewed and updated to help prevent a single-node failure from disrupting the wider system.
  • Several background and asynchronous tasks were tuned and rebalanced to avoid resource overutilization.
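
As an illustration only, one common way to keep background tasks from overusing a shared resource is to cap how many run at once. The sketch below assumes a Python asyncio worker; it is not Teem's actual implementation, and the names run_bounded, fake_sync, and max_concurrent are hypothetical.

```python
import asyncio


async def run_bounded(jobs, max_concurrent):
    """Run coroutine factories with at most `max_concurrent` in flight.

    Returns the results in submission order plus the peak concurrency seen.
    """
    semaphore = asyncio.Semaphore(max_concurrent)
    in_flight = 0
    peak = 0

    async def guarded(job):
        nonlocal in_flight, peak
        # The semaphore blocks here once `max_concurrent` jobs are running.
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await job()
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(guarded(job) for job in jobs))
    return results, peak


async def fake_sync(calendar_id):
    # Hypothetical stand-in for a background calendar sync task.
    await asyncio.sleep(0.01)
    return calendar_id


def demo():
    # Twenty queued jobs, but never more than four touching the backend at once.
    jobs = [lambda i=i: fake_sync(i) for i in range(20)]
    return asyncio.run(run_bounded(jobs, max_concurrent=4))
```

Calling demo() returns the twenty results together with the highest number of jobs that ran simultaneously, which stays at or below the configured limit. A lower limit trades backlog drain speed for headroom on the database.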
Posted 13 days ago. Oct 09, 2019 - 16:45 MDT

Resolved
This incident has been resolved. The database performance has continued to stay at normal levels as a result of the back-end changes.
A postmortem will be posted by Thursday, October 10th.
Posted 19 days ago. Oct 03, 2019 - 09:36 MDT
Update
We have continued monitoring, and the implemented back-end changes have resulted in database performance staying at normal levels.

We will continue monitoring this evening and will provide another update by 10 AM MT/12 PM ET tomorrow.
Posted 20 days ago. Oct 02, 2019 - 13:47 MDT
Update
The backlog of requests has returned to normal levels. We are continuing to monitor and will provide updates as further information is found.
Posted 21 days ago. Oct 01, 2019 - 14:59 MDT
Update
At 8:30 AM MT this morning there was a significant increase in sync requests to our systems. The resulting backlog of requests is trending downward, and some calendars are expected to be out of sync while the systems catch up. Next update by: 3 PM MT/5 PM ET
Posted 21 days ago. Oct 01, 2019 - 12:26 MDT
Update
A code release was pushed last night to address the small subset of O365/Exchange calendars that were not syncing, as well as to improve calendar syncing overall. We are monitoring the results and will update by 4 PM MT/6 PM ET.
Posted 21 days ago. Oct 01, 2019 - 09:38 MDT
Update
We are investigating reports that a subset of O365, Exchange, and Google calendars are not syncing, which appears to be related to this incident. Updates will be provided as they become available.
Posted 22 days ago. Sep 30, 2019 - 14:24 MDT
Update
All systems are operational. Teem will leave this incident open and continue to monitor systems closely throughout the weekend to verify all global clients are fully functional.
Posted 25 days ago. Sep 27, 2019 - 16:52 MDT
Monitoring
All systems are operational, and Teem will continue to closely monitor the platform throughout the day. During the overnight maintenance window, the team shifted hosted hardware and restarted and performed maintenance on the primary database, including updates, reindexing, general cleanup, and reinitialization of dependent services. Teem will be adding capacity to the system throughout the day.
Posted 25 days ago. Sep 27, 2019 - 09:49 MDT
Update
We are continuing to investigate this issue, as services have operated with degraded performance throughout the day. While a root cause has not yet been confirmed, we have identified issues relating to our database cluster and will schedule an emergency maintenance window later this evening to address them.
Posted 26 days ago. Sep 26, 2019 - 16:54 MDT
Investigating
Teem is currently investigating a widespread performance issue across the site.
Posted 26 days ago. Sep 26, 2019 - 09:54 MDT
This incident affected: Web Interface and API.