S2 - Google Calendar Sync
Incident Report for Teem
Postmortem

Teem by Eptura detailed Root Cause Analysis | April 11, 2024 

S2 Google Calendar Service not Synchronizing 

 

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident. 

 

Description: 

Customers using the Google Calendar service found that calendar events were not syncing automatically. During the incident, a workaround was provided to force a manual sync, which updated the calendars.

 

Type of Event: 

Functionality Issue 

 

Services/Modules impacted: 

Production / Google Calendar Service

 

Timeline (Reported MDT):

In the late afternoon of April 11, 2024, at approximately 3:50pm, multiple customers reported that their Google Calendar Service was not automatically syncing calendar events. Customers were given a temporary workaround to manually force a sync of their calendars. All customers were made aware of the Severity 2 incident via the Teem Status Page. The investigation continued through April 19, 2024, when the CloudOps team identified the root cause of the issue. On April 22, 2024, at approximately 11:08am, all customers were notified via the Status Page that the fix had been implemented and that we had moved into a monitoring phase. After continuous monitoring, with no additional reports of Google Calendar event issues and with customers confirming that their calendar events were syncing automatically, the Severity 2 incident was marked as resolved on April 29, 2024, at 10:23am.

 

Total Duration of Event: 

17 days, 18 hours, 33 minutes 
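
The duration above can be checked against the two timestamps given in the timeline. The following is a minimal sketch, assuming both times are in the same local time zone with no daylight-saving change between them; the function name `duration_parts` is hypothetical and used only for illustration.

```python
from datetime import datetime


def duration_parts(start, end):
    """Split the elapsed time between two datetimes into days, hours, minutes."""
    delta = end - start
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return delta.days, hours, minutes


# Timestamps taken from the timeline: first reports and final resolution.
reported = datetime(2024, 4, 11, 15, 50)
resolved = datetime(2024, 4, 29, 10, 23)

days, hours, minutes = duration_parts(reported, resolved)
print(f"{days} days, {hours} hours, {minutes} minutes")
```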

 

Root Cause:  

We observed that the PgBouncer and PgBouncer_ro services would not run simultaneously on the job managers. Because of how the startup script launched them, it was unclear which of the two services was running, and a "last to start wins" scenario appeared to occur. After an instance restart, a different service could "win" and cause further inconsistency. We also discovered that three of our job managers were running outdated code.

 

Remediation: 

These services shared a unix socket directory. Giving each service its own unix socket directory allowed both to run simultaneously and eliminated the inconsistency. This eliminated significant errors on the job managers.
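
To illustrate the general mechanism, the sketch below uses plain Unix-domain sockets in Python rather than PgBouncer itself. It assumes that a later-starting service replaces the socket file left by the earlier one; whether the actual startup script or PgBouncer behaves exactly this way is not stated in this report. The names `start_listener`, `reachable_service`, `demo`, and the socket file name `.s.demo` are hypothetical. PgBouncer does expose a `unix_socket_dir` setting, but which configuration change was made in production is an assumption here.

```python
import os
import socket
import tempfile

SOCKET_NAME = ".s.demo"  # hypothetical socket file name, for illustration only


def start_listener(socket_dir):
    """Listen on socket_dir/SOCKET_NAME, replacing any socket file already there."""
    path = os.path.join(socket_dir, SOCKET_NAME)
    if os.path.exists(path):
        os.unlink(path)  # assumption: a later starter removes the earlier socket file
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(path)
    srv.listen(1)
    srv.setblocking(False)
    return srv


def reachable_service(socket_dir, services):
    """Connect through socket_dir and return the name of the service that got the connection."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(os.path.join(socket_dir, SOCKET_NAME))
    try:
        for name, srv in services.items():
            try:
                conn, _ = srv.accept()
            except BlockingIOError:
                continue  # this service has no pending connection
            conn.close()
            return name
        return None
    finally:
        client.close()


def demo():
    # Shared directory: the second listener replaces the first one's socket file.
    with tempfile.TemporaryDirectory() as shared:
        rw = start_listener(shared)
        ro = start_listener(shared)
        shared_result = reachable_service(shared, {"rw": rw, "ro": ro})
        rw.close()
        ro.close()

    # Separate directories: each service keeps its own reachable socket file.
    with tempfile.TemporaryDirectory() as rw_dir, tempfile.TemporaryDirectory() as ro_dir:
        rw = start_listener(rw_dir)
        ro = start_listener(ro_dir)
        services = {"rw": rw, "ro": ro}
        separate_result = (
            reachable_service(rw_dir, services),
            reachable_service(ro_dir, services),
        )
        rw.close()
        ro.close()

    return shared_result, separate_result


if __name__ == "__main__":
    print(demo())
```

In the shared-directory case only the listener that started last is reachable through the socket path, which mirrors the "last to start wins" symptom. With one directory per service, both remain reachable regardless of start order.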

 

Preventative Action:  

Our team is dedicated to continuously improving the Google Calendar Service by enhancing our current processes and implementing robust monitoring systems. We appreciate your patience and cooperation during this disruption.

Posted May 20, 2024 - 11:32 MDT

Resolved
We deeply appreciate your patience as our team worked diligently to resolve the recent calendar event synchronization timing issue. We are pleased to inform you that we have successfully identified the root cause of the problem and have implemented a fix to resolve it.

Our team is committed to ensuring normal calendar performance and will continue to monitor this issue closely to ensure the best possible customer experience. For any further questions and concerns, please reach out to our dedicated support team. Thank you.
Posted Apr 29, 2024 - 10:23 MDT
Monitoring
We appreciate your patience as our team is working diligently to resolve the issue with unpredictable timing of calendar event synchronization. We have identified an issue that was causing these symptoms, and have developed and implemented a fix to resolve the issue.

Our team is dedicated to ensuring normal calendar sync performance, and we will continue to monitor the issue to ensure full resolution and provide additional status updates if needed. Thank you.
Posted Apr 22, 2024 - 11:08 MDT
Identified
We appreciate your patience as our team is working diligently to resolve the issue with unpredictable timing of calendar event synchronization. We have identified a potential issue which could be causing these symptoms and are actively working to address it.

Currently, we have observed that the PgBouncer and PgBouncer_ro services will not run simultaneously on job managers. Due to the startup script, it is unclear which of the two services is running, and it seems that the "last to start wins" scenario occurs. In the event of an instance restart, a different service could potentially "win" and cause further inconsistency.

To resolve this, we have worked on a solution where these services now have separate unix socket directories. With a different unix socket directory for each service, both can run simultaneously, which eliminates the inconsistency. This eliminated significant errors on the job managers.

Our team is dedicated to restoring normal calendar sync performance, and we will keep you updated with additional status updates as we continue to monitor and make progress towards a resolution.
Posted Apr 19, 2024 - 11:52 MDT
Update
Our team is currently working to resolve an issue that is impacting sync times for customers using Google calendars. We want to assure you that our team is fully committed to resolving this issue as swiftly as possible.

We recognize the importance of timely event syncing, and we apologize for any delays you may be experiencing. Restoring normal calendar sync performance is our top priority, and we will keep you updated with additional status updates as we make progress towards a resolution.

Thank you.
Posted Apr 18, 2024 - 14:34 MDT
Update
We are continuing to investigate this issue as a priority. We apologize for the delay; the next update will be shared at 3:30 PM CST.
Posted Apr 18, 2024 - 10:34 MDT
Investigating
As the implemented fix has not resolved the issue completely, we have moved back to the investigation phase. Our Engineering team is currently investigating the issue with Google Calendar Service to determine the cause of the disruption. The next update will be posted at 11:30 AM CST.
Posted Apr 18, 2024 - 06:38 MDT
Update
We are continuing to monitor for any further issues for the next 12 hours.
Posted Apr 17, 2024 - 06:42 MDT
Update
We are continuing to monitor for any further issues for the next 12 hours.
Posted Apr 16, 2024 - 15:03 MDT
Monitoring
A fix has been identified and applied to optimize the performance of the Google Calendar Sync. We are moving into the Monitoring Phase for the next 4 hours, and the next update will be shared at 3 PM CST.
Posted Apr 16, 2024 - 09:56 MDT
Update
We are continuing to investigate this issue as a priority. We apologize for the delay; the next update will be shared at 11 AM CST.
Posted Apr 16, 2024 - 05:49 MDT
Investigating
As the implemented fix has not resolved the issue completely, we have moved back to the investigation phase. Our Engineering team is currently investigating the issue with Google Calendar Service to determine the cause of the disruption. The next update will be posted at 7 AM CST.
Posted Apr 16, 2024 - 02:32 MDT
Update
We are continuing to monitor for any further issues for the next 12 hours.
Posted Apr 15, 2024 - 06:51 MDT
Update
We are continuing to monitor for any further issues for the next 12 hours.
Posted Apr 14, 2024 - 13:47 MDT
Monitoring
A fix has been implemented. We are moving into the Monitoring Phase for the next 12 hours.
Posted Apr 13, 2024 - 02:12 MDT
Update
We are continuing to investigate this issue as a priority. We apologize for the delay; the next update will be at 3 am CST.
Posted Apr 12, 2024 - 22:13 MDT
Identified
The previous fix did not resolve the issue completely. We have continued the investigation with Google Calendar Service, determined the cause of the disruption, and are working on a fix. The next update will be posted at 8 PM CST.
Posted Apr 12, 2024 - 15:06 MDT
Update
We are continuing to investigate this issue as a priority. We will post another update at 4 PM CST.
Posted Apr 12, 2024 - 10:27 MDT
Investigating
As the implemented fix has not resolved the issue completely, we have moved back to the investigation phase. Our Engineering team is currently investigating the issue with Google Calendar Service to determine the cause of the disruption. The next update will be posted at 12 PM CST.
Posted Apr 12, 2024 - 07:22 MDT
Update
We are continuing to monitor for any further issues for the next 4 hours.
Posted Apr 12, 2024 - 04:27 MDT
Monitoring
A fix has been implemented. We are moving into the Monitoring Phase for the next 4 hours.
Posted Apr 11, 2024 - 23:58 MDT
Identified
The issue with Google Calendar Service has been identified and a fix is being implemented. We will post another update at 1am CST.
Posted Apr 11, 2024 - 19:56 MDT
Update
We are continuing to investigate this issue in regard to Google Calendar Sync. The next update will be posted at 11:45pm MDT.
Posted Apr 11, 2024 - 19:35 MDT
Investigating
We are currently investigating an issue affecting customers using Google Calendar Service. Our Engineering team is working to determine the cause of the disruption. The next update will be posted at 7:45pm MDT.
Posted Apr 11, 2024 - 15:46 MDT
This incident affected: Google Apps Calendar.