S2 - Calendar Events using Google & O365 are not syncing
Incident Report for Teem
Postmortem

Teem by Eptura: Detailed Root Cause Analysis | September 5, 2024

S2 | Office365 & Google Calendar | Reservations not syncing

We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident. 

 

Description: 

Some Teem customers had difficulty syncing calendar events with both Office365 and Google Calendar. Calendar events did not sync automatically from the source calendar to Teem unless a manual Force Sync was initiated, and in some cases even Force Sync did not resolve the issue. This led to confusion when using the reservation module.

While both Office365 and Google Calendar showed similar symptoms of calendar syncing failures, the underlying causes were different for each service and required separate solutions to resolve. 

 

Type of Event: 

Functionality Issue 

 

Services/Modules impacted: 

Calendar Service / Production

 

Timeline:

September 5, 2024, Reported MDT  

10:30 AM: Customers began reporting that they were unable to sync their calendar events for Google (410 errors) and O365 (Missing Initialization errors). An internal Fire Alarm was raised, and all customers were notified via the status page that we were investigating the issue.

September 6, 2024, Reported MDT  

11:53 AM: The engineering team identified the issue and continued working toward a resolution. Customers were notified that the incident had moved from the investigating phase to the identified phase. In the meantime, engineers mitigated the issue by running a script that manually synced calendars for all reporting customers four times a day, which was expected to improve the reliability of calendar events until full resolution.

September 10, 2024, Reported MDT  

9:34 AM: All customers were notified that a solution had been implemented for the sync issue affecting Google (410 errors) and Office365 (Missing Initialization errors). To ensure stability and performance, our engineering team oversaw the process to confirm that the issue had been fully resolved, with monitoring continuing over the following several days.

September 17, 2024, Reported MDT  

8:02 AM: All customers were notified that our engineering team had completed the necessary actions and verified that the service was functioning normally. As no additional customers had reported issues with Google (410 errors) or O365 (Missing Initialization errors), the status page was updated to a resolved state.

 

Total Duration of Event: 

11 days, 21 hours, 32 minutes 
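The total duration runs from the first customer reports (September 5, 10:30 AM MDT) to the resolved notice (September 17, 8:02 AM MDT). A quick arithmetic check in Python, assuming both timestamps share the same MDT offset:

```python
from datetime import datetime

# Both timestamps are in MDT, so no time zone conversion is needed.
start = datetime(2024, 9, 5, 10, 30)  # first customer reports
end = datetime(2024, 9, 17, 8, 2)     # status page set to Resolved

duration = end - start
hours, remainder = divmod(duration.seconds, 3600)
minutes = remainder // 60
print(f"{duration.days} days, {hours} hours, {minutes} minutes")
# prints "11 days, 21 hours, 32 minutes"
```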

 

Office365 Root Cause: Two concurrent requests attempted to refresh the Office 365 access token at the same time. Under certain conditions, this race caused the system to return a null token (None), which was then passed to the API client and interrupted calendar syncing.

Office365 Remediation:

We have updated the system to ensure that, when multiple requests are made, the current access token is used if it’s valid. This prevents the token from being set to None and ensures the API client always receives a valid token.
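As an illustration only, the general pattern can be sketched in Python. This is not Teem's actual code: the `AccessToken` and `TokenCache` classes and the caller-supplied `fetch_token` function are hypothetical. A lock serializes refreshes, and the validity check is repeated after the lock is acquired, so a request that waited while another thread refreshed reuses that fresh token instead of refreshing again:

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccessToken:
    value: str
    expires_at: float  # Unix timestamp in seconds

    def is_valid(self, skew: float = 60.0) -> bool:
        # Treat tokens that expire within `skew` seconds as already expired.
        return time.time() < self.expires_at - skew


class TokenCache:
    """Hypothetical cache that shares one access token across threads."""

    def __init__(self, fetch_token: Callable[[], AccessToken]):
        self._fetch_token = fetch_token
        self._token: Optional[AccessToken] = None
        self._lock = threading.Lock()

    def get(self) -> AccessToken:
        token = self._token
        if token is not None and token.is_valid():
            return token  # fast path: the current token is still valid
        with self._lock:
            # Re-check: another thread may have refreshed while we waited.
            if self._token is None or not self._token.is_valid():
                self._token = self._fetch_token()
            return self._token
```

Only the first thread to find the token missing or expired while holding the lock calls `fetch_token`; the others return the cached value.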

Office 365 Preventative Measures: In addition to the fix, we’ve ensured that the system will no longer return a null token in any situation. We have also added logging to monitor the token refresh process closely, allowing us to better detect and resolve any future issues quickly. 
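One way to make the "no null token" guarantee explicit is to validate the refresh result and fail loudly, with logging, instead of handing `None` to the API client. The sketch below is hypothetical: `refresh_with_guard`, `TokenRefreshError`, and the logger name are illustrative, and the refresh function is assumed to return a token string or `None`.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("calendar.token_refresh")  # hypothetical logger name


class TokenRefreshError(RuntimeError):
    """Raised instead of passing a null token to the API client."""


def refresh_with_guard(refresh: Callable[[], Optional[str]], account_id: str) -> str:
    logger.info("Refreshing access token for account %s", account_id)
    token = refresh()
    if not token:
        # Log and raise so the failure is visible, rather than syncing with None.
        logger.error("Token refresh returned no token for account %s", account_id)
        raise TokenRefreshError(f"empty token for account {account_id}")
    logger.info("Token refresh succeeded for account %s", account_id)
    return token  # the token value itself is never logged
```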

 

Google Calendar Root Cause: The system made repeated attempts to delete events from Google Calendar, even when an event had already been deleted. Each such attempt returned an HTTP 410 (Gone) error, indicating the resource was no longer available. This happened because the system did not verify whether an event still existed before attempting to delete it.

Google Calendar Remediation: 

  1. The system now checks if a Google Calendar event still exists before attempting to delete it, ensuring that deletion requests are only made when necessary. 
  2. We have introduced a locking mechanism to prevent duplicate or conflicting deletion requests from being made simultaneously. 
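The two remediation steps can be sketched together in Python. The `CalendarClient` protocol is a hypothetical stand-in for whatever wraps the Google Calendar API; its `event_exists` and `delete_event` methods are assumptions, not Google client library calls. A per-event lock serializes deletions of the same event, and the existence check runs inside that lock, so a second caller sees the event is already gone and skips the request that would otherwise return a 410:

```python
import threading
from collections import defaultdict
from typing import Protocol


class CalendarClient(Protocol):
    """Hypothetical wrapper around the Google Calendar API."""

    def event_exists(self, calendar_id: str, event_id: str) -> bool: ...
    def delete_event(self, calendar_id: str, event_id: str) -> None: ...


class SafeEventDeleter:
    """Hypothetical helper: check-then-delete under a per-event lock."""

    def __init__(self, client: CalendarClient):
        self._client = client
        self._locks = defaultdict(threading.Lock)  # one lock per (calendar, event)
        self._locks_guard = threading.Lock()  # protects creation of per-event locks

    def _lock_for(self, calendar_id: str, event_id: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[(calendar_id, event_id)]

    def delete(self, calendar_id: str, event_id: str) -> bool:
        """Delete the event if it still exists; return True if a delete was sent."""
        with self._lock_for(calendar_id, event_id):
            if not self._client.event_exists(calendar_id, event_id):
                return False  # already deleted: skip the call that would return 410
            self._client.delete_event(calendar_id, event_id)
            return True
```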

Google Calendar Preventative Measure: To prevent similar issues in the future, Google watchers will be updated using dedicated cron jobs, ensuring synchronization happens in a controlled and consistent manner. The lock mechanism will also ensure that API calls are handled sequentially and without conflict.
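A scheduled watcher update could look roughly like the Python sketch below. Everything here is an assumption for illustration: the `WatchChannel` record, the one-day `RENEWAL_WINDOW_MS`, and the caller-supplied `renew` function are hypothetical, and expirations are assumed to be stored as Unix timestamps in milliseconds. A cron job would call `renew_watchers`, which renews due channels one at a time, soonest expiry first:

```python
import threading
from dataclasses import dataclass
from typing import Callable, List

RENEWAL_WINDOW_MS = 24 * 60 * 60 * 1000  # hypothetical: renew channels expiring within a day


@dataclass
class WatchChannel:
    channel_id: str
    calendar_id: str
    expiration_ms: int  # Unix timestamp in milliseconds


_renewal_lock = threading.Lock()


def channels_due_for_renewal(channels: List[WatchChannel], now_ms: int,
                             window_ms: int = RENEWAL_WINDOW_MS) -> List[WatchChannel]:
    """Return channels that expire within `window_ms` (or already have), soonest first."""
    due = [c for c in channels if c.expiration_ms - now_ms <= window_ms]
    return sorted(due, key=lambda c: c.expiration_ms)


def renew_watchers(channels: List[WatchChannel], now_ms: int,
                   renew: Callable[[WatchChannel], None]) -> List[str]:
    """Entry point for a scheduled (cron) job; returns the IDs of renewed channels."""
    renewed = []
    # Serializes renewal runs within this process so API calls happen one at a time.
    with _renewal_lock:
        for channel in channels_due_for_renewal(channels, now_ms):
            renew(channel)
            renewed.append(channel.channel_id)
    return renewed
```

Note that a process-local lock only serializes work inside one process; separate cron processes would need a shared lock to get the same guarantee.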

Posted Oct 18, 2024 - 15:36 MDT

Resolved
We are pleased to inform you that the issue with Google and O365 calendar events not syncing has been resolved. Our Engineering team has completed the necessary actions and verified that the service is now functioning normally.

A Root Cause Analysis (RCA) will be conducted to understand the incident in detail and will be made available on our Status Page within 10 days.

Thank you for your patience and cooperation throughout this process. If you have any further questions or concerns, please feel free to reach out.
Posted Sep 17, 2024 - 08:02 MDT
Update
Monitoring will continue until tomorrow, Tuesday, September 17th, at 9am CST. We appreciate your continued patience as we work to resolve this issue.
Posted Sep 16, 2024 - 09:08 MDT
Update
We have implemented a solution for the sync issue affecting Google (410 errors) and O365 (Missing Initialization errors). We will be monitoring the situation over the next several days to ensure stability and performance. Our Engineering team is overseeing the process to confirm that the issue has been fully resolved.

The next update will be provided by Monday, 9am MST.
Posted Sep 13, 2024 - 09:15 MDT
Monitoring
We have implemented a solution for the sync issue affecting Google (410 errors) and O365 (Missing Initialization errors). We will be monitoring the situation over the next several days to ensure stability and performance. Our Engineering team is overseeing the process to confirm that the issue has been fully resolved.

The next update will be provided by Friday, 9am MST.
Posted Sep 10, 2024 - 09:34 MDT
Update
We appreciate your patience as our team is working diligently to resolve the issue with unpredictable calendar event synchronization for both O365 and Google users.

We have identified an issue which could be causing these symptoms and are actively working to address it. In the meantime, we have implemented enhanced measures to mitigate the issue, including increasing the frequency of forced calendar syncs to four times a day. This should help improve the reliability of your calendar events until the full resolution is in place.

Our team is dedicated to restoring normal calendar sync performance, and we will keep you updated with additional status updates as we continue to monitor and make progress towards a resolution. Our next update will be posted by Tuesday, Sept. 10 at 7am MDT. Should you experience any additional issues, please contact our support team for assistance. Thank you.
Posted Sep 06, 2024 - 11:53 MDT
Update
Our engineering team has implemented mitigations while we continue to investigate the root cause. If you continue to experience any issues, please raise a support case. Our next update will be provided by 12pm MST.
Posted Sep 06, 2024 - 07:10 MDT
Update
We are continuing to investigate this issue.
Posted Sep 05, 2024 - 14:37 MDT
Update
Our engineering team continues to investigate the root cause of calendar events not syncing. As a workaround, all calendars have been force-resynced on back-end services, which has allowed calendar events to sync for affected customers. We will continue to force resync until full resolution. For any questions, please contact our support team. Our next update will be provided by Friday, 09/06/2024 at 7am MST.
Posted Sep 05, 2024 - 14:37 MDT
Update
We are currently investigating an issue where Google and O365 calendar events are not syncing, even when Force Re-Sync is used.

Our Engineering team is actively working to determine the root cause of the disruption and assess its impact.

We will provide our next update by 2:30 PM MST.
Posted Sep 05, 2024 - 10:57 MDT
Investigating
We are currently investigating an issue with Teem. We will update you when we have more information.
Posted Sep 05, 2024 - 10:31 MDT
This incident affected: Google Apps Calendar and Exchange Sync.