Teem by Eptura detailed Root Cause Analysis | 3/11/2024
S2 – O365 SSO Login Issues
We are truly grateful for your continued support and loyalty. We value your feedback and appreciate your patience as we worked to resolve this incident.
Description:
(The incident is logged in MST.)
On Monday, March 11th, internal Teem members and customers started experiencing issues logging into the Teem application via O365. At around 10:15 PM, a fire alarm was pulled by Jared Collins, Teem Support Manager.
Type of Event:
Outage
Services/Modules impacted:
SSO/Login
Timeline:
The timeline is posted in MST.
5:25 PM: We received reports of login issues, but logins still succeeded after a few tries. We confirmed that login worked over the next few hours and kept a close eye on things.
10:06 PM: A customer reported that login was failing. Internally, we were now seeing the same issue.
10:08 PM: A Jira ticket was opened for our Engineering team to investigate the issue.
10:15 PM: A Fire Alarm was pulled.
10:21 PM: The status page was updated with the status of the issue.
10:23 PM: Engineering confirmed they were working on the issue.
11:54 PM: The status page was updated once more for the evening.
8:00 AM (March 12th): The status page was updated.
12:00 PM: The status page was updated again to note that the issue was still ongoing.
2:37 PM: Engineering found a fix and applied it to the environment.
3:05 PM: Customers confirmed that the issue was resolved.
4:00 PM: The status page incident was taken down.
Total Duration of Event:
Approximately 16 hours
Root Cause:
The cause of the issue was an expired token on an older server. The token was re-established, which resolved the issue.
Remediation:
We have put Fire Alarms in place to notify the team of SSO token expiration a month before it happens, so we can update the token in advance and ensure that SSO has no issues going forward.
Preventative Action:
A system is now in place to notify the team of upcoming token expiration.
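For illustration only, the kind of expiration check described above could be sketched as follows. This is a minimal sketch and not the actual Teem implementation: the function names, the 30-day approximation of "a month", and the example expiry date are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Assumption: "a month before" is approximated as 30 days.
ALERT_LEAD_TIME = timedelta(days=30)


def days_until_expiry(expires_at: datetime, now: datetime) -> int:
    """Return the whole days left before expiry (negative once expired)."""
    return (expires_at - now).days


def should_alert(expires_at: datetime, now: datetime,
                 lead_time: timedelta = ALERT_LEAD_TIME) -> bool:
    """Return True when the token is inside the alert window or already expired."""
    return expires_at - now <= lead_time


if __name__ == "__main__":
    # Hypothetical expiry date, used only to demonstrate the check.
    expires_at = datetime(2025, 3, 11, tzinfo=timezone.utc)
    now = datetime.now(timezone.utc)
    if should_alert(expires_at, now):
        remaining = days_until_expiry(expires_at, now)
        print(f"SSO token expires in {remaining} day(s); renew it now.")
```

A check like this would typically run on a daily schedule, so the alert fires on the first run inside the 30-day window and keeps firing until the token is renewed.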