Microsoft's multi-factor authentication service flakes out – again

Credit to Author: Gregg Keizer | Date: Tue, 27 Nov 2018 14:02:00 -0800

Just one day after Microsoft came clean with an explanation of a Nov. 19 outage that blocked users of Office 365 from logging into their accounts using Multi-Factor Authentication (MFA), today the service again went on the fritz.

“Starting at 14:25 UTC on 27 Nov 2018, customers using Multi-Factor Authentication (MFA) may experience intermittent issues signing into Azure resources, such as Azure Active Directory, when MFA is required by policy,” read the Azure status dashboard. Two and a half hours later, the dashboard reported that after resolving an earlier DNS (Domain Name System) issue, engineers rebooted the services. “They observed a decrease in the failure rate after the reboot cycles,” the dashboard concluded.

Tuesday’s outage affected users in the three broad geographic areas defined by the dashboard: the Americas, Europe and Asia Pacific.

The latest MFA problem came the day after Microsoft described last week’s 14-hour failure in an after-action-style report posted to the Azure dashboard. In a long report – over 1,150 words – Microsoft identified three root causes, detailed the failures and the steps engineers took to recover the service, and spelled out steps it plans to take over the next two-plus months to review and update its processes and procedures.

“We sincerely apologize for the impact to affected customers,” Microsoft said near the report’s end.

Calling the pair of outages “troubling,” analyst Wes Miller of Directions on Microsoft pointed out that a service like Azure Active Directory and its MFA “has to be designed to be incredibly robust.”

He was encouraged, he continued, by the after-action report’s tone.

“My hope is that the [Azure] team has the right perspective. It looks like they do,” he said of the outlined steps to reevaluate service update deployments and find ways to restore service faster. Engineers won’t rush to judge the problem and propose a fix, Miller said, since rushing is one way to easily make things worse. And unlike the Windows 10 group, the Azure team has been forthcoming about causes and reactions.

Microsoft promised that an accounting of Tuesday’s outage would be posted on the Azure dashboard within 72 hours.
