When you outsource production, you may risk productivity

The reality of the modern workforce is that we rely on a wealth of “<insert> as a service” approaches to provide the backbone of our business production environments. Remote working is here to stay, and the cloud-based management of modern roaming devices is increasingly common. A cloud platform makes it easier to apply policies and updates even if the device is not directly connected to a corporate segment. The cloud is always on, and the device just needs an internet connection. Gone are the days of anti-virus updates being applied only when a client is connected to a network segment that has visibility of the internal update services. In this new world, updates are pushed out quickly and automatically from a central cloud service, reducing the time a device is exposed to any potential threat.

The security benefit of automatic, fast updates to client systems is obvious. The downside, which can sometimes be measured in lost productivity, is less so. Productivity benefits are generally why businesses adopt cloud-centric models that allow them to be agile. However, when we do this, we are at the mercy of these platforms. Outages or incidents in large cloud platforms do not happen often, but when they do, especially at global scale, the ripples are felt everywhere. One provider’s mistake can create huge disruption to a global workforce. In contrast to legacy on-premises environments, where updates were often tested thoroughly before being rolled out, cloud-delivered updates are commonly rolled out to client devices with little or no testing. When this goes wrong, it can be hugely disruptive.

MO497128 – “ASRMAGEDDON”

Microsoft provides an advanced and robust mechanism for the remote deployment of core services such as application hardening, operating system updates and anti-virus solutions, with technologies such as Intune and Windows Defender. In enterprise subscriptions, companies can take this a step further than historic anti-virus could, using tooling such as the Attack Surface Reduction (ASR) rules. These rules are designed to provide a baseline of controls for hardening applications that attackers have frequently abused, and still do, for example Microsoft Office applications and their integrated macro code execution options.

A recent update to the Windows Defender ASR rule set, deployed on Friday 13th January 2023, created havoc for sysadmins everywhere. This update pushed new rules to the ASR collection that marked shortcut files and links as dangerous. Worse still, the update then proceeded to delete these shortcuts, making common program links for Outlook and Word vanish from the Start menu. The programs themselves were untouched, and only the links in the Start menu and various other shortcut locations within Windows were deleted, but users still lost their usual routes to those programs. Not a wonderful experience for end users.

Users were presented with an alarming warning that stated something bad had happened. Warnings about blocked activity, followed by icons disappearing and access to programs being lost, are not part of the normal daily computing experience, so it is fair to say that internal IT teams must have been stretched considerably.

Most end users don’t need to know the deep inner workings of systems, nor are they inclined to learn them. Computers simply provide a means of communication and allow them to do their respective jobs. The removal of the shortcuts to key business applications, such as Outlook, creates a problem which the end user then routes to the relevant IT team. That individual cannot work until the IT department provides a solution, and is unproductive until it is resolved. Scale this up to hundreds or thousands of users and productivity takes a significant hit.

It would be fair to assume that mission-critical projects were disrupted as user groups struggled to gain access to the tools they use to be productive. It is also fair to say that many hours will have been burned in IT departments that day simply trying to keep people working, a significant loss of productivity in its own right.

Even within our own medium-sized business, there was a huge backlog of events and alerts that required triage. For a large corporate environment, the volume of noise generated by this issue must have been enormous and would inevitably have created a significant amount of additional work. This noise may even have created opportunities for a genuine attacker's activity to go unnoticed inside the increased volume of alerts, or at least to receive less scrutiny because of the chaos.

One workaround was to set the rule “92e97fa1-2edf-4476-bdd6-9dd0b4dddc7b” into audit mode, in which the rule logs matching activity rather than blocking it. This prevented the deletion of shortcuts, but it meant that customers had to actively reduce the overall security posture of their endpoints to prevent operational disruption caused by the error. That trade-off introduced a security risk that should have been carefully considered at both a technical and an executive level.
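As a rough illustration of what that change looks like on a single machine, the sketch below builds (but does not run) a PowerShell command using the Defender `Add-MpPreference` cmdlet and its `-AttackSurfaceReductionRules_Ids` and `-AttackSurfaceReductionRules_Actions` parameters. The function name `build_asr_command` and the list of allowed actions are assumptions made for this example. In an Intune-managed estate the change would normally be made in the central policy instead, and a local setting may be overridden by it.

```python
import re

# Rule ID quoted in the text above.
ASR_RULE_ID = "92e97fa1-2edf-4476-bdd6-9dd0b4dddc7b"

# Subset of action names that this sketch chooses to allow (an assumption).
ALLOWED_ACTIONS = ("Enabled", "AuditMode", "Disabled")

# Shape of a rule GUID: 8-4-4-4-12 hexadecimal characters.
GUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def build_asr_command(rule_id: str, action: str) -> str:
    """Return the PowerShell command text; nothing is executed here."""
    if not GUID_PATTERN.fullmatch(rule_id):
        raise ValueError(f"not a rule GUID: {rule_id!r}")
    if action not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return (
        "Add-MpPreference"
        f" -AttackSurfaceReductionRules_Ids {rule_id}"
        f" -AttackSurfaceReductionRules_Actions {action}"
    )


# Relax the rule during the incident, and keep the matching command that
# returns it to block mode alongside it.
relax_command = build_asr_command(ASR_RULE_ID, "AuditMode")
restore_command = build_asr_command(ASR_RULE_ID, "Enabled")
print(relax_command)
print(restore_command)
```

Generating the restore command at the same moment as the relax command is a deliberate choice: the route back to block mode is written down before the pressure of the incident has passed.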

It is understandable that an IT department needs a quick way to put out fires when an operational emergency affects a large number of users. If those decisions then expose the company to another threat, leading to data loss or to risks that the risk owners are not aware of, the ramifications could be significant for both the individuals and the company. These decisions certainly need careful consideration: the impact on productivity whilst the IT teams try to restore normality must be weighed against the reduction in defences.

Many heads of IT will have lost time last week trying to explain the what, why and how of ASR rule assignment to people who have neither the time nor the need to understand it, so that they can try to find a way through. There is no server they can reboot or roll back. This problem has been pushed onto them, and there is little they can do about it. The only consolation is that the updates on Friday 13th January 2023 created a global problem, so the businesses, customers and partners they work with daily may also have been affected and may therefore have some empathy with the loss of productivity.

Productivity vs Security?

When people “can’t work”, the theoretical risk of a future attack becomes difficult to weigh against the immediate reality of users who cannot perform their functions unless you decide to reduce security. When risk decisions are made rapidly, under operational pressure, to get people working, there may be no documented ‘back to normal’ plan. There may not even be a clear route back from ‘turning off the security control’ that avoids further disruption the business is not willing to countenance.

As a result of the error in the Windows Defender ASR updates, we may see more reluctance to deploy this control among those who are considering it. We may see more businesses choosing to audit behaviour rather than block it, worried about a potential repetition. We may also see businesses that placed ASR rules into ‘audit’ mode to get past this hurdle forget to turn them back into ‘block’ mode, or be reluctant to do so “just in case”…

As we continue to build enterprises on cloud platforms, our goal is to increase productivity by creating more freedom for the workforce. This freedom cannot come at the expense of security, and so we continue to need centralised tooling that helps ensure we are always reducing the attack surface. When an issue occurs at the provider level, it can have a significant impact. These platforms have become so ubiquitous and embedded in our environments that it is easy to forget we are consuming a service, and that there are parts we do not control. This update has shown that mistakes can and will happen. Businesses should consider the overhead of work this creates when they do, the complex decisions that may be required as a result, and the productivity lost whilst all focus is on restoring normal operation. Importantly, when short-term decisions to relax controls are taken in an emergency, there should be a well-documented pathway to reimplement those controls once the emergency is over.
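One lightweight way to document that pathway is a register of temporary relaxations, each with an owner and a date by which the control should be back in place. The sketch below is a minimal illustration. The `ControlException` record, its field names, the `overdue_exceptions` helper and the dates and approver in the example entry are all hypothetical, chosen for this example rather than taken from any tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ControlException:
    """One emergency decision to relax a security control."""
    control: str
    normal_state: str
    temporary_state: str
    reason: str
    approved_by: str
    relaxed_on: date
    restore_by: date
    restored_on: Optional[date] = None  # set once the control is back


def overdue_exceptions(
    register: List[ControlException], today: date
) -> List[ControlException]:
    """Return entries still relaxed after their agreed restore date."""
    return [
        entry
        for entry in register
        if entry.restored_on is None and entry.restore_by < today
    ]


# Illustrative entry; the deadline and approver are made up for the example.
example = ControlException(
    control="ASR rule 92e97fa1-2edf-4476-bdd6-9dd0b4dddc7b",
    normal_state="block",
    temporary_state="audit",
    reason="Faulty update deleting application shortcuts",
    approved_by="Head of IT (hypothetical)",
    relaxed_on=date(2023, 1, 13),
    restore_by=date(2023, 1, 27),
)
register = [example]

for item in overdue_exceptions(register, today=date(2023, 2, 1)):
    print(f"OVERDUE: {item.control} should be back in {item.normal_state} mode")
```

Even a register this simple answers the questions that tend to get lost in an emergency: who accepted the risk, why, and when the control is due to return to its normal state.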
