How Does Antivirus Software Work?

On the World Wide Web, there are only victims and potential victims of cybercrime. As we speak, millions of computer viruses roam this huge network. Say your computer hasn’t been infected yet? Unless you have already chosen a smart antivirus, that’s not a personal merit to rejoice in. In fact, without antivirus protection, not getting infected is mostly a matter of luck.

Then again, just like in many other fields, education for prevention can make a huge difference. So, wouldn’t you like to know more about these threats? Let us start with the basics of what a computer virus is and how antivirus software works to detect it…

First things first, what is a virus? What will the antivirus look for?

Viruses are applications like many others, designed to work on specific devices. Obviously, not every compiled application is a virus. The difference between useful and harmful compiled applications lies in their purpose.

Viruses are meant to cause damage. That damage can mean anything from stealing your information to deleting your data, crashing your computer, or demanding a ransom before you regain access to the infected device.

The fact that a virus is a compiled app can be both a good and a bad thing:

  • It is a bad thing because it can easily pass as a good app and trick users into downloading or accessing it.
  • But it is also a good thing because, well… compiled applications are made of bits.
    • And bits create footprints or signatures that make the app easier to recognize as a virus, once it has been reported as such.

In other words, a virus is an application made of the same sequence of bits in every copy, generating the same negative impact every single time it runs. Once that sequence has been reported as harmful to a device, antivirus software treats it as a virus signature.

Antivirus vendors will blacklist that sequence and store it as a reference for future comparisons. From that moment on, whenever their software encounters it during any kind of scan, it will recognize it as a virus and react accordingly.
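
To make the comparison concrete, here is a minimal sketch. It assumes the signature database is nothing more than a mapping from a name to a byte sequence. The names and byte strings below are invented for illustration, and real products use far richer formats.

```python
# Hypothetical signature database: name -> byte sequence reported as harmful.
# These entries are invented for illustration only.
SIGNATURES = {
    "Example.Virus.A": b"\xde\xad\xbe\xef\x13\x37",
    "Example.Virus.B": b"EVIL_PAYLOAD_V1",
}


def scan_bytes(data: bytes) -> list[str]:
    """Return the names of all known signatures found in the data."""
    return [name for name, sig in SIGNATURES.items() if sig in data]


def scan_file(path: str) -> list[str]:
    """Read a file and compare its content against the signature database."""
    with open(path, "rb") as handle:
        return scan_bytes(handle.read())
```

An empty list means that nothing in the database matched, which is not the same as the file being safe.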

“Reacting accordingly” is a vague term. Antivirus labs rely on many different tools when it comes to dismantling viruses. Normally, they all move the suspicious file into quarantine, isolating it from the system and preventing it from running its malicious code. Depending on its settings, the antivirus program can delete the file right away. Or it can run and test it in a sandbox in the cloud, where it cannot affect your device.
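
As a rough idea of what quarantine can look like, the sketch below moves a file into a separate folder, renames it, and strips its write and execute permissions. The function name, the folder, and the `.quarantined` suffix are assumptions made for this example. Actual products may also encrypt or encode the quarantined file, which is not shown here.

```python
import os
import shutil
import stat


def quarantine(path: str, quarantine_dir: str) -> str:
    """Move a suspicious file into quarantine_dir and make it read-only."""
    os.makedirs(quarantine_dir, exist_ok=True)
    # Rename so the file can no longer be launched under its original name.
    target = os.path.join(quarantine_dir, os.path.basename(path) + ".quarantined")
    shutil.move(path, target)
    # Owner read-only: no write or execute bits.
    os.chmod(target, stat.S_IRUSR)
    return target
```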

So, is it really that easy for your antivirus to spot a virus?

Now, if it’s so easy to “spread the word” and let other devices know what signatures to block… How come we still often feel overwhelmed by these attacks?

Obviously, it is because while antivirus developers are working hard to collect useful information and share it with their entire pool of users, so are the virus developers. Anyone with the knowledge to program computer software can also create computer viruses. And they don’t even need to create a virus from scratch. It suffices to take an existing virus, alter its code with new, custom specifications, and then compile and distribute it as a new virus.

The new virus will have a certain code sequence in common with the old virus. But the signature won’t perfectly match and, therefore, it will be reported as a different virus. This is why a particular virus, powerful enough to frighten the entire online community, can end up having several different names: it was altered by other virus developers and now has different versions circulating online.

How can your antivirus stay up to date with all these changes?

Well, the antivirus itself is just a compiled app that knows how to scan other compiled apps and match what it finds against a database. That database contains virus signatures and, as already explained, viruses change rapidly, resulting in new threats.

Basically, your antivirus doesn’t stay up to date with all these changes on its own. But its developer does. By collecting all the information that it can get, the developer updates the antivirus with so-called definition files. It then notifies its users that a new definition file is available. And by installing the update, the antivirus software benefits from a new version of the virus signature database.

In other words, if you neglect to update the definition files, you leave your computer exposed. The antivirus will continue to scan the executable files. But if it encounters a virus whose signature has been modified from the version currently stored in its database, it will not be able to recognize it.

For this reason, definition files should be allowed to download automatically. That way, the antivirus software gets the chance to access new, updated definition files once a day, sometimes even more often than that.
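
The effect of a definition update can be sketched in a few lines. The `SignatureDatabase` class, its version numbers, and the signatures are all hypothetical. The point is only that a sample goes unrecognized until the definition that describes it has been installed.

```python
from dataclasses import dataclass, field


@dataclass
class SignatureDatabase:
    """Toy model of a local signature database (illustrative only)."""

    version: int = 0
    signatures: dict[str, bytes] = field(default_factory=dict)

    def apply_update(self, new_version: int, new_signatures: dict[str, bytes]) -> bool:
        """Merge a definition file if it is newer than what is installed."""
        if new_version <= self.version:
            return False  # already up to date, nothing to do
        self.signatures.update(new_signatures)
        self.version = new_version
        return True

    def recognizes(self, data: bytes) -> bool:
        """True if any stored signature occurs in the data."""
        return any(sig in data for sig in self.signatures.values())
```

With a database at version 1 that only knows `b"OLD_SIG"`, a file containing `b"NEW_SIG"` passes unnoticed. After `apply_update(2, {"New": b"NEW_SIG"})`, the same file is recognized.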

Is that everything that antivirus software does?

Needless to say, the antivirus always has to match the file it analyzes against the signatures from its most recent definition file. By always, we mean every time you launch an app or an executable file.

In those short (or long) fractions of a second when you’re waiting for the app to launch, the antivirus is doing all the hard work of comparing code sequences. Hence the complaints that using antivirus software can slow down your computer… And the continuous struggle of antivirus developers to create software with as little impact on a computer’s system resources as possible.

Aside from the code comparison, antivirus software can also look into a program’s behavior, doing a so-called heuristic evaluation.

To sum up, the basic scanning process of any antivirus software will focus on three types of detection mechanisms:

  1. Specific detection
  2. Generic detection
  3. Heuristic detection

 

Specific detection will try to identify known malware by looking for a specific, quite exact set of characteristics. Generic detection, in turn, will look for variations of the known malware code, trying to identify new viruses that have been developed from older versions.
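
One way to picture the difference between the first two mechanisms is the sketch below. It assumes that specific detection is an exact file hash and that generic detection is a byte pattern with a wildcard part. Both are simplifications chosen for this example, and the sample content and family names are made up.

```python
import hashlib
import re
from typing import Optional

KNOWN_SAMPLE = b"loader:v1;payload=encrypt_all_files;"

# Specific detection: the hash of one exact, known file.
SPECIFIC = {hashlib.sha256(KNOWN_SAMPLE).hexdigest(): "Example.Ransom.A"}

# Generic detection: a pattern that covers the family, whatever the loader version.
GENERIC = [
    (re.compile(rb"loader:v\d+;payload=encrypt_all_files;"), "Example.Ransom.gen"),
]


def detect(sample: bytes) -> Optional[str]:
    """Try the exact match first, then fall back to the family patterns."""
    name = SPECIFIC.get(hashlib.sha256(sample).hexdigest())
    if name:
        return name
    for pattern, family in GENERIC:
        if pattern.search(sample):
            return family
    return None
```

A variant such as `b"loader:v7;payload=encrypt_all_files;"` no longer matches the exact hash, but the generic pattern still catches it.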

Heuristic detection is different from behavioral detection

Heuristics go the extra mile. Instead of simply comparing pieces of code, heuristic detection relies on rules and algorithms. It evaluates commands that can indicate malicious intentions from a certain app or program. Because of that, it can spot new or unknown malware even when the antivirus lacks the latest virus definitions.

What kind of suspicious activities performed by viruses can be spotted with the help of heuristic detection? For instance, a virus trying to access all of the executable files on your computer and insert a copy of its own code into theirs. That way, it spreads the infection across your device (any executable file on the PC becomes a source of infection) and it makes it even harder for the antivirus to completely remove it from the device.

Heuristic-based detection usually pairs with signature-based detection and tends to make an impact especially on the prevention side. Behavior-based detection, on the other hand, looks at what a program or an app does while actually running on the PC. This is hugely different from looking at what that program does in a virtual environment.

The problem with heuristics, however, is that they leave so much room for mistakes. Sometimes, heuristic detection can prove too aggressive a measure. And so, it can lead to false positives, where it flags a harmless program as an unknown type of virus.
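
A rule-based heuristic can be sketched as a weighted checklist. Everything in the example below is an assumption made for illustration: the rules, the marker strings, the weights, and the threshold. It does show why tuning matters, because lowering the threshold catches more threats but also flags more harmless programs.

```python
# Hypothetical rules: (description, marker that triggers the rule, weight).
RULES = [
    ("writes into other executables", b"open_exe_for_write", 5),
    ("disables security tools", b"kill_av_process", 4),
    ("deletes files in bulk", b"delete_many_files", 3),
    ("adds itself to startup", b"add_startup_entry", 2),
]

THRESHOLD = 6  # assumed cut-off; lower values mean more false positives


def heuristic_score(code: bytes) -> int:
    """Add up the weights of every rule whose marker appears in the code."""
    return sum(weight for _, marker, weight in RULES if marker in code)


def is_suspicious(code: bytes, threshold: int = THRESHOLD) -> bool:
    """Flag the code when its score reaches the threshold."""
    return heuristic_score(code) >= threshold
```

A harmless cleanup tool that contains both `delete_many_files` and `add_startup_entry` scores 5. It passes with the default threshold of 6, but it would be flagged as a false positive if the threshold were lowered to 5.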

Does it come down to signatures, behaviors, and executable files?

It would have been nice, but no. The truth is that there are many other types of online threats. And not all of them will come through an executable file that you personally launch. Browsers and plugins, the operating system itself, your email app and more… They can all easily turn into access points for viruses to sneak in.

So, antivirus software is either part of a security suite that includes several other layers of protection, or it comes with extra features of its own, doing a lot more than scanning every file you open.

Antivirus software can fight malware with different detection techniques. We have already seen the signature and the heuristic-based detection mechanisms. And we have mentioned the behavioral-based detection.

As suggested, this has more to do with an antivirus’ intrusion detection mechanisms. It detects the potentially harmful characteristics of malware while it actually executes on the device, that is, while it carries out its malicious actions.

On top of everything discussed above, there is also sandbox detection and a series of data mining techniques.

Sandboxes are virtual environments

Specifically built for testing malicious files outside of the operating system, sandboxes are the next level after heuristic detection.

Heuristic detection looks for features or actions and behaviors that are normally associated with known threats.

Sandboxing is all about letting the malicious app run in that dedicated environment and recording its behavior.

Sandboxing takes more time, but it is also more accurate, and its results are often inspected afterward by a malware analyst. With its help, the analysis will determine not only whether the suspicious file really is malicious, but also exactly what it does if it is.

In a nutshell, sandboxing opens the file in a safe environment, lets it run, and sees exactly what it would have done to your computer if it had had the chance to run there.
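
The idea of “let it run and record what it does” can be shown with a toy model. To be clear, this is not a real sandbox: actual sandboxes rely on virtual machines or emulators, while the sketch below simply hands the sample a fake system object that logs every request instead of executing it. The class, the sample, and the paths are all hypothetical.

```python
class FakeSystem:
    """Stand-in for the real system; it only records what a sample asks for."""

    def __init__(self):
        self.log = []

    def delete_file(self, path):
        self.log.append(("delete", path))

    def write_file(self, path, data):
        self.log.append(("write", path))

    def connect(self, host):
        self.log.append(("connect", host))


def run_in_sandbox(sample):
    """Run the sample against a fake system and return the recorded behavior."""
    system = FakeSystem()
    try:
        sample(system)
    except Exception as error:  # a crash is also an observation
        system.log.append(("crash", repr(error)))
    return system.log


def suspicious_sample(system):
    # Hypothetical behavior, for illustration only.
    system.write_file("C:/Windows/notepad.exe", b"...")
    system.connect("example.invalid")
```

Calling `run_in_sandbox(suspicious_sample)` returns the list of recorded actions, which is the kind of report a malware analyst would then review.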

Data mining and the first steps towards machine learning

Just like the name suggests, data mining is a sophisticated process of collecting a huge amount of data and, equally important, sifting through it in search of pertinent, specific information.

Knowing how to interpret the information extracted from those large sets of data is crucial, so there are different options involved in data analysis. Machine learning techniques represent one of the most recent and complex options for data analysis, making use of complicated algorithms.

In fact, data mining involves applying an extensive suite of statistical and machine learning algorithms to a specific set of features extracted from both malicious and clean programs. More about that a bit later in this article.

The main antivirus scan types and detection mechanisms

We’ve seen what the antivirus is generally looking for. But it would probably help to know, in advance, what kind of options you have, as an antivirus software user.

Scanning is a process that can be executed either on demand or by default. Some users will disable automatic scanning, unhappy that allowing the antivirus to run its scans in the background will slow down their computers. Others will let the antivirus work as it sees fit and that’s probably a very good idea.

Long story short, there’s on-access scanning and full system scanning. Depending on the features that the antivirus software comes with, there are also options to create custom scans, to scan only certain partitions, certain folders, or even certain files. You can do that periodically or, as suggested, on demand.

Then again, scanning is just one of the many security layers that your antivirus relies on. More specifically, it is a detection mechanism, one of the four main detection mechanisms that antivirus software normally provides:

  1. Scanning – implies simply searching for specific strings in the analyzed files, strings that are pre-defined virus signatures; scanning may report results based either on exact matching or variants of a virus-signature.
  2. Activity monitoring – as one of the latest trends in virus research, this one involves monitoring a file’s execution and detecting any trace of malicious behavior.
  3. Integrity checking – this one starts with creating a cryptographic checksum for every single file stored on the computer and returning to it periodically, to check for any variation that occurred in that checksum in the meantime; these variations can help to detect changes caused by viruses.
  4. Data mining – as mentioned, it is a complex process that works with both statistical and machine learning algorithms.
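
Integrity checking, the third mechanism in the list above, is the easiest one to sketch. The example below assumes SHA-256 as the cryptographic checksum and keeps the baseline in a plain dictionary. The function names are made up for this illustration.

```python
import hashlib
import os


def checksum(path: str) -> str:
    """SHA-256 of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_baseline(root: str) -> dict[str, str]:
    """Record a checksum for every file under root."""
    baseline = {}
    for folder, _, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            baseline[path] = checksum(path)
    return baseline


def changed_files(baseline: dict[str, str]) -> list[str]:
    """Return the baseline files whose content changed or that disappeared."""
    changed = []
    for path, old in baseline.items():
        if not os.path.exists(path) or checksum(path) != old:
            changed.append(path)
    return sorted(changed)
```

A changed checksum does not prove an infection, since legitimate updates modify files too. It only tells the antivirus where to look more closely.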

On-access scanning

With this scan type, the antivirus runs in the background, checking every single file you open. It checks each file by comparing its code with the database of signatures, to see if it matches those of known viruses, worms or other types of malware.

On-access scanning isn’t limited to executable files. It can also look into archive files that may hide a compressed virus, into office documents that may hide malicious macros, or into any type of file that you download, which will be scanned automatically, without the antivirus waiting for you to open that file.

On-access scanning is perhaps the most important type of antivirus scanning because it has the ability to protect a PC before it gets infected. Most viruses will enter the device and wait for you to launch them before they start acting.

Once you launch a virus, it becomes significantly more difficult to remove. And even if you do remove it, or your antivirus says it has, there is no certainty that it is completely gone. Therefore, catching a virus before you get to launch the app that contains the malicious code is very important.

While one may have the option to disable on-access scanning with the purpose of reducing the impact that the antivirus has on the system’s resources, it is certainly not a good idea to make use of that option.

Full system scanning

Full-system scans are available with most antivirus software. They usually come either as an option to schedule or as an automatic action. When automatic, the antivirus software will typically schedule the scan once a week, at an hour when you normally don’t need to use your computer (it will notify you about that).

But as long as on-access scanning is active, there are only a few instances when one should spend time scanning the entire disk. Such instances include but aren’t necessarily limited to the following situations:

  • When you have just installed new antivirus software and you want to run a full scan to check for dormant viruses that the previous antivirus missed;
  • When you know for a fact that the device has been infected, you don’t want to reinstall the operating system, and you choose instead to move the hard drive to another PC and have it scanned there with a full system scan;
  • When you have disabled the automatic full-system scans that the antivirus software will schedule periodically.
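
In its simplest form, a full system scan is a walk over every folder that applies the signature check to each file. The sketch below assumes a tiny, made-up signature table and reads whole files into memory, which a real scanner would avoid for large files.

```python
import os

# Hypothetical signature table, for illustration only.
SIGNATURES = {"Example.Virus.A": b"EVIL_PAYLOAD_V1"}


def full_scan(root: str) -> dict[str, list[str]]:
    """Walk every file under root and report the signatures found in each."""
    findings = {}
    for folder, _, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file; a real scanner would log this
            hits = [n for n, sig in SIGNATURES.items() if sig in data]
            if hits:
                findings[path] = hits
    return findings
```

Pointing `full_scan` at a single folder instead of the whole disk gives the custom scan mentioned earlier.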

The future of how antivirus software works

By merely mentioning machine learning algorithms, we have entered the fascinating field of artificial intelligence antivirus. Pretty much everything we have discussed in this article so far concerned the way that traditional antivirus software works.

As stated, traditional antivirus software relies on data signatures and pattern analysis. It’s all a never-ending attempt to compare everything that happens on your computer with previous instances where malicious activities were reported.

In other words, antivirus software knows what viruses look like and what they do on a computer. And whenever it detects an activity that has to do with those virus-specific features and behaviors, it jumps in and blocks it.

The traditional malware recognition modules decide if an app is a threat after collecting and analyzing specific data about it. Data can be collected:

  • In the pre-execution phase – a phase where it just looks at the app and gathers details such as file format descriptions, code descriptions, statistics of binary data, text strings and other data extracted through code emulation;
  • Or in the post-execution phase – a phase where it analyses what happens after the app was active inside the system, after seeing its behavior and consequences firsthand.
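
To give an idea of what pre-execution data can look like, the sketch below extracts two of the details mentioned above: statistics of binary data, here the byte entropy, and text strings. The choice of features and the function names are assumptions made for this example. High entropy is often read as a hint that a file is packed or encrypted, but it proves nothing on its own.

```python
import math
import re
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def printable_strings(data: bytes, min_length: int = 4) -> list[bytes]:
    """Runs of printable ASCII characters, similar to the Unix strings tool."""
    return re.findall(rb"[\x20-\x7e]{%d,}" % min_length, data)


def static_features(data: bytes) -> dict:
    """Collect a few simple features without running the file."""
    return {
        "size": len(data),
        "entropy": byte_entropy(data),
        "string_count": len(printable_strings(data)),
    }
```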

 

This would work fine for the less challenging malware apps, but we all know that we are facing more and more advanced malware versions and malware attacks. To respond accordingly, artificial intelligence antivirus software is being developed. In the process, anti-malware companies have turned to machine learning, increasing their malware detection rates and malware classification abilities.

The differences between Machine Learning and Artificial Intelligence

Machine learning (ML) and Artificial Intelligence (AI) are two terms often used interchangeably, even though, in essence, they are different. To put it simply, machine learning is just a means to the goal of achieving artificial intelligence. Artificial intelligence defines programs that can execute tasks with the characteristics of human intelligence, whereas machine learning defines a set of methods that would give an antivirus the ability to learn without being explicitly programmed.

Machine learning algorithms can look at large sets of data, and then discover and formalize the principles underlying that data. In other words, the algorithm should be able to “reason” about the properties of malicious samples even if they were previously unseen.

Applied specifically to malware detection, machine learning can consider any new file that you are trying to access on your computer as a previously unseen sample. The hidden property to discover is whether that file is malware or benign. The algorithm should be able to reason about which one it is, based on a model that it deduces from a set of principles underlying the data properties.
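
As a very small illustration of learning from labeled samples, the sketch below uses a nearest-centroid classifier, picked here only because it fits in a few lines. It is not a claim about what any antivirus vendor uses. The two features (entropy and a count of suspicious calls) and all of the training values are made up.

```python
import math

# Made-up training data: (entropy, suspicious call count) -> label.
TRAINING = [
    ((7.6, 12.0), "malware"),
    ((7.2, 9.0), "malware"),
    ((7.9, 15.0), "malware"),
    ((4.8, 1.0), "benign"),
    ((5.5, 0.0), "benign"),
    ((5.1, 2.0), "benign"),
]


def train(samples):
    """Learn one centroid (mean feature vector) per label."""
    grouped = {}
    for features, label in samples:
        grouped.setdefault(label, []).append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(model, features):
    """Assign the label whose centroid is closest to the new sample."""
    return min(model, key=lambda label: math.dist(model[label], features))
```

After `model = train(TRAINING)`, a previously unseen file with features `(7.4, 10.0)` lands closer to the malware centroid, while `(5.0, 1.0)` lands closer to the benign one.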

Most importantly, machine learning is not just a single method but rather a range of approaches that will lead to a solution.

Given the complexity of this scanning method, artificial intelligence antivirus is raising the stakes for the villains who seek to develop malware. The more complex the scanning and identification methods become, the harder they will have to work to create malware that is more difficult to detect.

It is, after all, a continuous race and antivirus software based on artificial intelligence simply keeps us in the race.