Input Validation and the problems with Sanitisation

Tue, 30th May 2023

By Jack Misiura, Application Security Manager, The Missing Link

In the realm of data security, it's crucial to ensure the integrity and reliability of the information. A very common application security practice is to validate and sanitise the supplied data. This typically involves verifying data inputs to ensure they do not contain potentially malicious characters. This practice works well until it does not.

Developers need to understand the nuances of input validation and why sanitisation can pose risks to systems and applications. Knowing how to implement input validation effectively and securely is an essential tool in any developer’s tool belt.

Understanding data validation and sanitisation

Input validation is the process of ensuring the data doesn’t contain any malicious information. It may also involve verifying the data types and formats and enforcing business rules to maintain integrity.

It’s important to understand the scope of this validation. It is not enough to simply check user-supplied inputs as they enter the application. Proper data validation covers the data’s entire journey, including its transmission between systems. When data is shared across various platforms, risks arise. For instance, one system may transform the data in a certain way while another processes it. If the processing system performs no validation of its own, the transformation may introduce malicious content that goes unchecked. Comprehensive data validation is therefore vital to prevent potential attacks. Each workflow or method that processes the data should perform its own validation.

There are two ways to validate data. One is to look for forbidden characters. The danger of this method is simple – attackers have many ways to encode forbidden characters so that they are hidden, meaning this approach will miss them.

The other (correct) way is to accept allowed characters only. Just as importantly, the length of the input should also be limited. Malicious payloads or exploits rely on having enough characters to perform their actions. Limit the number of characters, and you may be able to prevent exploitation.
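
As a minimal sketch of the difference between the two approaches, the Python below compares a check for forbidden characters with an allowlist check that also enforces a maximum length. The function names, the forbidden-character set, the allowed pattern and the 32-character limit are illustrative assumptions, not recommended values.

```python
import re
from urllib.parse import unquote

# Hypothetical set of forbidden characters used by the first approach.
FORBIDDEN = set("<>\"'")

# Hypothetical allowlist: ASCII letters, digits, underscore and hyphen,
# between 1 and 32 characters in total.
ALLOWED = re.compile(r"[A-Za-z0-9_-]{1,32}")


def passes_forbidden_check(value: str) -> bool:
    """Accept anything that contains none of the forbidden characters."""
    return not any(ch in FORBIDDEN for ch in value)


def passes_allowlist(value: str) -> bool:
    """Accept only values built entirely from allowed characters, within the length limit."""
    return ALLOWED.fullmatch(value) is not None


# A URL-encoded payload contains none of the forbidden characters as typed.
encoded = "%3Cscript%3Ealert(1)%3C%2Fscript%3E"

print(passes_forbidden_check(encoded))  # True: the encoded payload slips through
print(unquote(encoded))                 # <script>alert(1)</script>
print(passes_allowlist(encoded))        # False: '%', '(' and ')' are not allowed
print(passes_allowlist("jack_m"))       # True
print(passes_allowlist("a" * 33))       # False: longer than 32 characters
```

The allowlist check never needs to know how a forbidden character might be encoded, because anything outside the permitted set is rejected regardless of what it decodes to later.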

On the other hand, data sanitisation is typically the process of cleansing data inputs. This process may involve removing potentially harmful or unwanted elements. It may even involve applying data transformation techniques to prevent code injection, cross-site scripting (XSS), and other security threats. From a pure user experience (UX) perspective, data sanitisation may attempt to provide a more seamless UX when incorrect information is entered into a system the user is interacting with.

Can you spot the problem with sanitisation already?

The pitfall of the traditional “validate and sanitise” message

While validation aims to ensure data integrity, sanitisation can lead to unintended consequences. It is essential to consider the context and purpose behind each data entry. Blindly removing certain characters or elements without understanding their significance may result in erroneous data interpretation or even the introduction of security vulnerabilities.

Let us review how this happens:

1. Loss of user intent: Sanitisation can inadvertently distort user input, leading to confusion and incorrect data interpretation. Consider scenarios where names with quotation marks or special characters are mistakenly altered, resulting in misidentification or transaction errors – for example, in the case of a bank transfer. No matter how advanced the sanitisation is, a developer is simply not inside the user’s mind and cannot know the actual intention behind the input.

2. Exploitable weakness: Sanitisation routines may create loopholes for attackers. Attackers can strategically craft inputs that rely on the sanitisation measures. At first, these inputs may appear to be random, corrupted pieces of information but, upon transformation, they become working exploits that can compromise system security.

3. Evasion of defences: Sanitisation might even help inputs bypass security measures designed to detect specific attack patterns. Attackers can cleverly manipulate inputs so that they pass those defences unnoticed and only become harmful once sanitised, enabling the injection of payloads that would otherwise be identified and prevented.
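
The second and third points can be illustrated with a small, deliberately simplified Python sketch. Both functions are hypothetical: `looks_malicious` stands in for a signature-based detection rule, and `strip_control_chars` stands in for a sanitiser that silently drops non-printable characters.

```python
def looks_malicious(value: str) -> bool:
    """Hypothetical signature check, standing in for a detection rule."""
    return "<script" in value.lower()


def strip_control_chars(value: str) -> str:
    """Hypothetical sanitiser: silently drop non-printable characters."""
    return "".join(ch for ch in value if ch.isprintable())


# The NUL bytes break up the pattern the detection rule is looking for.
crafted = "<scr\x00ipt>alert(1)</scr\x00ipt>"

print(looks_malicious(crafted))   # False: the input passes the detection rule
cleaned = strip_control_chars(crafted)
print(cleaned)                    # <script>alert(1)</script>
print(looks_malicious(cleaned))   # True: the sanitiser assembled the payload
```

In this sketch the input only becomes a working payload because it was sanitised. Rejecting the input outright for containing a non-printable character would have left nothing to assemble.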

Striking a balance when it comes to proper validation techniques

The most secure system is one that cannot be interacted with.

The reality of modern software is that systems need users. Users need to enter information. The problem with input validation is that it can reject legitimate user input, leading to application usability problems.

As with anything, a balance must be maintained. Characters such as single quotation marks may be used in various exploits, but they may also form a part of someone’s name. This is where length validation becomes even more crucial. Couple this with proper secure coding guidelines and principles, and the risk of an exploit is reduced.

Some of my biggest practical tips for input data validation include:

1. Understand the purpose and context of data inputs to determine the appropriate validation rules. Different fields may have different requirements, so always customise the validation rules accordingly. A good example is someone’s name. Having names made up of 1000+ arbitrary characters is unnecessary. Consider what names may consist of and how long they should be.

2. Instead of rejecting forbidden characters or patterns, use an allowlist approach. Define and enforce allowed characters and patterns specific to your system. This helps reduce the risk of inadvertently removing or altering valid data.

3. Validate the length of data inputs! Set appropriate maximum input lengths based on the requirements of each field. Exploits rely on enough characters to perform malicious actions. Limit those characters, and the payload may fail to trigger.

4. Most importantly, when data validation fails, simply alert the user that invalid information was entered. Do not render the entered input or try to process it!
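
The tips above can be sketched for a single field as follows. This is only an illustration under stated assumptions: the `ValidationResult` type, the `validate_name` function, the set of permitted punctuation and the 100-character limit are all hypothetical, and real rules for names depend on the users your system serves.

```python
from dataclasses import dataclass

# Assumed rules for a name field: letters plus a small set of punctuation,
# between 1 and 100 characters. Adjust to the needs of the actual field.
NAME_MAX_LENGTH = 100
ALLOWED_PUNCTUATION = set(" '-.")


@dataclass
class ValidationResult:
    ok: bool
    message: str  # generic text that never echoes the submitted value


def validate_name(value: str) -> ValidationResult:
    """Validate a name with a length limit and an allowlist; never modify the input."""
    if not 1 <= len(value) <= NAME_MAX_LENGTH:
        return ValidationResult(
            False, f"Name must be between 1 and {NAME_MAX_LENGTH} characters."
        )
    if not all(ch.isalpha() or ch in ALLOWED_PUNCTUATION for ch in value):
        return ValidationResult(False, "Name contains characters that are not allowed.")
    return ValidationResult(True, "")


print(validate_name("O'Brien").ok)                    # True
print(validate_name("<script>alert(1)</script>").ok)  # False
print(validate_name("a" * 101).ok)                    # False
```

The function either accepts the value unchanged or rejects it with a fixed message. It never attempts to repair the input, and the rejected value is never rendered back to the user.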

While data validation is vital for safeguarding systems and applications, it's crucial to strike the right balance.

Blindly removing characters or patterns without considering their context can introduce unintended consequences, compromising usability and opening doors to potential security breaches. It is time to leave sanitisation in the past and focus only on correct input validation instead.

Following the advice above will help protect your data integrity while minimising risks and maintaining a robust security posture.

Remember, validation is not about simply rejecting data; it's about understanding the data, preserving user intent, and keeping your systems secure. By implementing effective validation practices, you can help protect your organisation's sensitive information and increase the trust of your users and stakeholders.
