In search of JS data masker. Part 1: issues
Anton Golub
Posted on June 22, 2020
The problem of sensitive data masking is solved in various ways. Therefore, it is interesting not so much to do a comparison of these solutions, but to think about what aspects are relevant today. Criteria, considerations, limitations and so on.
Suspense
The most maskers use analyzers to separate entities that should be hidden.
They examine entry names (like "password"
, "token"
, "secret"
") or data formats (like card PANs). But this heuristic is ambiguous and very fragile. It’s impossible to cover all cases fully automatically. Sometimes the masking rule can only be defined in the business logic context.
class UserProfileDto {
personalData: {} // sensitive data
personalSettings: {} // not sensitive data
}
Sometimes, the stage in which we determine the need for data masking, and the stage of data output are ofter located in directly unrelated layers.
Vulnerability
Is it possible to output sensitive data to the console? Definitely, yes. We use tons of frameworks, utility libraries, and we cannot completely control them.
class CredentialsDto {
constructor(username: string, password: string) {
this.username = username
this.password = password
}
}
For example, creds go to dto
, dto is passed some request provider (db, http), then request fails with unexpected state and prints all the invocation context data to console.error
.
The obvious solution is simply to define custom valueOf
and toString
methods. But immediately various side effects arise. For example valueOf
can be used for comparison operations in some util. Moreover, console.log()
does debug magic and ignore these implementations. Maybe mark field as non-enumerable? Ok, we've tricked default console.logger
, but broke any serializer which iterates through for ... in
.
Override native console.log
? Maybe. But what if a module uses a sandbox inside and operates with own console
instance? Or stores console methods in closure? In short, any injections entails technical difficulties.
Coupling
It must be accepted that masking and logging (any output) are different areas of responsibility.
The masker may be a part of logging pipeline, but it's not required. We could not try to modify the target near the output point, but create a masked companion entity in the business layer and just bind them through some shared WeakMap
.
// Logger util layer
const maskedStore = new WeakMap()
const logger = (...args) =>
console.log(...args.map(value =>
maskedStore.has(value)
? maskedStore(value)
: value
))
// Business logic
const a = {smthToHide: 'sensitive data', foo: 'bar'}
maskedStore.set(a, {...a, smthToHide: '***'})
Reflect.metadata
can also be used for the same purpose. Or even cls-context.
Interception
Reflecting on what the masker does, it is obvious that everything comes to two fundamental things: search and replace data. Schema-based approach applicable if we know the essence of masked data, if we control the point where its created. In practice, we use frameworks that manage internal layers of data independently and uncontrollable from the outside.
On very lucky, there is a way to inject your custom masking logger. Often, for greater reliability, we have to hang a hook on stdout/stderr
or override native console
.
Performance
Different masking cases require different detection approaches: regexps, functions, binary operations (PAN checksums). Taking the scale of these operations, masking can seriously affect performance. And these features should be investigated by benchmarks.
Distortion
Masking does not always mean a complete replacement for content. It is important to maintain a balance between security and perception. For clarity, imagine user payments history:
Recipient: *** (personal data)
Sum: $25.00
Paymethod: credit card *** (sensitive data)
With a comparable level of security, this might be in more readable form.
Recipient: J.S***d
Sum: $25.00
Paymethod: credit card 4256 **** **** 3770
So modifiers should provide the minimum necessary, but not the maximum possible level of data distortion required for a specific context.
Chain of responsibility
The reasoning above suggests the following IMasker
contract.
interface IMasker {
detect: (target: any) => any,
modify: (target: any, detected: any[]) => any
}
Simple, clear and easy to compose, but it also involves some limitations. Here's the case:
{
token: {
type: 'bearer',
value: 'some string'
}
}
What should be the final result?
1) token: '***'
2) token: '*** (object)'
3) token: {type: '***', value: '***'}}
4) token: {type: 'bearer', value: '***'}}
If we strive for option 4, we need to place additional logic somewhere, that transcends the liability of detect
and modify
. Let it be in a controller.
interface IMasker {
(target: any, next: IMasker): any
}
Strategies
It is important to perform masking clearly. The main reason is that masking may be a subject of audit. For example, if you just replace PAN with random numbers, it will still raise questions from the PSI DSS.
Canonical masking symbol is * (asterisk), less commonly applied — X char, even less often — • (bullet, for interactive elements like input fields).
A sequence of three characters or more indicates the masking.
The easiest way to hide is to replace content. foobar
becomes ***
, some long string
, right, equals ***
after masking. This is plain masking.
If there's a need to keep the length of the origin text, we could replace each symbol as if crossing out. When another string
turns into ******* ******
that means strike masking was applied.
Usually spaces are not masked. NOTE This type of symbol mapping must not be applied to passwords. **** looks like an invitation for brute force.
For some types of data, it's important to keep the format specificity. In this case, the partial replacement will affect only a certain fragment.
Examples: phone number +7 *** *** 23 50
, PAN 5310 **** **** 9668
.
Parsing
Masking is required for various input types. Depending on structure, they pose simple or complex task.
-
json is pretty easy to iterate through
recursive map
/deepMap
. - xml requires resource-intensive parsing. Potentially contains sensitive data in text nodes or attributes.
- url may contain credentials in path or query parts. Access token is easy to confuse with ID, because both may be UUIDs.
- custom thrift models attaches sensitive data flags.
- pan requires checksum verification.
The list goes on. These features should be implemented in such a way that the masker does not become a parser. They are related, but not identical.
Directives
The next stage of abstraction is the transition from the direct masked object creation and binding to the delegation of this function to a separate subsystem. This feature requires a declarative contract instructions or masking directives which can be interpreted.
By analogy with how json-schema, we'll be able to use various implementations in the future. Depend upon abstractions, not concretions.
It is advisable to inherit well-known contract as a basis.
interface IMaskerDirective {
type: string // masking type
value?: any // replacement entity reference
options?: any // options for current `type` of masker
description?: string // optional comment
properties?: Record<string, IMaskerDirective> // Directives for nested props
definitions?: Record<string, IMaskerDirective>,
$ref?: string
}
Asynchronicity
There're several JS engines, which support synchronous (Rhino, Nashorn) and asynchronous (V8, Chakra) flow. To be honest, today V8 completely dominates among them. Therefore, it is advisable to follow async paradigm out of box especially if masking is resource intensive.
Usually sync/async versions of api are presented by different functions: fs.readFile
and fs.readFileSync
, execa
/execa.sync
, etc.
interface IMasker {
(target: any, next: IMasker): Promise<any>
sync?: (target: any, next: IMasker) => any
}
export {
masker,
maskerSync
}
Extensibility
A long-term solution must constantly adapt to new requirements. If the concept of continuous modification lays down in original design, the improvement process will be more efficient. How to do it simply? The plugins.
Composability
Although high-level maskers reuse part of the functionality of basic maskers, it’s better to avoid direct dependencies.
The solution can be based on DI/IoC-container system / some shared registry. Each custom masker should be declared as provider and be available by alias (interface / name).
In modern JS the context providers is becoming popular (inversify, awilix, nestjs di), but not yet widespread enough.
Let there be a registry of plugins at least.
interface MaskerRegistry {
add(type: string, masker: IMasker): void
remove(type: string, masker: IMasker): boolean
}
Ready-made solutions
I don't dare to say that there's no library suitable for enterprise. Unfortunately, I could not find something mature, that can be taken as a basis for refinement.
- https://www.google.com/search?q=js+sensitive+data
- https://www.google.com/search?q=js+data+masking
- https://www.npmjs.com/search?q=sensitive%20data
- https://www.npmjs.com/search?q=data%20masking
Well-known projects implement their own maskers where necessary. For example, semantic-release/lib/hide-sensitive.js
module.exports = (env) => {
const toReplace = Object.keys(env).filter((envVar) => {
return /token|password|credential|secret|private/i.test(envVar) && size(env[envVar].trim()) >= SECRET_MIN_SIZE;
});
const regexp = new RegExp(toReplace.map((envVar) => escapeRegExp(env[envVar])).join('|'), 'g');
return (output) =>
output && isString(output) && toReplace.length > 0 ? output.toString().replace(regexp, SECRET_REPLACEMENT) : output;
};
Posted on June 22, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.