takutoy
Posted on December 24, 2022
Semgrep provides a large number of rules, but sometimes you may want to customize a rule or create a new one.
For example, when a vulnerability is found in a product developed by your organization, you may want to
- check if similar vulnerabilities exist in other products
- detect similar vulnerabilities in the future
In such cases, rules for finding vulnerabilities will help maintain the security of the product.
This article will guide you through the process of creating a rule to detect DOM-Based XSS and help you understand the features and options required to create a rule.
Prerequisite
This tutorial uses semgrep 1.2.1.
$ semgrep --version
1.2.1
Search for similar rules
Semgrep Registry provides rules created by r2c, developer of Semgrep, and the community.
The rule you are trying to create may already exist, or a similar one may, so search the registry first.
In this tutorial, we want to create a rule to detect DOM-Based XSS, so we search the JavaScript rules for "dom xss".
The search found the rule javascript.browser.security.dom-based-xss.dom-based-xss, so we will create a rule based on this.
Part of javascript.browser.security.dom-based-xss.dom-based-xss:
pattern-either:
  - pattern: document.write(<... document.location.$W ...>)
  - pattern: document.write(<... location.$W ...>)
This rule seems to only detect document.write().
Create test cases
Before you start writing rules, you should first create test cases.
- test code you want to detect (unsafe)
- test code you do not want to detect (safe)
The test case is as follows:
dom-based-xss.js
const qs = window.location.search;
const hash = window.location.hash;
// ok
document.write("<p>ok</p>");
// unsafe
document.write(qs);
document.write(hash);
Test cases do not need to cover all patterns from the beginning. You can add test cases as you create rules.
Note that the rule as it currently stands does not detect these cases yet.
$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.
Ran 1 rule on 1 file: 0 findings.
Taint tracking
Injection attacks such as XSS are characterized by source and sink. The place where the attack code is placed is called the source, and the place where the attack code is executed is called the sink.
Semgrep has a feature called taint tracking that analyzes whether an untrusted source reaches a vulnerable sink.
Taint tracking may reduce false negatives and false positives.
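To make the source and sink idea concrete, here is a toy sketch in plain JavaScript. It is not how Semgrep works internally (Semgrep analyzes the code statically, without running it), and every name in it (Tainted, fromSource, concat, reachesSink) is hypothetical and chosen only for illustration.

```javascript
// Toy runtime model of taint tracking. Semgrep does this statically; this
// version only illustrates how data marked at a source is recognized at a sink.
// All names here are hypothetical.

class Tainted {
  constructor(value) {
    this.value = value;
  }
}

// A source wraps untrusted input so that it can be recognized later.
function fromSource(value) {
  return new Tainted(value);
}

// Concatenation keeps the taint if any operand is tainted.
function concat(...parts) {
  const text = parts.map((p) => (p instanceof Tainted ? p.value : p)).join("");
  return parts.some((p) => p instanceof Tainted) ? new Tainted(text) : text;
}

// A sink reports a finding when it receives tainted data.
function reachesSink(data) {
  return data instanceof Tainted;
}

const qs = fromSource("?q=<img src=x onerror=alert(1)>");
console.log(reachesSink("<p>ok</p>"));          // false
console.log(reachesSink(concat("unsafe", qs))); // true
```

A string literal never reaches the sink as tainted, while anything built from the source does. That is the distinction between the "ok" and "unsafe" test cases.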
Taint mode
To use taint tracking, set mode to taint and write pattern-sources and pattern-sinks.
dom-based-xss.yaml
rules:
  - id: dom-based-xss
    mode: taint
    message: dom-xss
    languages:
      - javascript
      - typescript
    severity: ERROR
    pattern-sources:
      - pattern: window.location
    pattern-sinks:
      - pattern-either:
          - pattern: document.write(...)
This rule has the following settings:
- mode is set to taint
- pattern-sources is set to window.location, the source of DOM-XSS
- pattern-sinks is set to document.write(...), the sink of DOM-XSS
With this rule, the taint tracking analyzes the following:
- const qs = window.location.search;
  - window.location is tainted
  - window.location.search is tainted too
  - constant qs is also tainted
- document.write(qs);
  - tainted qs is used in a vulnerable sink
Run this rule on the previous test case and you will see that it is now detectable.
$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.
Findings:
dom-based-xss2.js
dom-based-xss
dom-xss
5┆ document.write(qs);
⋮┆----------------------------------------
6┆ document.write(hash);
Ran 1 rule on 1 file: 2 findings
Enhance Source and Sink
DOM-XSS sources are not limited to window.location, and sinks are not limited to document.write().
For example, PortSwigger's article "Introducing DOM Invader: DOM XSS just got a whole lot easier to find" presents 11 sources and 86 sinks.
Covering all of them would make this article too long, so five sources and five sinks are selected.
sources
- location
- location.href
- location.hash
- location.search
- document.URL
sinks
- document.write()
- document.writeln()
- jQuery.html()
- element.innerHTML
- location.href
These sources and sinks are written in the rules as follows.
dom-based-xss.yaml
rules:
  - id: dom-based-xss
    mode: taint
    message: dom-xss
    languages:
      - javascript
      - typescript
    severity: ERROR
    pattern-sources:
      - pattern-either:
          - pattern: location
          - pattern: window.location
          - pattern: document.location
          - pattern: document.URL
    pattern-sinks:
      - pattern-either:
          - pattern: document.write($PAYLOAD)
          - pattern: document.writeln($PAYLOAD)
          - pattern: $JQ.html($PAYLOAD)
          - pattern: $ELEMENT.innerHTML = $PAYLOAD
          - pattern: location.href = $PAYLOAD
Notes:
- Once location is set as a source, location.href, location.hash, and location.search are automatically treated as sources too.
- location is listed in addition to window.location and document.location because all three forms are available.
Add test cases to match the addition of the sources and sinks.
dom-based-xss.js
const qs = window.location.search;
const hash = document.location.hash;
const query = location.search;
const url = document.URL;
// ok
document.write("<p>ok</p>");
// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);
// unsafe
$("div.test").html(query)
// unsafe
const e1 = document.createElement('p');
e1.innerHTML = url;
// unsafe
location.href = qs
After adding the test cases, let's run the rule; there are 5 unsafe cases, so 5 should be detected.
$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.
Findings:
dom-based-xss.js
dom-based-xss
dom-xss
10┆ document.write("unsafe" + qs);
⋮┆----------------------------------------
11┆ document.writeln("unsafe" + hash);
⋮┆----------------------------------------
14┆ $("div.test").html(query)
⋮┆----------------------------------------
18┆ e1.innerHTML = url;
⋮┆----------------------------------------
21┆ location.href = qs
Ran 1 rule on 1 file: 5 findings.
Properly detected!
Propagator
In taint tracking, tracking may be interrupted when some functions are used.
For example, the following cases will result in DOM-XSS, but will not be detected by Semgrep.
// unsafe
arr = [];
arr.push(url);
document.write(arr.join(' '));
This is because Semgrep does not know that arr is tainted by push(url). This is where propagators come in.
The propagators are set as follows:
pattern-propagators:
  - pattern: $ARR.push($E)
    from: $E
    to: $ARR
With this setting, the previous test case is detected as well. In addition to push, methods such as shift and unshift need to be set as propagators as well.
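The data flow that a propagator describes can be observed in plain JavaScript. The sketch below runs outside the browser, so url is an assumed stand-in value for document.URL; it only shows that whatever is passed to push or unshift ends up in the string produced by join.

```javascript
// `url` stands in for document.URL (assumed value, for illustration only).
const url = "https://example.test/#<img src=x onerror=alert(1)>";

const pushed = [];
pushed.push(url); // data flows from the argument into the array

const unshifted = ["safe"];
unshifted.unshift(url); // same flow, at the front of the array

// join() builds a string from the array contents, so the payload survives.
console.log(pushed.join(" ").includes("<img"));    // true
console.log(unshifted.join(" ").includes("<img")); // true
```

This is the flow that the from: $E and to: $ARR fields express: the argument taints the array it is added to.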
Sanitizer
If a variable is properly sanitized, DOM-XSS will not occur.
For example, if you sanitize using DOMPurify, DOM-XSS will not occur. But the current rules will detect it.
// ok
const sanitized = DOMPurify.sanitize(qs)
document.write(sanitized);
So we set pattern-sanitizers, which stops the tracking on the assumption that the value has been sanitized.
pattern-sanitizers:
  - pattern: DOMPurify.sanitize(...)
This will prevent the previous test case from being detected.
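The effect of a sanitizer on tracking can be pictured with a small runtime model. This is only a sketch under stated assumptions: Tainted, sanitize, and isFinding are hypothetical names, and the escaping function is a much simpler stand-in for what DOMPurify.sanitize actually does.

```javascript
// Toy runtime model (hypothetical names); Semgrep itself works statically.
class Tainted {
  constructor(value) {
    this.value = value;
  }
}

// Stand-in for a real sanitizer: it escapes HTML metacharacters and returns
// a plain string, which drops the taint marker.
function sanitize(data) {
  const raw = data instanceof Tainted ? data.value : String(data);
  return raw
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// The sink reports a finding only for values still marked as tainted.
function isFinding(data) {
  return data instanceof Tainted;
}

const qs = new Tainted("?q=<script>alert(1)</script>");
const sanitized = sanitize(qs);

console.log(isFinding(qs));        // true
console.log(isFinding(sanitized)); // false
console.log(sanitized);            // ?q=&lt;script&gt;alert(1)&lt;/script&gt;
```

Note that the rule trusts whatever pattern-sanitizers matches, so only functions that really neutralize the payload should be listed there.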
Summary of taint tracking
We have now created a rule to detect DOM-Based XSS using taint mode.
For taint mode, we used the following settings
- mode: taint
- pattern-sources
- pattern-sinks
- pattern-propagators
- pattern-sanitizers
The completed YAML file and test code are as follows
dom-based-xss.yaml
rules:
  - id: dom-based-xss
    mode: taint
    message: dom-xss
    languages:
      - javascript
      - typescript
    severity: ERROR
    pattern-sources:
      - pattern-either:
          - pattern: location
          - pattern: window.location
          - pattern: document.location
          - pattern: document.URL
    pattern-sinks:
      - pattern-either:
          - pattern: document.write($PAYLOAD)
          - pattern: document.writeln($PAYLOAD)
          - pattern: $JQ.html($PAYLOAD)
          - pattern: $ELEMENT.innerHTML = $PAYLOAD
          - pattern: location.href = $PAYLOAD
    pattern-propagators:
      - pattern: $ARR.push($E)
        from: $E
        to: $ARR
    pattern-sanitizers:
      - pattern: DOMPurify.sanitize(...)
dom-based-xss.js
const qs = window.location.search;
const hash = document.location.hash;
const query = location.search;
const url = document.URL;
// ok
document.write("<p>ok</p>");
// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);
// unsafe
$("div.test").html(query);
// unsafe
const e1 = document.createElement('p');
e1.innerHTML = url;
// unsafe
location.href = qs;
// unsafe
arr = [];
arr.push(url);
document.write(arr.join(' '));
// ok
const sanitized = DOMPurify.sanitize(qs)
document.write(sanitized);
Execution Result
$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.
Findings:
dom-based-xss.js
dom-based-xss
dom-xss
10┆ document.write("unsafe" + qs);
⋮┆----------------------------------------
11┆ document.writeln("unsafe" + hash);
⋮┆----------------------------------------
14┆ $("div.test").html(query);
⋮┆----------------------------------------
18┆ e1.innerHTML = url;
⋮┆----------------------------------------
21┆ location.href = qs;
⋮┆----------------------------------------
26┆ document.write(arr.join(' '));
Ran 1 rule on 1 file: 6 findings.
Extract javascript embedded in other languages
By default Semgrep does not scan javascript embedded in HTML.
Consider the following test code
dom-based-xss.html
<html>
<body>
<script>
const qs = window.location.search;
const hash = document.location.hash;
// ok
document.write("<p>ok</p>");
// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);
</script>
</body>
</html>
Let's run the rule we just created on this test code.
$ semgrep --config dom-based-xss.yaml dom-based-xss.html
Nothing to scan.
Ran 1 rule on 0 files: 0 findings.
Nothing was detected.
To scan code in one language that is embedded in a file of another language, you must use extract mode.
Extract from HTML
The rules for extracting javascript from HTML are as follows
extract-html-to-javascript.yaml
rules:
- id: extract-html-to-javascript
mode: extract
languages:
- html
pattern: <script>$...SCRIPT</script>
extract: $...SCRIPT
dest-language: javascript
Extract mode requires the following five settings.
- mode: extract
- languages
- pattern
- extract
- dest-language
This rule allows us to detect javascript in HTML.
$ semgrep --config dom-based-xss.yaml --config extract-html-to-javascript.yaml dom-based-xss.html
Scanning 1 file.
Findings:
dom-based-xss.html
dom-based-xss
dom-xss
11┆ document.write("unsafe" + qs);
⋮┆----------------------------------------
12┆ document.writeln("unsafe" + hash);
Ran 2 rules on 1 file: 2 findings.
Note: with the semgrep version used here, the extract rule must be specified after the normal rule. If the extract rule is specified first, nothing is detected.
$ semgrep --config extract-html-to-javascript.yaml --config dom-based-xss.yaml dom-based-xss.html
Scanning 1 file.
Ran 2 rules on 1 file: 0 findings.
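Conceptually, extract mode selects the text between <script> and </script> and hands it to the JavaScript rules. The sketch below approximates that selection with a regular expression. Semgrep parses the HTML properly, so this is only an illustration; extractScripts is a hypothetical helper, and its regex only matches <script> tags without attributes.

```javascript
// Rough approximation of what the extract rule selects. Not Semgrep's
// implementation; extractScripts is a hypothetical name.
function extractScripts(html) {
  const bodies = [];
  const re = /<script>([\s\S]*?)<\/script>/g;
  let m;
  while ((m = re.exec(html)) !== null) {
    bodies.push(m[1]); // the part that $...SCRIPT would bind to
  }
  return bodies;
}

const html = [
  "<html>",
  "<body>",
  "<script>",
  "const qs = window.location.search;",
  'document.write("unsafe" + qs);',
  "</script>",
  "</body>",
  "</html>",
].join("\n");

const scripts = extractScripts(html);
console.log(scripts.length);                       // 1
console.log(scripts[0].trim().split("\n").length); // 2
```

The extracted text is ordinary JavaScript, which is why the dom-based-xss rule can then be applied to it unchanged.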
Extract from ERB
In addition, here is a rule to extract from ERBs used in Ruby on Rails.
extract-erb-to-javascript.yaml
rules:
- id: extract-erb-to-javascript
mode: extract
languages:
- generic
options:
generic_ellipsis_max_span: 500
pattern: ...<script>$...SCRIPT</script>
extract: $...SCRIPT
dest-language: javascript
paths:
include:
- "*.erb"
There are two points to note.
Point 1:
Use generic because ERB is not a supported language, and restrict the rule to files with the extension .erb.
Point 2:
By default, an ellipsis in generic mode spans at most 10 lines, so extracted text is cut off from the 11th line onward. If the body of a <script> tag exceeds 10 lines, it will not be extracted correctly. The option generic_ellipsis_max_span is therefore set to 500 to allow extraction of up to 500 lines. (Please adjust the value since it affects performance.)
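One way to pick a value for generic_ellipsis_max_span is to measure how many lines the longest <script> element in your templates spans. The helper below is a hypothetical sketch for that purpose. It counts the lines covered by each whole <script>...</script> element, which is an assumption about what to measure and not a statement of how Semgrep counts internally.

```javascript
// Hypothetical helper: report the largest number of lines that any
// <script>...</script> element spans in the given template text.
function longestScriptSpan(text) {
  const re = /<script[^>]*>[\s\S]*?<\/script>/g;
  let longest = 0;
  let m;
  while ((m = re.exec(text)) !== null) {
    const lines = m[0].split("\n").length;
    longest = Math.max(longest, lines);
  }
  return longest;
}

// Sample ERB-like template with a 12-line script body (made-up content).
const body = Array.from({ length: 12 }, (_, i) => `console.log(${i});`).join("\n");
const erb = `<div><%= @title %></div>\n<script>\n${body}\n</script>\n`;

console.log(longestScriptSpan(erb)); // 14: two tag lines plus 12 body lines
```

A script like this one already exceeds the default of 10 lines, so it would need a larger generic_ellipsis_max_span to be extracted completely.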
Conclusion
Through the process of creating rules to detect DOM-Based XSS in Semgrep, the following features were introduced
- taint mode
  - source
  - sink
  - propagator
  - sanitizer
- extract mode
  - pattern
  - extract
  - dest-language
- option
  - generic_ellipsis_max_span
Use it as a reference when creating your own rules.
The rules and test code created for this tutorial have been placed on GitHub.
https://github.com/takutoy/my-semgrep-rules/tree/master/javascript/browser/security
Trial and Error Records (in Japanese)
https://zenn.dev/takutoy/scraps/6c0f9c20bf1d86