Semgrep Writing Rule Tutorial (DOM-Based XSS)

Semgrep provides a large number of rules, but sometimes you may want to customize a rule or create a new one.

For example, when a vulnerability is found in a product developed by the organization, we want to:

- check if similar vulnerabilities exist in other products
- detect similar vulnerabilities in the future

In such cases, rules for finding vulnerabilities will help maintain the security of the product.

This article will guide you through the process of creating a rule to detect DOM-Based XSS and help you understand the features and options required to create a rule.

Prerequisite

This tutorial uses semgrep 1.2.1.

$ semgrep --version
1.2.1

Search for similar rules

Semgrep Registry provides rules created by r2c, developer of Semgrep, and the community.

The rule you want to write, or one similar to it, may already exist, so search the registry first.

In this tutorial, we want to create a rule to detect DOM-Based XSS, so we search the JavaScript rules for "dom xss".

The search found the rule javascript.browser.security.dom-based-xss.dom-based-xss, so we will create a rule based on this.

part of javascript.browser.security.dom-based-xss.dom-based-xss:

pattern-either:
  - pattern: document.write(<... document.location.$W ...>)
  - pattern: document.write(<... location.$W ...>)

This rule seems to only detect document.write().

Create test cases

Before you start writing rules, you should first create test cases:

- test code you want to detect (unsafe)
- test code you do not want to detect (safe)

The test case is as follows:

dom-based-xss.js

const qs = window.location.search;
const hash = window.location.hash;

// ok
document.write("<p>ok</p>");

// unsafe
document.write(qs);
document.write(hash);

Test cases do not need to cover all patterns from the beginning. You can add test cases as you create rules.
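Semgrep's test mode (the --test option) reads comment annotations in the test file, so the expected result can be recorded next to each case. The following is a minimal sketch, assuming the rule id will be dom-based-xss. The wrapper function name domXssTestCases is hypothetical and is there only so the file can be loaded outside a browser without evaluating window or document.

```javascript
// Sketch of an annotated Semgrep test file (assumes rule id "dom-based-xss").
// The wrapper function is hypothetical; it is never called here, so the
// browser-only globals (window, document) are not evaluated.
function domXssTestCases() {
  const qs = window.location.search;
  const hash = window.location.hash;

  // ok: dom-based-xss
  document.write("<p>ok</p>");

  // ruleid: dom-based-xss
  document.write(qs);

  // ruleid: dom-based-xss
  document.write(hash);
}
```

A ruleid comment marks the following line as one the rule must report, and an ok comment marks a line it must not report.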

Note that running the current rule against the test file dom-based-xss.js does not yet detect the unsafe cases.

$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.

Ran 1 rule on 1 file: 0 findings.

Taint tracking

Injection attacks such as XSS are characterized by a source and a sink. The place where attacker-controlled input enters the program is called the source, and the place where that input is executed is called the sink.

Semgrep has a feature called taint tracking that analyzes whether an untrusted source reaches a vulnerable sink.

Compared with plain pattern matching, taint tracking may reduce both false negatives and false positives.

Taint mode

To use taint tracking, set mode to taint and write pattern-sources and pattern-sinks.

dom-based-xss.yaml

rules:
- id: dom-based-xss
  mode: taint
  message: dom-xss
  languages:
  - javascript
  - typescript
  severity: ERROR
  pattern-sources:
  - pattern: window.location
  pattern-sinks:
  - pattern-either:
    - pattern: document.write(...)

This rule has the following settings:

- mode is set to taint
- pattern-sources is set to window.location, a source of DOM-XSS
- pattern-sinks is set to document.write(...), a sink of DOM-XSS

With this rule, taint tracking analyzes the code as follows:

const qs = window.location.search;
- window.location is tainted
- window.location.search is tainted too
- constant qs is also tainted
document.write(qs);
- tainted qs is used in vulnerable sink
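The chain of reasoning above can be mimicked with a toy model. The sketch below is not how Semgrep is implemented. It assumes the code has already been reduced to simple statement records, and the names SOURCES, SINKS, isTainted, and analyze are made up for illustration.

```javascript
// Toy model of taint propagation over a list of simplified statements.
// This only mirrors the steps described above; it is not Semgrep's algorithm.
const SOURCES = ["window.location"];
const SINKS = ["document.write"];

// An expression is tainted if it is a source, a property of a source,
// or a variable that was previously marked as tainted.
function isTainted(expr, tainted) {
  const fromSource = SOURCES.some((s) => expr === s || expr.startsWith(s + "."));
  return fromSource || tainted.has(expr);
}

function analyze(statements) {
  const tainted = new Set();
  const findings = [];
  for (const st of statements) {
    if (st.kind === "assign" && isTainted(st.value, tainted)) {
      // Taint flows from the right-hand side to the assigned variable.
      tainted.add(st.target);
    } else if (st.kind === "call" && SINKS.includes(st.callee) && isTainted(st.arg, tainted)) {
      // A tainted value reached a sink.
      findings.push(`${st.callee}(${st.arg})`);
    }
  }
  return findings;
}

const findings = analyze([
  { kind: "assign", target: "qs", value: "window.location.search" },
  { kind: "call", callee: "document.write", arg: '"<p>ok</p>"' },
  { kind: "call", callee: "document.write", arg: "qs" },
]);
console.log(findings); // [ 'document.write(qs)' ]
```

Only the call that receives qs is reported, because the string literal never passes through a source.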

Run this rule on the previous test case and you will see that it is now detectable.

$ semgrep --config dom-based-xss.yaml dom-based-xss.js 
Scanning 1 file.

Findings:

  dom-based-xss2.js 
     dom-based-xss  
        dom-xss     

          5┆ document.write(qs);
          ⋮┆----------------------------------------
          6┆ document.write(hash);


Ran 1 rule on 1 file: 2 findings

Enhance Source and Sink

DOM-XSS sources are not limited to window.location, and sinks are not limited to document.write().

For example, the PortSwigger article "Introducing DOM Invader: DOM XSS just got a whole lot easier to find" presented 11 sources and 86 sinks.

Covering all of them would make this article too long, so we select five sources and five sinks.

sources

- location
- location.href
- location.hash
- location.search
- document.URL

sinks

- document.write()
- document.writeln()
- jQuery.html()
- element.innerHTML
- location.href

These sources and sinks are written in the rules as follows.

dom-based-xss.yaml

rules:
- id: dom-based-xss
  mode: taint
  message: dom-xss
  languages:
  - javascript
  - typescript
  severity: ERROR
  pattern-sources:
  - pattern-either:
    - pattern: location
    - pattern: window.location
    - pattern: document.location
    - pattern: document.URL
  pattern-sinks:
  - pattern-either:
    - pattern: document.write($PAYLOAD)
    - pattern: document.writeln($PAYLOAD)
    - pattern: $JQ.html($PAYLOAD)
    - pattern: $ELEMENT.innerHTML = $PAYLOAD
    - pattern: location.href = $PAYLOAD

Notes:

- Once location is set as a source, location.href, location.hash, and location.search are automatically treated as sources too.
- window.location and document.location are listed as well, because location can also be accessed through them.

Add test cases to match the addition of the sources and sinks.

dom-based-xss.js

const qs = window.location.search;
const hash = document.location.hash;
const query = location.search;
const url = document.URL;

// ok
document.write("<p>ok</p>");

// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);

// unsafe
$("div.test").html(query)

// unsafe
const e1 = document.createElement('p');
e1.innerHTML = url;

// unsafe
location.href = qs

After adding the test cases, let's run the rule; there are 5 unsafe cases, so 5 should be detected.

$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.

Findings:

  dom-based-xss.js
     dom-based-xss
        dom-xss

         10┆ document.write("unsafe" + qs);
          ⋮┆----------------------------------------
         11┆ document.writeln("unsafe" + hash);
          ⋮┆----------------------------------------
         14┆ $("div.test").html(query)
          ⋮┆----------------------------------------
         18┆ e1.innerHTML = url;
          ⋮┆----------------------------------------
         21┆ location.href = qs


Ran 1 rule on 1 file: 5 findings.

Properly detected!

Propagator

In taint tracking, tracking may be interrupted when some functions are used.

For example, the following cases will result in DOM-XSS, but will not be detected by Semgrep.

// unsafe
arr = [];
arr.push(url);
document.write(arr.join(' '));

This is because Semgrep does not know that arr is tainted by push(url). This is where propagators come in.

The propagators are set as follows

pattern-propagators:
- pattern: $ARR.push($E)
  from: $E
  to: $ARR

This also detects the previous test case. Besides push, shift and unshift also need to be set as propagators.
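As a rough mental model, a propagator copies taint from one part of a matched call to another. The sketch below is a toy illustration, not Semgrep's implementation. PROPAGATORS and applyPropagators are hypothetical names, and calls are assumed to be pre-parsed into receiver, method, and argument.

```javascript
// Toy model: methods listed here pass taint from their argument to their receiver,
// mirroring `from: $E` and `to: $ARR` in the propagator above.
const PROPAGATORS = new Set(["push", "unshift"]);

function applyPropagators(calls, initiallyTainted) {
  const tainted = new Set(initiallyTainted);
  for (const { receiver, method, arg } of calls) {
    // If the argument is tainted, the receiver becomes tainted as well.
    if (PROPAGATORS.has(method) && tainted.has(arg)) {
      tainted.add(receiver);
    }
  }
  return tainted;
}

const tainted = applyPropagators(
  [
    { receiver: "arr", method: "push", arg: "url" },
    { receiver: "safe", method: "push", arg: "label" },
  ],
  ["url"]
);
console.log([...tainted]); // [ 'url', 'arr' ]
```

Without the propagator step, arr would stay untainted and a later document.write(arr.join(' ')) would go unreported in this model.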

Sanitizer

If a variable is properly sanitized, DOM-XSS will not occur.

For example, if you sanitize using DOMPurify, DOM-XSS will not occur. But the current rules will detect it.

// ok
const sanitized = DOMPurify.sanitize(qs)
document.write(sanitized);

Setting sanitizers stops taint tracking at that point, on the assumption that the value has been sanitized.

pattern-sanitizers:
- pattern: DOMPurify.sanitize(...)

This will prevent the previous test case from being detected.

Summary of taint tracking

We have now created a rule to detect DOM-Based XSS using taint mode.

For taint mode, we used the following settings:

- mode: taint
- pattern-sources
- pattern-sinks
- pattern-propagators
- pattern-sanitizers

The completed YAML file and test code are as follows

dom-based-xss.yaml

rules:
- id: dom-based-xss
  mode: taint
  message: dom-xss
  languages:
  - javascript
  - typescript
  severity: ERROR
  pattern-sources:
  - pattern-either:
    - pattern: location
    - pattern: window.location
    - pattern: document.location
    - pattern: document.URL
  pattern-sinks:
  - pattern-either:
    - pattern: document.write($PAYLOAD)
    - pattern: document.writeln($PAYLOAD)
    - pattern: $JQ.html($PAYLOAD)
    - pattern: $ELEMENT.innerHTML = $PAYLOAD
    - pattern: location.href = $PAYLOAD
  pattern-propagators:
  - pattern: $ARR.push($E)
    from: $E
    to: $ARR
  pattern-sanitizers:
  - pattern: DOMPurify.sanitize(...)

dom-based-xss.js

const qs = window.location.search;
const hash = document.location.hash;
const query = location.search;
const url = document.URL;

// ok
document.write("<p>ok</p>");

// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);

// unsafe
$("div.test").html(query);

// unsafe
const e1 = document.createElement('p');
e1.innerHTML = url;

// unsafe
location.href = qs;

// unsafe
arr = [];
arr.push(url);
document.write(arr.join(' '));

// ok
const sanitized = DOMPurify.sanitize(qs)
document.write(sanitized);

Execution Result

$ semgrep --config dom-based-xss.yaml dom-based-xss.js
Scanning 1 file.

Findings:

  dom-based-xss.js
     dom-based-xss
        dom-xss

         10┆ document.write("unsafe" + qs);
          ⋮┆----------------------------------------
         11┆ document.writeln("unsafe" + hash);
          ⋮┆----------------------------------------
         14┆ $("div.test").html(query);
          ⋮┆----------------------------------------
         18┆ e1.innerHTML = url;
          ⋮┆----------------------------------------
         21┆ location.href = qs;
          ⋮┆----------------------------------------
         26┆ document.write(arr.join(' '));


Ran 1 rule on 1 file: 6 findings.

Extract javascript embedded in other languages

By default Semgrep does not scan javascript embedded in HTML.

Consider the following test code

dom-based-xss.html

<html>
    <body>
        <script>
const qs = window.location.search;
const hash = document.location.hash;

// ok
document.write("<p>ok</p>");

// unsafe
document.write("unsafe" + qs);
document.writeln("unsafe" + hash);
        </script>
    </body>
</html>

Let's run the rule we just created on this test code.

$ semgrep --config dom-based-xss.yaml dom-based-xss.html
Nothing to scan.

Ran 1 rule on 0 files: 0 findings.

The file was not scanned, so nothing was detected.

To scan one language embedded in another, as here, you must use extract mode.

Extract from HTML

The rules for extracting javascript from HTML are as follows

extract-html-to-javascript.yaml

rules:
- id: extract-html-to-javascript
  mode: extract
  languages:
    - html
  pattern: <script>$...SCRIPT</script>
  extract: $...SCRIPT
  dest-language: javascript

Extract mode requires the following five settings:

- mode: extract
- languages
- pattern
- extract
- dest-language

This rule allows us to detect javascript in HTML.

$ semgrep --config dom-based-xss.yaml --config extract-html-to-javascript.yaml dom-based-xss.html
Scanning 1 file.

Findings:

  dom-based-xss.html
     dom-based-xss
        dom-xss

         11┆ document.write("unsafe" + qs);
          ⋮┆----------------------------------------
         12┆ document.writeln("unsafe" + hash);

Ran 2 rules on 1 file: 2 findings.

Note that the extract rule must be specified after the normal rule; if it is specified before, nothing is detected.

$ semgrep --config extract-html-to-javascript.yaml --config dom-based-xss.yaml dom-based-xss.html
Scanning 1 file.

Ran 2 rules on 1 file: 0 findings.
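As a rough mental model of what the HTML extract rule does before the JavaScript rule runs, the sketch below pulls out the body of each script tag. Semgrep matches the rule's pattern against parsed HTML rather than using a regular expression, so this is only a conceptual illustration, and extractScripts is a hypothetical helper name.

```javascript
// Conceptual sketch of extract mode: collect the body of every <script> tag
// so that it can be analyzed as JavaScript. Only bare <script> tags without
// attributes are handled here.
function extractScripts(html) {
  const re = /<script>([\s\S]*?)<\/script>/g;
  return [...html.matchAll(re)].map((m) => m[1]);
}

const html = [
  "<html>",
  "  <body>",
  "    <script>",
  "document.write(window.location.hash);",
  "    </script>",
  "  </body>",
  "</html>",
].join("\n");

const scripts = extractScripts(html);
console.log(scripts.length); // 1
console.log(scripts[0].trim()); // document.write(window.location.hash);
```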

Extract from ERB

In addition, here is a rule to extract from ERBs used in Ruby on Rails.

extract-erb-to-javascript.yaml

rules:
- id: extract-erb-to-javascript
  mode: extract
  languages:
    - generic
  options:
    generic_ellipsis_max_span: 500
  pattern: ...<script>$...SCRIPT</script>
  extract: $...SCRIPT
  dest-language: javascript
  paths:
    include:
      - "*.erb"

There are two points to note.

Point 1:
Use generic because ERB is not a supported language, and restrict the targets to files with the .erb extension.

Point 2:
By default, generic mode lets an ellipsis span at most 10 lines, so the body of a <script> tag that exceeds 10 lines is not extracted correctly. The option generic_ellipsis_max_span is therefore set to allow extraction of up to 500 lines. (Adjust the value as needed, since it affects performance.)

Conclusion

Through the process of creating rules to detect DOM-Based XSS in Semgrep, the following features were introduced

- taint mode
  - source
  - sink
  - propagator
  - sanitizer
- extract mode
  - pattern
  - extract
  - dest-language
- option
  - generic_ellipsis_max_span

Use it as a reference when creating your own rules.

The rules and test code created for this tutorial have been placed on GitHub.
https://github.com/takutoy/my-semgrep-rules/tree/master/javascript/browser/security

Trial and Error Records (in Japanese)
https://zenn.dev/takutoy/scraps/6c0f9c20bf1d86

takutoy
