Go Lexical elements: Rune literals pt 3
Jonathan Hall
Posted on July 9, 2023
Let’s continue our dissection of rune literals. If you missed the earlier parts, check them out: Friday's, when we discussed Unicode, and yesterday's, when we discussed quoting single characters.
Today we’re looking at the various escape sequences supported by the rune
literal syntax.
Rune literals
Several backslash escapes allow arbitrary values to be encoded as ASCII text. There are four ways to represent the integer value as a numeric constant: \x followed by exactly two hexadecimal digits; \u followed by exactly four hexadecimal digits; \U followed by exactly eight hexadecimal digits, and a plain backslash \ followed by exactly three octal digits. In each case the value of the literal is the value represented by the digits in the corresponding base.
Although these representations all result in an integer, they have different valid ranges. Octal escapes must represent a value between 0 and 255 inclusive. Hexadecimal escapes satisfy this condition by construction. The escapes \u and \U represent Unicode code points so within them some values are illegal, in particular those above 0x10FFFF and surrogate halves.
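To make the four numeric forms concrete, here is a minimal sketch that writes the same code point, U+0041 ('A'), once with each form. The function name letterAForms is just an illustrative helper, not anything from the spec.

```go
package main

import "fmt"

// letterAForms returns the code point U+0041 ('A') written with each of
// the four numeric escape forms described in the spec text above.
func letterAForms() [4]rune {
	return [4]rune{
		'\101',       // backslash plus exactly three octal digits
		'\x41',       // \x plus exactly two hexadecimal digits
		'\u0041',     // \u plus exactly four hexadecimal digits
		'\U00000041', // \U plus exactly eight hexadecimal digits
	}
}

func main() {
	// Every entry prints as 65 'A', because the escape form only changes
	// how the integer is spelled, not its value.
	for _, r := range letterAForms() {
		fmt.Printf("%d %q\n", r, r)
	}
}
```

All four literals have the same type and value, so which form to pick is mostly a readability choice.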
Let’s take these one at a time.
- A single octal byte: \OOO
You'll probably never use this, so let's get it out of the way first. But it's allowed. You can specify a byte using octal notation. However, as the spec says, the value is limited to 0-255 inclusive, which means it's possible to write an invalid rune literal this way:
var x = rune('\400') // 400 octal == 256 decimal
This produces the following error:
octal escape value 256 > 255
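- One, two, or four hexadecimal bytes: \xXX, \uXXXX, \UXXXXXXXX
This allows you to specify a single byte with two hexadecimal digits (\xXX), two bytes with four digits (\uXXXX), or the full four bytes of a rune with eight hexadecimal digits (\UXXXXXXXX). Unlike \xXX, the \u and \U forms must also name a valid Unicode code point, so surrogate halves and values above 0x10FFFF are rejected.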
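As a rough illustration of the different ranges, the sketch below returns the largest value I'd expect each numeric form to accept. The helper name largestValues is made up for this example.

```go
package main

import "fmt"

// largestValues returns the largest legal value for each numeric escape
// form: octal and \x top out at 255, \u at 0xFFFF, and \U at 0x10FFFF
// (the highest Unicode code point).
func largestValues() (octal, hexByte, littleU, bigU rune) {
	return '\377', '\xff', '\uffff', '\U0010ffff'
}

func main() {
	octal, hexByte, littleU, bigU := largestValues()
	// prints: 0xff 0xff 0xffff 0x10ffff
	fmt.Printf("%#x %#x %#x %#x\n", octal, hexByte, littleU, bigU)
}
```

Going one past any of these (for example '\400' or '\U00110000') is a compile-time error, which is why those cases can't appear in a runnable program.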
And finally, there are some special escape sequences supported for rune
literals:
After a backslash, certain single-character escapes represent special values:
\a   U+0007 alert or bell
\b   U+0008 backspace
\f   U+000C form feed
\n   U+000A line feed or newline
\r   U+000D carriage return
\t   U+0009 horizontal tab
\v   U+000B vertical tab
\\   U+005C backslash
\'   U+0027 single quote  (valid escape only within rune literals)
\"   U+0022 double quote  (valid escape only within string literals)
(That last one arguably doesn't belong here, as it's not valid in a rune
literal, but it's nice to know that it's explicitly excluded here.)
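If you want to see the table in action, here's a small sketch that prints the code point behind each single-character escape that is legal in a rune literal. The escape type and the singleCharEscapes helper are names I've invented for the example.

```go
package main

import "fmt"

// escape pairs the source spelling of an escape with the rune it denotes.
type escape struct {
	text string
	r    rune
}

// singleCharEscapes lists the single-character escapes that are valid in
// a rune literal, in the same order as the spec's table. \" is left out
// because it is only valid inside string literals.
func singleCharEscapes() []escape {
	return []escape{
		{`\a`, '\a'},
		{`\b`, '\b'},
		{`\f`, '\f'},
		{`\n`, '\n'},
		{`\r`, '\r'},
		{`\t`, '\t'},
		{`\v`, '\v'},
		{`\\`, '\\'},
		{`\'`, '\''},
	}
}

func main() {
	for _, e := range singleCharEscapes() {
		// %U formats a rune in U+XXXX notation.
		fmt.Printf("%s  %U\n", e.text, e.r)
	}
}
```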
Let's round out today's email with the rest of the rune
literal section, which is just the boring EBNF syntax, and some examples, which we don't need to discuss in any detail.
An unrecognized character following a backslash in a rune literal is illegal.
rune_lit         = "'" ( unicode_value | byte_value ) "'" .
unicode_value    = unicode_char | little_u_value | big_u_value | escaped_char .
byte_value       = octal_byte_value | hex_byte_value .
octal_byte_value = `\` octal_digit octal_digit octal_digit .
hex_byte_value   = `\` "x" hex_digit hex_digit .
little_u_value   = `\` "u" hex_digit hex_digit hex_digit hex_digit .
big_u_value      = `\` "U" hex_digit hex_digit hex_digit hex_digit
                           hex_digit hex_digit hex_digit hex_digit .
escaped_char     = `\` ( "a" | "b" | "f" | "n" | "r" | "t" | "v" | `\` | "'" | `"` ) .
'a'
'ä'
'本'
'\t'
'\000'
'\007'
'\377'
'\x07'
'\xff'
'\u12e4'
'\U00101234'
'\''         // rune literal containing single quote character
'aa'         // illegal: too many characters
'\k'         // illegal: k is not recognized after a backslash
'\xa'        // illegal: too few hexadecimal digits
'\0'         // illegal: too few octal digits
'\400'       // illegal: octal value over 255
'\uDFFF'     // illegal: surrogate half
'\U00110000' // illegal: invalid Unicode code point
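The illegal examples can't appear in a program that compiles, but you can approximate the check at runtime. The sketch below feeds the literal's source text to strconv.Unquote, which parses Go quoted literals. Treat it as an approximation under that assumption: the compiler remains the authority, and legalRuneLiteral is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"strconv"
)

// legalRuneLiteral reports whether src, including its surrounding single
// quotes, is accepted by strconv.Unquote as a single-quoted Go literal.
// This is a runtime stand-in for the compiler's own check.
func legalRuneLiteral(src string) bool {
	if len(src) == 0 || src[0] != '\'' {
		return false // only single-quoted literals are of interest here
	}
	_, err := strconv.Unquote(src)
	return err == nil
}

func main() {
	// Raw string literals keep the backslashes exactly as written.
	examples := []string{
		`'a'`, `'本'`, `'\t'`, `'\377'`, `'\xff'`, `'\u12e4'`, `'\U00101234'`, `'\''`,
		`'aa'`, `'\k'`, `'\xa'`, `'\0'`, `'\400'`, `'\uDFFF'`, `'\U00110000'`,
	}
	for _, src := range examples {
		fmt.Printf("%-14s legal=%t\n", src, legalRuneLiteral(src))
	}
}
```

The first eight examples should report as legal and the last seven as illegal, matching the comments in the spec's list.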