Phred quality score
Roberto Preste
Posted on July 7, 2019
Next Generation Sequencing techniques have brought new insights into -omics data analysis, mostly thanks to their reliability in detecting biological variants. This reliability is usually measured using a value called the Phred quality score (or Q score).
The Phred score of a base is an integer value that represents the estimated probability of an error in base calling. Mathematically, a Q score is logarithmically related to the base-calling error probability P, and can be calculated using the following formula:
Q = -10 log10 P
In practice, a quality score of 20 means that there is a 1 in 100 chance that the base is incorrect; a quality score of 40 means that the chance of the base being called incorrectly is 1 in 10000.
The Phred score is inversely related to the error probability, and therefore directly related to the base call accuracy: a higher Q score means a more reliable base call. Here is a useful table which shows this simple relationship:
Phred Quality Score | Probability of incorrect base call | Base call accuracy |
---|---|---|
10 | 1 in 10 | 90% |
20 | 1 in 100 | 99% |
30 | 1 in 1000 | 99.9% |
40 | 1 in 10000 | 99.99% |
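The rows of the table follow directly from the formula, and can be reproduced with a few lines of Python. This is only a minimal sketch: the function names `phred_to_error_prob` and `error_prob_to_phred` are made up for illustration and do not come from any library.

```python
import math


def phred_to_error_prob(q):
    """Return the error probability P for a Phred score Q (P = 10^(-Q/10))."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Return the Phred score Q for an error probability P (Q = -10 log10 P)."""
    return -10 * math.log10(p)


# Print error probability and accuracy for the scores listed in the table.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(f"Q={q}: P={p:g}, accuracy={(1 - p) * 100:g}%")
```

The second function goes the other way round, so `error_prob_to_phred(0.001)` gives a value of 30 (up to floating point rounding).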
In FASTQ files, Phred quality scores are usually represented using ASCII characters, such that the quality score of each base can be specified using a single character: the character whose ASCII code is equal to the Q score plus a fixed offset. While older Illumina data (before pipeline version 1.8) used an offset of 64 (ASCII_BASE 64), nowadays the offset of 33 (ASCII_BASE 33) has become the de facto standard for NGS data:
Q Score | ASCII char | Q Score | ASCII char | Q Score | ASCII char | Q Score | ASCII char |
---|---|---|---|---|---|---|---|
0 | ! | 11 | , | 22 | 7 | 32 | A |
1 | " | 12 | - | 23 | 8 | 33 | B |
2 | # | 13 | . | 24 | 9 | 34 | C |
3 | $ | 14 | / | 25 | : | 35 | D |
4 | % | 15 | 0 | 26 | ; | 36 | E |
5 | & | 16 | 1 | 27 | < | 37 | F |
6 | ' | 17 | 2 | 28 | = | 38 | G |
7 | ( | 18 | 3 | 29 | > | 39 | H |
8 | ) | 19 | 4 | 30 | ? | 40 | I |
9 | * | 20 | 5 | 31 | @ | 41 | J |
10 | + | 21 | 6 | | | | |
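The table above does not need to be memorised, since each character is simply the one whose ASCII code is the Q score plus the offset. A small sketch of the encoding direction could look like this (the helper name `phred_to_ascii` and the `phred33_table` dictionary are illustrative assumptions, not part of any existing package):

```python
def phred_to_ascii(q, offset=33):
    """Encode one Phred score as a single ASCII character."""
    return chr(q + offset)


# Rebuild the Q score -> character mapping for scores 0 to 41.
phred33_table = {q: phred_to_ascii(q) for q in range(42)}

print(phred33_table[0], phred33_table[20], phred33_table[40])

# The same score encoded with the older offset of 64.
print(phred_to_ascii(40, offset=64))
```

Passing `offset=64` shows why the two encodings must not be mixed up: the same score maps to a different character depending on the offset.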
Even though there are plenty of Python packages (such as Biopython) and stand-alone tools for dealing with Phred quality scores, a simple command to convert an ASCII character to its corresponding quality score is the following (from the terminal):
python3 -c 'print(ord("<ASCII>")-33)'
Or, when working in a Python 3 session:
print(ord("<ASCII>")-33)
In both cases, just replace <ASCII> with the actual ASCII character and that will do the trick.
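The same idea extends from a single character to the whole quality line of a read. The following is a rough sketch, assuming offset 33 encoding by default; the function name `decode_quality` and the example quality line are hypothetical and not taken from a real FASTQ file.

```python
def decode_quality(quality_line, offset=33):
    """Convert a FASTQ quality line into a list of integer Phred scores."""
    scores = [ord(char) - offset for char in quality_line]
    # A negative score means a character lies below the chosen offset,
    # which usually points to the wrong encoding being assumed.
    if scores and min(scores) < 0:
        raise ValueError("character below the offset; wrong encoding?")
    return scores


quality_line = "II5?+!"  # hypothetical quality line, for illustration only
scores = decode_quality(quality_line)
print(scores)
```

For real FASTQ files a dedicated parser such as Biopython is likely the safer choice, since this sketch does not read or validate the records themselves.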