Phred quality score

robertopreste

Roberto Preste

Posted on July 7, 2019

Next Generation Sequencing techniques have brought new insights into -omics data analysis, mostly thanks to their reliability in detecting biological variants. This reliability is usually measured using a value called Phred quality score (or Q score).

The Phred score of a base is an integer value that represents the estimated probability of an error in base calling. Mathematically, a Q score is logarithmically related to the base-calling error probability P, and can be calculated using the following formula:

Q = -10 log10 P

In practice, a quality score of 20 means that there is a 1 in 100 chance that the base is incorrect; a quality score of 40 means that the chance of an incorrect base call is 1 in 10000.
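The formula can be turned into a couple of small helpers. This is only a minimal sketch: the function names `phred_score` and `error_probability` are made up for illustration, and the inverse relation P = 10^(-Q/10) simply follows from rearranging the formula above.

```python
import math


def phred_score(error_prob: float) -> float:
    """Return Q = -10 * log10(P) for an error probability 0 < P <= 1."""
    if not 0 < error_prob <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(error_prob)


def error_probability(q: float) -> float:
    """Return P = 10^(-Q/10), the inverse of the formula above."""
    return 10 ** (-q / 10)


# A 1 in 100 error chance corresponds to Q = 20, and Q = 40 to 1 in 10000.
print(round(phred_score(0.01)))
print(error_probability(40))
```

Real Q scores are reported as integers, so in practice the result of `phred_score` would typically be rounded.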

The Phred score is inversely related to the error probability, and therefore directly related to the base call accuracy: a higher Q score means a more reliable base call. Here is a useful table which shows this simple relationship:

Phred quality score    Incorrect base call probability    Base call accuracy
10                     1 in 10                            90%
20                     1 in 100                           99%
30                     1 in 1000                          99.9%
40                     1 in 10000                         99.99%
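The rows of this table can be reproduced with a few lines of Python. The sketch below assumes that accuracy is simply one minus the error probability; `base_call_accuracy` is a hypothetical helper name, not part of any library.

```python
def base_call_accuracy(q: float) -> float:
    """Return the accuracy 1 - P, where P = 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


# Print one line per table row: Q score, error odds, accuracy as a percentage.
for q in (10, 20, 30, 40):
    one_in = round(10 ** (q / 10))
    print(f"Q{q}: 1 in {one_in}, accuracy {base_call_accuracy(q):.2%}")
```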

In FASTQ files, Phred quality scores are usually encoded as ASCII characters, so that the quality score of each base can be specified using a single character. The character is obtained by adding a fixed offset (the ASCII base) to the Q score. While older Illumina data used an ASCII base of 64, nowadays the ASCII base 33 table is the standard for NGS data:

Q score  ASCII char    Q score  ASCII char    Q score  ASCII char    Q score  ASCII char
0        !             11       ,             22       7             32       A
1        "             12       -             23       8             33       B
2        #             13       .             24       9             34       C
3        $             14       /             25       :             35       D
4        %             15       0             26       ;             36       E
5        &             16       1             27       <             37       F
6        '             17       2             28       =             38       G
7        (             18       3             29       >             39       H
8        )             19       4             30       ?             40       I
9        *             20       5             31       @             41       J
10       +             21       6
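The table above amounts to adding 33 to the Q score and taking the character with that ASCII code. As a minimal sketch, a whole quality string can be decoded (and encoded back) like this; `decode_quality`, `encode_quality` and the `offset` parameter are illustrative names, with `offset=64` covering the older Illumina encoding.

```python
def decode_quality(quality: str, offset: int = 33) -> list[int]:
    """Convert a FASTQ quality string into a list of Phred scores."""
    scores = [ord(char) - offset for char in quality]
    if any(score < 0 for score in scores):
        raise ValueError("negative score: the offset is probably wrong")
    return scores


def encode_quality(scores: list[int], offset: int = 33) -> str:
    """Convert a list of Phred scores back into a quality string."""
    return "".join(chr(score + offset) for score in scores)


print(decode_quality("II5+!"))              # [40, 40, 20, 10, 0]
print(encode_quality([40, 40, 20, 10, 0]))  # II5+!
print(decode_quality("h", offset=64))       # [40]
```

The negative-score check is only a rough sanity test: a string in the base 33 encoding that is decoded with an offset of 64 will usually, but not always, produce negative values.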

Even though there are plenty of Python packages (Biopython included) and stand-alone tools for dealing with Phred quality scores, a simple command to convert an ASCII character to its corresponding quality score in the base 33 encoding is the following (from the terminal):

python3 -c 'print(ord("<ASCII>")-33)'

Or, when working in a Python 3 session:

print(ord("<ASCII>")-33)

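In both cases, just replace <ASCII> with the actual ASCII character and that will do the trick. The only catch is the double quote character itself, which has to be escaped as \" inside the double-quoted string.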
