Python | Hamming Problem

banji220

Banji

Posted on July 28, 2020

██████╗░███╗░░██╗░█████╗░
██╔══██╗████╗░██║██╔══██╗
██║░░██║██╔██╗██║███████║
██║░░██║██║╚████║██╔══██║
██████╔╝██║░╚███║██║░░██║
╚═════╝░╚═╝░░╚══╝╚═╝░░╚═╝

Hey, everyone.
In this post I'm going to walk you through the (simple) Hamming problem and my solution to it.
If you're not a beginner, you may want to skip this tutorial, because it will probably be boring and not much use to you!
But if you're a newbie, bear with me, because I found it a really cool problem.

problem:
Calculate the Hamming Distance between two DNA strands.

Your body is made up of cells that contain DNA. Those cells regularly wear out and need replacing, which they achieve by dividing into daughter cells. In fact, the average human body experiences about 10 quadrillion cell divisions in a lifetime!

When cells divide, their DNA replicates too. Sometimes during this process mistakes happen and single pieces of DNA get encoded with the incorrect information. If we compare two strands of DNA and count the differences between them we can see how many mistakes occurred. This is known as the "Hamming Distance".

We read DNA using the letters C, A, G and T. Two strands might look like this:

    GAGCCTACTAACGGGAT
    CATCGTAATGACGGCCT
    ^ ^ ^  ^ ^    ^^

They have 7 differences, and therefore the Hamming Distance is 7.
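
If you'd like to check that count yourself, here is a minimal sketch that lists the positions where the two example strands differ. The variable names are just my own picks for illustration.

```python
strand_a = "GAGCCTACTAACGGGAT"
strand_b = "CATCGTAATGACGGCCT"

# Walk both strands side by side and keep the index of every mismatch.
positions = [
    index
    for index, (x, y) in enumerate(zip(strand_a, strand_b))
    if x != y
]

print(positions)       # [0, 2, 4, 7, 9, 14, 15]
print(len(positions))  # 7
```

The indexes line up with the `^` markers under the strands above.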

The Hamming Distance is useful for lots of things in science, not just biology, so it's a nice phrase to be familiar with ❤

So, first of all, I defined a function and used an if statement to check whether the two strands have the same length:

def distance(strand_a, strand_b):

    if len(strand_a) == len(strand_b):
        first_strand = [letter for letter in strand_a]
        second_strand = [letter for letter in strand_b]
    else:
        raise ValueError("The length of Sequences are not equal")

But this piece of code can be written more simply. You may ask how?
Like this:

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")

As you can see, instead of the 6-7 lines of code in the first solution, the second function needs just 3 lines! The two lists aren't needed either, because Python can loop over a string letter by letter directly.
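
Here is a small sketch of that length check in action. I pulled the check into a helper called `check_same_length`, which is a hypothetical name used only for this demo and not part of the solution; the sample strands are made up too.

```python
def check_same_length(strand_a, strand_b):
    # Hypothetical helper: same guard as in distance(), nothing else.
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")


message = None
try:
    check_same_length("GAT", "GATTACA")  # 3 letters vs 7 letters
except ValueError as error:
    message = str(error)

print(message)  # Length of two sequences must be the same

# Equal lengths pass silently.
check_same_length("GAT", "CAT")
```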

So let's see what we can do for the next part of the code...
We need to pair up the letters that sit at the same position in the two strands, and the zip() function does exactly that!
Like this:


diff = zip(strand_a, strand_b)

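
To see what zip() actually hands back, here is a tiny sketch with short made-up strands. Wrapping the result in list() is only there so we can print it.

```python
pairs = list(zip("GAG", "CAT"))
print(pairs)  # [('G', 'C'), ('A', 'A'), ('G', 'T')]

# zip() stops at the end of the shorter input, so without the
# length check a longer strand would be cut off silently.
short_pairs = list(zip("GA", "GATT"))
print(short_pairs)  # [('G', 'G'), ('A', 'A')]
```

That second call is a good reason to keep the length check at the top of the function.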

After that, I created an empty list for two purposes:

  • to collect the differences in a list
  • to use the len() function to count those differences

count = []


With a for loop, we go through the tuples to see whether the paired letters are the same or not. Every mismatch gets appended to the empty list count = [], and at the end len(count) gives us the number of differences, which is what we return!

Like this:

for x, y in diff:
    if x != y:
        count.append(x)
return len(count)


So the complete solution looks like this:

def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")

    count = []
    zip_a_b = zip(strand_a, strand_b)

    for x, y in zip_a_b:
        if x != y:
            count.append(x)
    return len(count)
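
To try it out, here is the same function again (repeated only so the snippet runs on its own) with a few sample calls. Apart from the pair from the problem statement, the strands are made-up inputs of mine.

```python
def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")

    count = []
    for x, y in zip(strand_a, strand_b):
        if x != y:
            count.append(x)
    return len(count)


print(distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
print(distance("GGACTGA", "GGACTGA"))  # 0, identical strands
print(distance("", ""))  # 0, two empty strands have equal length

try:
    distance("AATG", "AAA")
except ValueError as error:
    print(error)  # Length of two sequences must be the same
```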

EDIT:
My friend Jeremy Grifski suggested a more efficient way with less code.

It feels weird to create and throw away a list just for its length, Jeremy Grifski said!
He then left a comment with his clever solution to improve our code, so here it is.
Instead of:

    count = []
    zip_a_b = zip(strand_a, strand_b)

    for x, y in zip_a_b:
        if x != y:
            count.append(x)
    return len(count)

we can use a generator expression:

count = sum(1 for x, y in zip(strand_a, strand_b) if x != y)
return count
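
Put back into the function, that line gives something like the sketch below. It's just my way of assembling the pieces, with the temporary count variable dropped and the result returned directly.

```python
def distance(strand_a, strand_b):
    if len(strand_a) != len(strand_b):
        raise ValueError("Length of two sequences must be the same")
    # Every mismatched pair contributes a 1, and sum() adds them up.
    # No list is built along the way.
    return sum(1 for x, y in zip(strand_a, strand_b) if x != y)


print(distance("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT"))  # 7
```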

If you want to know more about list comprehensions or generator expressions, I found this Link useful for understanding these two concepts.
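
As a rough illustration of the difference, here is a small sketch with made-up strands. The square brackets build the whole list in memory first, while the round brackets produce the values one at a time as sum() asks for them.

```python
pairs = list(zip("GAGC", "CATC"))

as_list = [1 for x, y in pairs if x != y]       # list comprehension
as_generator = (1 for x, y in pairs if x != y)  # generator expression

print(as_list)       # [1, 1]
list_total = sum(as_list)
generator_total = sum(as_generator)
print(list_total, generator_total)  # 2 2
```

Both give the same total. The generator version just skips the throwaway list.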

Finally, if you think I could write this in a cleaner and more readable way, just let me know and leave a comment below.
Thanks for reading my post
and spending your time with me.

Keep Moving Forward ツ

Code with 💛

🅑🅐🅝🅙🅘
