Programs as proteins - a ribosome programmed in Joy
Danie Palm
Posted on June 20, 2020
Ribosomes are enzymes that are part protein and part RNA. They play a central role in the translation of messenger RNA to proteins according to the genetic code. And yet all the protein subunits of ribosomes are themselves the products of messenger RNA that was translated according to the genetic code by... a ribosome. Ultimately, the messenger RNA that codes for the protein subunits and the ribosomal RNA that directly contributes to the translation function of the ribosome are both the products of DNA transcription, so that the ribosome, like other enzymes, are fully represented in the genome. Without the ribosome there would be no interpreter of the genetic code and no point in having any genes. The ribosome is therefore the most suitable starting place for our Joy-based artificial life system.
We have previously touched on the Y combinator, a function that is able to produce a fixed point of other functions (or function compositions, such as entire Joy programs). In this post we will finally put the Y combinator and its anonymous recursion to good use when we loop through the catalytic steps of the ribosome.
Biology
Ribosomes are roughly half rRNA and half protein. The rRNA component is made up of 4 rRNA strands and the protein component comprises about 80 proteins. The exact composition differs between bacteria, archaea, and eukaryotes, but the overall function and structure is largely comparable: all ribosomes are made up of a small and a large subunit, both of which is a mix of rRNA and proteins.
The process of translation begins when the small ribosomal subunit binds to mRNA, either at the start (the 5'-end) or just prior to the start codon (AUG
). It then scans towards the 3'-end until it reaches the start codon, at which point the large ribosomal subunit is also recruited to form the full ribosome.
Once the ribosomal complex is fully initiated, an amino acid-charged tRNA that is complementary to the start codon of the mRNA strand binds to a position called site A. Right after binding, the ribosome moves one codon to the right, so that the amino acid-charged tRNA is now located at a position known as site P. The next codon of the mRNA is now located at site A and once again a complementary charged tRNA binds to it. The amino acid at site P is now transfered to the amino acid at site A, and then the ribosome once again moves one codon to the right. Site P is now bound to a tRNA molecule that is charged with a peptide chain of two amino acids. This process continues until a stop codon is reached and on each iteration the peptide chain at site P grows by one amino acid, building up a protein of which the amino acid sequence is dictated by the mRNA strand being translated.
It is somewhat interesting to note that the protein subunits of the ribosome are not actively involved at the translation sites, but rather seem to support the rRNA that catalyses the elongation reaction. This could perhaps be an indication that primitive ribosomes were entirely made up of rRNA.
Code
Let's dive right in and take a look at the high-level logic that we want from a ribosome in the form of an Elixir function. The function takes a polypeptide (protein) list as the first argument and an mRNA list as the second. The first invocation should be with an empty polypeptide and an mRNA list of any length. It calls some undefined functions of which the names are hopefully self-explanatory.
def ribosome_worker(polypeptide, mrna) do
if Enum.count(mrna) >= 3 do
if polypeptide == [] do
{codon, mrna} = Enum.split(mrna, 3)
if codon == [:a, :u, :g] do
# Initiate translation
amino_acid = translate(codon)
polypeptide = polypeptide ++ amino_acid
ribosome_worker(polypeptide, mrna)
else
# Continue searching for start codon (AUG)
[_first_base | last_two_bases] = codon
mrna = last_two_bases ++ mrna
ribosome_worker(polypeptide, mrna)
end
else
# Elongate previously initiated polypeptide
{codon, mrna} = Enum.split(mrna, 3)
amino_acid = translate(codon)
if amino_acid == [] do
# the proper Elixir counterpart of what we're doing here
# detracts from the argument, so we move along instead
stash_current_polypeptide_somewhere_and_start_a_new_one()
else
polypeptide = polypeptide ++ amino_acid
end
ribosome_worker(polypeptide, mrna)
end
else
polypeptide
end
end
In short, we want to scan over a messenger RNA (mRNA) strand (a list of atoms :a
, :c
, :g
or :u
) for as long as it is longer than or equal to 3 bases. As soon as the sublist [:a, :u, :g]
is reached, protein synthesis is initiated. This initiation involves translating the AUG
codon to its corresponding amino acid (methionine in this case), which forms the first amino acid of the polypeptide (protein) chain. After initiation we recurse. This time however, the polypeptide is not empty and so we don't search for the initiating AUG
codon, but instead translate the next codon and attach the corresponding amino acid to the end of the growing polypeptide. Unless the codon corresponds to a stop codon, returning an empty amino acid, in which case the existing polypeptide chain is stashed (code for this is not shown) and a new chain is started. In either case, we once a again recurse, so that we either elongate the polypeptide or search for a new start codon. This process continues until the mRNA strand has less than 3 bases left, in which case it can contain no more codons so that the translation process is terminated.
The sequence as described above does not correspond exactly to the real process of translation of which there are many variations. For instance, when a ribosome encounters a stop codon, it detaches from the mRNA strand instead of continuing to search for the next occurence of AUG
.
The problem we now face, is to turn the above pseudocode into Joy. Some of the challenges we encounter in this process are:
- Joy, even without the restrictions we place on it, does not have the notion of variables
- the
ribosome_worker
function is recursive, but since it is not one of our primitives (likedup
,swap
, etc.) there is no direct way to make a recursive call in the Joy version - we are using a minimal subset of Joy in which we don't have numbers
- the pseudocode is in a mixture of Polish and infix notation, but Joy exclusively makes use of the reverse Polish notation
Joy doesn't have variables. Functions can only manipulate or act on values on the stack. Most Joy primitives only operate on the topmost one or two stack values. Before executing a function, it is therefore usually necessary to first manipulate the stack to make sure that the values expected by the function are located at the correct stack positions. Similarly, it may be necessary to do some stack cleanup after a function has been executed. In other words, it is sometimes necessary to dig out values from deep in the stack and place them on top, and other times it is necessary to bury values from the top of the stack deep underneath some others. Let's look at a family of such dig and bury functions.
We are already familiar with a function that will bury the top element (and simultaneously dig out the second element) - swap
. But it is possible to bury the top element n
positions deep with:
bury1 == [[] cons] dip swap i
bury2 == [[] cons cons] dip swap i
bury3 == [[] cons cons cons] dip swap i
in which cons
is repeated n
times.
For example:
[A] [B] [C] bury2 == [C] [A] [B]
Here is how it works:
[A] [B] [C] bury2
[A] [B] [C] [[] cons cons] dip swap i (definition)
[A] [B] [] cons cons [C] swap i (dip)
[A] [[B]] cons [C] swap i (cons)
[[A] [B]] [C] swap i (cons)
[C] [[A] [B]] i (swap)
[C] [A] [B] (i)
We cons
up two items into a single list. Then we swap
the list with the top element and unpack the consed up list again. Similary, we can dig out n
elements with:
dig1 == [] cons dip
dig2 == [] cons cons dip
dig3 == [] cons cons cons dip
Note that dig1
and bury1
are both equivalent to swap
. See The Theory of Concatenative Combinators for a more detailed treatment. We will not be using dign
or buryn
directly because we generally want to employ as few as possible amino acids. Instead, we will be writing out the bury
and dig
family of functions in full whenever they are needed.
Let's put this to the test. Suppose we have a function that takes 3 parameters: a number n
and two strings even_string
and odd_string
. The function must print the value of even_string
if n
is even and the value of odd_string
otherwise. In Elixir, it could look like this:
def odd_or_even(n, even_string, odd_string) do
if rem(n, 2) == 0 do
IO.puts(even_string)
else
IO.puts(odd_string)
end
end
A corresponding Joy function, allowing the use of numbers and strings, could look like this:
[odd_or_even]
[
[
# check if the remainder of n divided by 2 is 0
]
[
# print odd string
]
[
# print even string
]
ifte
]
define
Remember that ifte
is the Joy counterpart of an if-then-else block. Now, the remainder check could look like:
2 rem 0 equal
This assumes that the value representing n
is at the top of the stack. However, if the function is to be called like 4 "I am odd" "I am even" odd_or_even
, then n
is two places below the top and needs to be dug up first.
Similarly, for the case in which we want to print the value of even_string
, we can't just write puts
(the Joy built-in that writes to the standard output), because that would expect even_string
to be at the top of the stack. Instead we have to dig once (or swap) to get it. Lastly, in order to print the value of odd_string
, we can simply use puts
because odd_string
is indeed at the top. Here is the full Joy version:
[odd_or_even]
[
[
dig2 # dig up n
2 rem 0 equal # check if n is even
]
[
swap puts # print even_string if n is even
]
[
puts # print odd_string otherwise
]
ifte
]
define
Now that we know how to manipulate the stack, it is time to tackle recursion. Consider the following recursive function (assuming only non-negative values of n
):
def sum(total, n) do
if n == 0 do
IO.puts(total)
else
sum(total + n, n - 1)
end
end
Calling sum(0, 3)
should print out the number 6. Here is the corresponding Joy version, with all the necessary digging and burying:
[sum]
[
[0 equal] # n is at the top, so no digging
[
swap # dig out total, which is below n
puts # ... and print it
]
[
dup # duplicate n, which is at the top
dig2 # dig out total which is now buried under two n's
+ # sum total and the n below it
swap # so that the new total is under the n that remains from the earlier dup
1 - # subtract 1 from the n on top
sum # recurse
]
ifte
]
define
The else-block that makes the recursive call is particularly involved, because it needs to first manipulate the stack so that it is ready for the first calculation total = total + n
and then to prepare for the secod calculation n = n - 1
. And finally it has to place total
and n
on the stack in the order that is expected by the sum
function in preparation for the recursive call.
While the function above works, we are interested in a version of it that uses anonymous recursion. That is, instead of directly calling sum
recursively, we need to execute something that is equivalent but unnamed. In a previous post, we have established the following effect of the Y combinator:
[p] y == [q] p
where [q]
(short for [[dup cons p] dup cons p]
), when executed, again yields [q] p
. This means that if the program p
decides to execute q
by for instance invoking i
when [q]
is on top of the stack, then the result is that [q]
is once again placed on top of the stack and p
is once again executed, having again the option to unquote [q]
whenever its logic requires it to do so. In this way p
can effectively call itself anonymously.
We can now modify the definition of sum
to make use of the Y combinator as follows:
[sum]
[
[
[dig1 0 equal] # [q] is now above n, so we have to dig n out
[
dig2 # dig out total, which is below n and [q]
puts # ... and print it
]
[
dig1 dup # dig out and duplicate n
dig3 # dig out total which is now buried under two n's and [q]
+ # sum total and the n below it
bury2 # bury the new total under [q]
1 - # subtract 1 from the n that is now on top
bury1 # bury n under [q]
i # recurse: execute [q] which is on top
]
ifte
] y
]
define
We have now directly or indirectly addressed all the challenges we faced in translating pseudcode to restricted Joy. Perhaps a final note on the absence of numbers is warranted. While we won't support numbers (there are too many of them and we aim to use as few symbols as possible), there is no limitation on performing certain operations a number of times. For instance, the predicate Enum.count(mrna) >= 3
, can be calculated without using numbers:
not Enum.empty?(mrna) and not Enum.empty?(tl(mrna)) and not Enum.empty?(tl(tl(mrna)))
which is readily converted to Joy.
Without further ado, here is our ribosome, first with high-level functions like buryn
and dign
and itermediate definitions, and then in austere form using only primitives.
High-level ribosome:
[ribosome]
[
[] swap
[ribosome-worker]
y
]
define
[ribosome-worker]
[
# check if the mrna list contains at least 3 items
[swap empty] [ ] [[swap rest empty] [ ] [[swap rest rest empty] [ ] [
[dig2 empty] # if polypeptide is empty
[ # then initiate
swap split3 swap # dig up mrna and split at 3
[[a u g] equal] # if codon equals [a u g]
[ # then
translate # translate codon to amino acid
dig3 # dig out polypeptide
swap cat # dig out amino acid and concatenate to polypeptide
bury2 # bury polypeptide
swap # bury mrna below [q]
i # recurse
]
[ # else
rest # drop the first element of the codon
swap cat # prepend the rest to mrna
swap # bury mrna below [q]
i # recurse
]
ifte
]
[ # else elongate
swap split3 swap # dig up mrna and split at 3
translate # translate codon to amino acid
[empty] # if amino acid is empty
[ # then
bury2 # bury it as a new polypeptide
]
[ # else
dig3 # dig out polypeptide
swap cat # dig out amino acid and concatenate to polypeptide
bury2 # bury polypeptide
]
ifte
swap # bury mrna below [q]
i # recurse
]
ifte
] ifte] ifte] ifte
]
define
[split3]
[
[] swap
dup [first unit cat] dip rest
dup [first unit cat] dip rest
dup [first unit cat] dip rest
]
define
The split3
function is the counterpart of the Elixir function Enum.split(mrna, 3)
. Its first effect is to bury an empty list underneath the mRNA, which it expects to be on top of the stack. It then duplicates the mRNA and concatenates the head of the first copy to the list underneath it on the stack, and then it drops the head of the second copy. This duplication process is repeated 3 times to correspond to the 3 in split3
.
Austere, but verbose ribosome:
[ribosome]
[
[] swap
[
[swap empty] [ ] [[swap rest empty] [ ] [[swap rest rest empty] [ ] [
[[] cons cons dip empty]
[
swap
[] swap
dup [first unit cat] dip rest
dup [first unit cat] dip rest
dup [first unit cat] dip rest
swap
[[a u g] equal]
[
translate
[] cons cons cons dip
swap cat
[[] cons cons] dip swap i
swap
i
]
[
rest
swap cat
swap
i
]
ifte
]
[
swap
dup [first unit] dip rest
dup [first unit cat] dip rest
dup [first unit cat] dip rest
swap
translate
[empty]
[
[[] cons cons] dip swap i
]
[
[] cons cons cons dip
swap cat
[[] cons cons] dip swap i
]
ifte
swap
i
]
ifte
] ifte] ifte] ifte
]
[dup cons] swap cat dup cons i pop pop
]
define
We've sneaked in a few things not present in the initial Elixir code, but only so that we don't have to provide the initial empty polypeptide to the function. I.e. we can call ribosome
with just an mRNA sequence:
[c a u g a a c a u u g a a a u g u g a a a a a u g a a a c] ribosome
The output of the above program are two lists of amino acids (a stop codon in the mix triggered the start of a new polypeptide):
[met asn ile glu met] [met lys]
So the Joy ribosome does what we set out to do at the beginning of this post, but it is not quite what we ultimately want and I've slipped in some cheating. Here are some things that still need to be addressed:
The ribosome translates mRNA to amino acids according to the real genetic code, whereas we really want it to translate mRNA to primitive Joy functions or combinators (artificial amino acids) according to an artificial genetic code.
The ribosome can output multiple polypeptides, but these peptides are hopelessly flat - they contain no quotations. In essence the produced lists correspond to unfolded (primary structure) proteins, but Joy programs without quotations would hardly be capable of any interesting behaviour.
I've treated the
translate
function as a primitive of Joy. This explains why I haven't provided its definition above. I wasn't able to provide it, because it is only defined at a language level, i.e. in the underlying Elixir implementation of Joy. The end result is that I've hardcoded the genetic code, injecting it from the outside, instead of representing it within the system. For completeness, here is the definition oftranslate
:
def translate(stack) do
[head | rest] = stack
amino_acid =
case head do
[:u, :u, :u] -> [:phe]
[:u, :c, :u] -> [:ser]
[:u, :a, :u] -> [:tyr]
[:u, :g, :u] -> [:cys]
[:u, :u, :c] -> [:phe]
[:u, :c, :c] -> [:ser]
[:u, :a, :c] -> [:tyr]
[:u, :g, :c] -> [:cys]
[:u, :u, :a] -> [:leu]
[:u, :c, :a] -> [:ser]
[:u, :a, :a] -> [] # stop codon
[:u, :g, :a] -> [] # stop codon
[:u, :u, :g] -> [:leu]
[:u, :c, :g] -> [:ser]
[:u, :a, :g] -> [] # stop codon
[:u, :g, :g] -> [:trp]
[:c, :u, :u] -> [:leu]
[:c, :c, :u] -> [:pro]
[:c, :a, :u] -> [:his]
[:c, :g, :u] -> [:arg]
[:c, :u, :c] -> [:leu]
[:c, :c, :c] -> [:pro]
[:c, :a, :c] -> [:his]
[:c, :g, :c] -> [:arg]
[:c, :u, :a] -> [:leu]
[:c, :c, :a] -> [:pro]
[:c, :a, :a] -> [:gln]
[:c, :g, :a] -> [:arg]
[:c, :u, :g] -> [:leu]
[:c, :c, :g] -> [:pro]
[:c, :a, :g] -> [:gln]
[:c, :g, :g] -> [:arg]
[:a, :u, :u] -> [:ile]
[:a, :c, :u] -> [:thr]
[:a, :a, :u] -> [:asn]
[:a, :g, :u] -> [:ser]
[:a, :u, :c] -> [:ile]
[:a, :c, :c] -> [:thr]
[:a, :a, :c] -> [:asn]
[:a, :g, :c] -> [:ser]
[:a, :u, :a] -> [:ile]
[:a, :c, :a] -> [:thr]
[:a, :a, :a] -> [:lys]
[:a, :g, :a] -> [:arg]
[:a, :u, :g] -> [:met]
[:a, :c, :g] -> [:thr]
[:a, :a, :g] -> [:lys]
[:a, :g, :g] -> [:arg]
[:g, :u, :u] -> [:val]
[:g, :c, :u] -> [:ala]
[:g, :a, :u] -> [:asp]
[:g, :g, :t] -> [:gly]
[:g, :u, :c] -> [:val]
[:g, :c, :c] -> [:ala]
[:g, :a, :c] -> [:asp]
[:g, :g, :c] -> [:gly]
[:g, :u, :a] -> [:val]
[:g, :c, :a] -> [:ala]
[:g, :a, :a] -> [:glu]
[:g, :g, :a] -> [:gly]
[:g, :u, :g] -> [:val]
[:g, :c, :g] -> [:ala]
[:g, :a, :g] -> [:glu]
[:g, :g, :g] -> [:gly]
end
[amino_acid | rest]
end
To end with, the Joy programs I've presented here, while very pleasing to me, are by no means readable; not to the untrained eye. I should remind the reader that firstly I'm not an expert in Joy and am probably not using the most idiomatic code and secondly that I'm purposefully using tediously repetitive code in order to limit the number of different functions that are used. Anonymous recursion is for instance built into the standard Joy libraries, which I'm not using. Using real Joy is probably much closer to real joy.
Be that as it may. We have now built a workable first iteration of the most important enzyme of our artificial life system. In the next few posts we will focus on ironing out the remaining three issues with this implementation. This will bring us to a critical milestone: having a ribosome that can produce itself given an appropriate mRNA template.
Posted on June 20, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.