“Termial” random for prioritized picking an item from a list

fridex

Fridolín Pokorný

Posted on February 20, 2020

“Termial” random for prioritized picking an item from a list

Let’s take a look at a solution that randomly picks an item from a list. Instead of assigning equal probability to each item in the list, let’s create an algorithm that assigns the highest probability to the item at index 0 and lower probabilities to all remaining items as the index in the list grows. Can we implement such a function?😓

You can use already available routines present in the Python standard library to pick a random item from a list. Let’s say we want to randomly pick a number from a list of integers. The only thing you need to do is to run the following snippet:

import random
my_list = [42, 33, 30, int("deadbeef", base=16)]
random.choice(my_list)
# results in 42 with a probability of 1 / len(my_list)
Enter fullscreen mode Exit fullscreen mode

Now let’s say we want to prioritize number 42 over 33, prioritize number 33 over 30, and at the same time, prioritize number 30 over “0xdeadbeef”. We have 4 numbers in total in our list, let’s assign weights to these numbers in the following manner:

+--------------+-------------+
|    number    |    weight   |
+--------------+-------------+
|     42       |      4      |
|     33       |      3      |
|     30       |      2      |
|  0xdeadbeef  |      1      |
+--------------+-------------+
Enter fullscreen mode Exit fullscreen mode

The higher the weight is, the higher the probability we pick the given number.
You can see weighs as a number of “buckets” we assign to each number from the list. Subsequently, we randomly (random uniform) try to hit one bucket. After hitting the bucket, we check to which number this bucket corresponds to.
The total number of buckets we can hit is equal to the sum of weights:

4 + 3 + 2 + 1 = 10
Enter fullscreen mode Exit fullscreen mode

The probability of hitting a bucket based on the number from the list is shown below:

+--------------+-----------------------+
|    number    |       probability     |
+--------------+-----------------------+
|     42       |       4/10 = 0.4      |
|     33       |       3/10 = 0.3      |
|     30       |       2/10 = 0.2      |
|  0xdeadbeef  |       1/10 = 0.1      |
+--------------+-----------------------+
Enter fullscreen mode Exit fullscreen mode

To generalize this for n numbers, we can come up with the following formula:

formula1

In other mathematical words:

formula2

Does this formula look familiar to you? It’s called termial of a positive integer n; from Wikipedia:

The termial was coined by Donald E. Knuth in his The Art of Computer Programming. It is the additive analog of the factorial function, which is the product of integers from 1 to n. He used it to illustrate the extension of the domain from positive integers to the real numbers.

Now, let’s make our hands dirty with some code. To compute the termial of n:

termial_of_n = sum(range(1, len(my_list) + 1))  # O(N)
Enter fullscreen mode Exit fullscreen mode

Another, more effective way, to compute the termial of n is to use Binomial coefficient principle and compute (len(my_list) + 1) over 2:

l = len(my_list)
# (l + 1) over 2 = l! / (2!*(l-2)!) = l * (l - 1) / 2
termial_of_n = ((l**2) + l) >> 1  # O(1)
Enter fullscreen mode Exit fullscreen mode

Finally, we can pick a random (random uniform) bucket from our buckets:


import random
choice = random.randrange(termial_of_n)
Enter fullscreen mode Exit fullscreen mode

The result stored in the variable choice holds an integer starting from 0 to 9 (inclusively) and represents an index to the list of the buckets we created earlier:

+--------------+---------------+---------------+
|    choice    |     bucket    |     number    |
+--------------+---------------+---------------+
|      0       |       1       |       42      |
+--------------+---------------+---------------+
|      1       |       2       |       42      |
+--------------+---------------+---------------+
|      2       |       3       |       42      |
+--------------+---------------+---------------+
|      3       |       4       |       42      |
+--------------+---------------+---------------+
|      4       |       5       |       33      |
+--------------+---------------+---------------+
|      5       |       6       |       33      |
+--------------+---------------+---------------+
|      6       |       7       |       33      |
+--------------+---------------+---------------+
|      7       |       8       |       30      |
+--------------+---------------+---------------+
|      8       |       9       |       30      |
+--------------+---------------+---------------+
|      9       |       10      |   0xdeadbeef  |
+--------------+---------------+---------------+
Enter fullscreen mode Exit fullscreen mode

How do we find which number we hit through a randomly picked bucket for any n?

Let’s revisit how we computed the termial number of n using the Binomial coefficient based formula:

l = len(my_list)
termial_of_n = ((l**2) + l) >> 1
Enter fullscreen mode Exit fullscreen mode

Based on termial function definition, we know that regardless of n, we always assign 1 bucket to number at index 0, 2 buckets to number at index 1, 3 buckets to number at index 2 and so on. Using this knowledge, we can transform the Binomial coefficient formula to the following equation:

choice = ((i**2) + i) >> 1
Enter fullscreen mode Exit fullscreen mode

The next step is to find i that satisfies the given equation. The equation is a quadratic function described as:

a*i**2 + b*i + c = 0

Enter fullscreen mode Exit fullscreen mode

where:

a = 1/2
b = 1/2
c = -choice
Enter fullscreen mode Exit fullscreen mode

As the choice is expected to be always a non-negative integer (index to the list of buckets), we can search for a solution that always results in a non-negative integer (reducing one discriminant term that always results in negative i).

import math
# D = b**2 - 4*a*c
# x1 = (-b + math.sqrt(D)) / (2*a)
# x2 = (-b - math.sqrt(D)) / (2*a)
# Given:
#   a = 1/2
#   b = 1/2
#   c = -choice
# D = (1/2)**2 + 4*0.5*choice = 0.25 + 2*choice
i = math.floor(-0.5 + math.sqrt(0.25 + (choice << 1)))
Enter fullscreen mode Exit fullscreen mode

The solution has to be rounded using math.floor as it corresponds to the inverted index with respect to n. As i is inverted, the final solution (index to the original list) is:

my_list[n - 1 - i]
Enter fullscreen mode Exit fullscreen mode

Asymptotic complexity analysis

Let’s assume:

  • the len function can return the length of the list in O(1) time
  • random.randrange* operates in O(1) time
  • we use Binomial coefficient based equation for computing the termial of n

The whole computation is done in O(1) time with O(1) space.

If we would use the sum based computation of the termial of n, the algorithm would become O(n) time with O(1) space.


Disclaimer & original usage

I designed this algorithm for Thoth’s recommendation engine. Its main purpose is to prefer more recent versions of packages in the resolver during the resolution of Python software stacks.

I would be happy for any feedback or any similar approaches recommended.

A complete solution can be found on my GitHub gist:

💖 💪 🙅 🚩
fridex
Fridolín Pokorný

Posted on February 20, 2020

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related