Java Daily Coding Problem #006

Daily Coding Problem is a website which will send you a programming challenge to your inbox every day. I want to show beginners how to solve some of these problems using Java, so this will be an ongoing series of my solutions. Feel free to pick them apart in the comments!

Problem

An XOR linked list is a more memory efficient doubly linked list. Instead of each node holding next and prev fields, it holds a field named both, which is an XOR of the next node and the previous node. Implement an XOR linked list; it has an add(element) which adds the element to the end, and a get(index) which returns the node at index.

If using a language that has no pointers (such as Python), you can assume you have access to get_pointer and dereference_pointer functions that converts between nodes and memory addresses.

Strategy

Spoilers! Don't look below unless you want to see my solution!

Doubly-Linked Lists

So, first, I need to understand what a doubly-linked list is. So I read the Wikipedia article and the way I understand it is this:

Each element of a singly-linked list (or just a linked list) contains:

some data
a pointer or reference to the next element in the list

Each element of a doubly-linked list contains:

some data
a pointer or reference to the next element in the list
a pointer or reference to the previous element in the list

You can visualise the differences between these kinds of lists like so:

Singly-linked lists can only be traversed in the forward direction. An element of the list has no idea what element came before it (if any) and has no way of accessing it. A doubly-linked list can be traversed in either direction, forward of backward. When the user tries to reach beyond the end of the list (or beyond the beginning of the list for a doubly-linked list) some kind of error should be thrown, or null should be returned.

Next... what's an XOR list? The prompt sort of explains the concept, but I'm not sure I understand it. I'll come back to this in a bit.

Strict Interpretation of Prompt

I found this SO answer which explains the pros and cons of doubly-linked lists fairly well. It also gives a good explanation of why we might want to use one. If I may paraphrase...

In a "true" singly-linked or doubly-linked list, we hold two objects:

the data
a pointer or reference to another element of the list

A pointer is simply another variable which holds a memory address. As an example, let's say we have a doubly-linked list of C-language ints, which are 4 bytes in length. Pointers in C on a 64-bit machine will be 8 bytes long, so each element of our list requires 20 bytes, for the int and the two pointers.

Let's suppose the first element of the list is at the address 0x9A7D...

The prefix 0x indicates that this is a hexadecimal number. On a 64-bit machine (where 64-bit refers to the size of the memory address space), representing any address requires 8 bytes. A single byte (range 0-255) can be represented by two hexadecimal digits. So 8 bytes require 16 hexadecimal digits.

In our example, that address references the value stored by that element of the list. Since that value is an int, it will take up four bytes of space (eight hex digits). The next 8 bytes (sixteen hex digits) will be a pointer to the "next" element in the list, and the final 8 bytes will be a pointer to the "previous" element in the list, so:

D0 C0 FF EE [ "next" address      ] [ "previous" address  ]
^^^^^^^^^^^
4-byte integer value (hex "D0 C0 FF EE" = decimal "3 502 309 358")

If the first element of the list (this one) is at the address

0x9E7D6300BA679A7D

...then the next element will be offset by 20 bytes (assuming the elements of the list are stored in a contiguous block of memory, which may not be the case), at

0x9E7D6300BA679A91

0x7D + 20 = 0x7D + 0x14 = 0x91

D0 C0 FF EE 9E 7D 63 00 BA 67 9A 91 [ "previous" address  ]
            ^^^^^^^^^^^^^^^^^^^^^^^
             "next" memory address

Similarly, the "previous" address will be offset by 20 bytes in the opposite direction for all elements of the list except this first one. Since the first element has no "previous" address, there will be some indicator that there is no element there (probably a null pointer to memory address 0x0):

D0 C0 FF EE 9E 7D 63 00 BA 67 9A 91 00 00 00 00 00 00 00 00
                                    ^^^^^^^^^^^^^^^^^^^^^^^
                                   "previous" memory address

In C, we obviously won't work directly with these bytes. Instead, we'll have something like:

struct DLL {

  int data; // value
  struct DLL *next;
  struct DLL *previous;

}

...but how on earth could we implement this in Java? Java borrows a lot of syntax from C/C++, but it has no structs. So in Java, we might instead have:

class DLL {

  int value; 
  /* implement next */
  /* implement previous */

}

Java also has no pointers, but remember that the prompt states:

If using a language that has no pointers (such as Python), you can assume you have access to get_pointer and dereference_pointer functions that converts between nodes and memory addresses.

Java, designed to be a "safe" language, does not allow you to mess around with memory addresses. So the get_pointer and dereference_pointer methods are purely imaginary. From the prompt description, they should have type signatures like:

DLL* get_pointer (DLL link)

DLL dereference_pointer (DLL* pointer)

...which convert the DLL object into a pointer containing its address in memory and vice versa. The problem suggests doing something along the lines of:

public class DLL {

  public  int  data; // value
  private DLL  next;
  private DLL  previous;

  public  DLL* next()     { return get_pointer(next)     }
  public  DLL* previous() { return get_pointer(previous) }

}

This is ugly. It's also invalid Java syntax. It's useless in real life. I don't think it's worthwhile to continue this hypothetical solution.

Re-Interpretation of Prompt

Instead of following the prompt to the letter, let's reinterpret it a bit so we can actually capture the spirit of XOR linked lists in Java.

We could try to use java.lang.instrument to determine the sizes (in bytes) of objects in Java, but this approach yields unintuitive results (all Strings are the same size) which aren't useful for our purposes (knowing the actual size in memory of an object).

Instead, let's simulate memory addresses. Java allows for hexadecimal literals, and (by default) converts them to decimal when they're printed:

jshell> 0x10
$3 ==> 16

jshell> 0x20
$4 ==> 32

jshell> 0x10 + 0x10
$5 ==> 32

Since a Java int has a range of [-2e31, 2e31-1] ([-2147483648, 2147483647]), its maximal hex values are:


jshell> 0x0
$31 ==> 0

jshell> 0x7FFFFFFF
$32 ==> 2147483647

jshell> 0x80000000
$33 ==> -2147483648

jshell> 0xFFFFFFFF
$34 ==> -1

So we can create a method that "allocates" a memory address for a new object, given the size of the object. We can keep track of which "memory locations" are being used so we don't accidentally "overwrite" old data. Then, we can finally implement the XOR-linked list (which will be explained during its implementation) in Java.

Code

To start, let's create a simple Memory class in Java:

public class Memory {

  private static Random random = new Random();

  // return any address except "0x00000000", the "NULL" address
  public static String randomAddress() {
    int value = random.ints(1L, Integer.MIN_VALUE, Integer.MAX_VALUE).toArray()[0] + 1;
    return intToAddress(value);
  }

  public static int addressToInt(String address) {
    return (int)(Long.parseLong(address.substring(2), 16) + Integer.MIN_VALUE);
  }

  public static String intToAddress(int value) {
    long longValue = (long)value - (long)Integer.MIN_VALUE;
    return String.format("0x%8s", Long.toString(longValue, 16).toUpperCase()).replaceAll(" ", "0");
  }

}

This class has three functions: intToAddress(), which takes any int value as an argument and converts it to a hex address String, addressToInt(), which performs the inverse, and randomAddress(), which generates a random address String. The GitHub repository which hosts this code adds some bounds checking and comments.

jshell> Memory.randomAddress()
$195 ==> "0xB821DAE5"

jshell> Memory.addressToInt("0xB821DAE5")
$196 ==> 941742821

jshell> Memory.intToAddress(941742821)
$197 ==> "0xB821DAE5"

The maximal values for intToAddress() are Integer.MIN_VALUE and Integer.MAX_VALUE:

jshell> Memory.intToAddress(Integer.MIN_VALUE)
$198 ==> "0x00000000"

jshell> Memory.intToAddress(Integer.MAX_VALUE)
$199 ==> "0xFFFFFFFF"

...which return the maximal values for addressToInt():

jshell> Memory.addressToInt("0x00000000")
$200 ==> -2147483648

jshell> Integer.MIN_VALUE
$201 ==> -2147483648

jshell> Memory.addressToInt("0xFFFFFFFF")
$202 ==> 2147483647

jshell> Integer.MAX_VALUE
$203 ==> 2147483647

Note that my implementation of "maximum" and "minimum" hexadecimal values is different than Java's, which wrap at 0x7FFFFFFF / 0x80000000. I rearranged so that the minimum hexadecimal value corresponds to the lowest memory address, which I think is a bit more intuitive.

Next, we need to "allocate memory" when we create a new object (in this case, our doubly-linked list object). When we do this, we need to specify the amount of memory we need. Let's create a malloc-like function for Memory:

  public static String malloc(long size) {

    // if object has zero size, return NULL address
    if (size < 1) return "0x00000000";

    // if object cannot fit in memory, throw error
    if (size > (long)Integer.MAX_VALUE - (long)Integer.MIN_VALUE )
      throw new IllegalArgumentException("insufficient memory");

    // if object can fit in memory, get largest possible address

    long first = Integer.MIN_VALUE;
    long last  = (long)Integer.MAX_VALUE - size;

    // if only one possible memory address, return that one
    if (first == last) return "0x00000001";

    // ...else, randomise over valid range
    int value = random.ints(1L, (int)first, (int)last).toArray()[0] + 1;

    // ...and return as address
    return intToAddress(value);

  }

This, of course, is an extremely simple and inefficient way of allocating memory. There are better solutions, but they require a bit more work. Let's use this simple solution for now. Finally, we need a way to "register" memory so we don't accidentally overwrite an object with another one.

  // keep track of which int-indexed blocks are occupied by data
  private static HashSet<Integer> occupied = new HashSet<>(Arrays.asList(Integer.MIN_VALUE));

  // free memory within a certain range
  public static void free(String iAddress, String fAddress) {
    int iAdd = addressToInt(iAddress);
    int fAdd = addressToInt(fAddress);

    // remove all addresses in range
    occupied.removeAll(IntStream.range(iAdd, fAdd).boxed().collect(Collectors.toList()));

    // check that "NULL" is still "occupied"
    occupied.add(Integer.MIN_VALUE);
  }

  // free all memory
  public static void free() {
    free("0x00000001", "0xFFFFFFFF");
  }

  // list of objects in memory
  public static HashMap<String, Object> refTable = new HashMap<>();
  static { refTable.put("0x00000000", null); }

  // dereference object
  public static Object dereference(String address) {
    return refTable.get(address);
  }

The above, of course, doesn't really work, as we don't check in malloc that memory is registered or not. To have a fully-working, robust solution, we would need a much more complex memory-allocating engine. But anyway, this is probably enough to play around a bit:

jshell> Memory.occupied
$83 ==> [-2147483648]

jshell> Memory.malloc(1)
$84 ==> "0xB8B7087E"

jshell> Memory.occupied
$85 ==> [-2147483648, 951519358]

jshell> Memory.intToAddress(951519358)
$86 ==> "0xB8B7087E"

jshell> Memory.malloc(2)
$87 ==> "0x802AD3C8"

jshell> Memory.occupied
$88 ==> [-2147483648, 2806728, 2806729, 951519358]

Finally we can start to talk about XOR-linked lists.

XOR-Linked Lists

Recall the definition of a doubly-linked list from earlier:

Each element of a doubly-linked list contains:

some data
a pointer or reference to the next element in the list
a pointer or reference to the previous element in the list

So we need a class of Nodes for the elements of the list, and a DoublyLinkedList class. A simple Node class might look like:

  class Node<U> {

    // number of "bytes" to allocate for a DLL
    static final int size = 20;

    U      data; // data held by this DLL element
    String next; // address of next DLL element
    String prev; // address of previous DLL element
    String addr; // address of this DLL element

    // constructor with no "next" or "prev" elements
    public Node (U data) {

      this.data = data;
      this.next = "0x00000000"; // null
      this.prev = "0x00000000"; // null

      // allocate memory for this DLL element
      this.addr = Memory.malloc(size);

    }
  }

We can add methods to get and set the addresses of the next and prev (previous) nodes in the list:

    // method to get a "pointer" to this object ("get_pointer")
    String ptr() { return this.addr; }

    // getters for next and prev
    String next() { return this.next; }
    String prev() { return this.prev; }

    // setters for next and prev
    void next(String addr) { this.next = addr; }
    void prev(String addr) { this.prev = addr; }

Finally, we need a way of "allocating memory" for these objects, assigning them a memory address, and "dereferencing" that address. For our DoublyLinkedList and Node classes to access our Memory class, we need to package them into a *.jar. I'll create a Maven project with all of this code. Then, we can expand the DoublyLinkedList object...

public class DoublyLinkedList<T> {

  // List of Nodes
  private List<Node<T>> Nodes;

  // get number of Nodes in this List
  public int size() { return this.Nodes.size(); }

  // constructor
  public DoublyLinkedList() {
    this.Nodes = new ArrayList<>();
  }

  // add a Node to the end of the List
  public DoublyLinkedList<T> add(T t) {
    Node<T> newNode = new Node<>(t);

    // if this List already has Nodes
    if (this.size() > 0) {

      // get Node which previously was last Node
      Node<T> oldLastNode = this.Nodes.get(this.size()-1);

      // edit last Node in List to point to _new_ last Node
      oldLastNode.next = newNode.ptr();

      // edit new last Node to point to _old_ last Node
      newNode.prev(oldLastNode.ptr());
    }

    // add new last Node to end of List
    this.Nodes.add(newNode);

    // so add() can be chained
    return this;
  }

  /* Node inner class */

}

Now, we can use Node.ptr() to get the "address" of a Node element, and "dereference()" addresses to get the objects they refer to (note that I also @Override the default toString() methods for Node and DoublyLinkedList):

$ jshell -cp target/006-1.0-SNAPSHOT.jar
|  Welcome to JShell -- Version 11.0.2
|  For an introduction type: /help intro

jshell> import DCP.*

jshell> DoublyLinkedList<Integer> dll = new DoublyLinkedList<>();
dll ==> 

jshell> dll.add(1)
$3 ==> 
0x00000000 <- 0xA0EAA2D0 -> 0x00000000
      null <-          1 ->       null

jshell> dll.add(2)
$4 ==> 
0x00000000 <- 0xA0EAA2D0 -> 0x29728E8A
      null <-          1 ->          2

0xA0EAA2D0 <- 0x29728E8A -> 0x00000000
         1 <-          2 ->       null

jshell> dll.add(3)
$5 ==> 
0x00000000 <- 0xA0EAA2D0 -> 0x29728E8A
      null <-          1 ->          2

0xA0EAA2D0 <- 0x29728E8A -> 0x5DBD6A3C
         1 <-          2 ->          3

0x29728E8A <- 0x5DBD6A3C -> 0x00000000
         2 <-          3 ->       null

Overriding toString(), I have each Node give its address, as well as the addresses of the next and prev Nodes on the first line, and their values on the second line:

0xA0EAA2D0 <- 0x29728E8A -> 0x5DBD6A3C
         1 <-          2 ->          3

With an XOR-linked list, instead of holding both the prev and next addresses, we hold the XOR of those addresses:

jshell> Memory.addressToInt("0xA0EAA2D0")
$7 ==> 552248016

jshell> Memory.addressToInt("0x5DBD6A3C")
$8 ==> -574789060

jshell> $7 ^ $8
$9 ==> -44578580

jshell> Memory.intToAddress($9)
$10 ==> "0x7D57C8EC"

XOR-linked lists cannot allow random access. We must start at either the beginning or the end of the list and work our way down the list. If we start with the first Node of our above list, for instance:

0x00000000 <- 0xA0EAA2D0 -> 0x29728E8A
      null <-          1 ->          2

jshell> int both = (Memory.addressToInt("0x00000000") ^ Memory.addressToInt("0x29728E8A"))
both ==> 695373450

jshell> Memory.intToAddress(both)
$12 ==> "0xA9728E8A"

"To start traversing the list in either direction from some point, the address of two consecutive items is required. If the addresses of the two consecutive items are reversed, list traversal will occur in the opposite direction"
-- Wiki

The above Wikipedia quote notes that we need the address of the previous element as well as both to traverse the list:

jshell> Memory.intToAddress(both ^ Memory.addressToInt("0x00000000"))
$18 ==> "0x29728E8A"

For the second element in our list above:

jshell> int both = (Memory.addressToInt("0xA0EAA2D0") ^ Memory.addressToInt("0x5DBD6A3C"))
both ==> -44578580

jshell> Memory.intToAddress(both)
$20 ==> "0x7D57C8EC"

jshell> Memory.intToAddress(both ^ Memory.addressToInt("0xA0EAA2D0"))
$21 ==> "0x5DBD6A3C"

We can see that we need the previous address and the XOR both to get the next address, or the next address and the XOR both to get the previous address. We don't have to store two addresses in each Node, but we do need to keep track of the address of that previous Node somewhere. All in all, this is quite a clunky data structure with little benefit. It may have been more useful in the early days of computing, when memory was at a premium, but it's probably best to avoid it now. (Not to mention that it's actually impossible to implement in Java.)

Discussion

This was an interesting exercise, but mostly for reasons that had nothing to do with the actual prompt. Learning about hexadecimal literals in Java, converting between them and integers, implementing a fake 4GB storage space with a NULL address, faking pointers and dereferencing... this was all really interesting and informative. XOR-linked lists? Not as much.

In the future, I might try to re-implement my fake memory space with a better malloc. I think this would be a really informative challenge. What do you think?

All the code for my Daily Coding Problems solutions is available at github.com/awwsmm/daily.

Suggestions? Let me know in the comments.

Blog