Python Array VS Java Array
Liu Yongliang
Posted on August 2, 2020
I started coding with Python. In Python, the array is dynamic and you do not have to worry about the type of objects that it contains or the length of the array that it starts with. With that understanding of how arrays work, I was quite uncomfortable with strongly typed languages such as C or Java. One of my frustrations was the following:
Suppose I want to loop over a string and take only the uppercase letters. In Python I would simply do:
def func(s):
ans = []
for i in s:
if i == i.upper():
ans.append(i)
return ans
But in C (or Java), how do I initialize an array with a fixed size when I do not know what that final size will be?
After learning about data structures and algorithms, I finally have a slight sense of how things work behind the scene, and in this article, I would try my best to explain it, mainly for reflection purposes.
Why Array?
It all started with the need to group data based on certain criteria. Most of the time we would like to group data of the same type so that we can organize the information better for storage as well as retrieval.
Here comes primitives, the stuff that we get out of the box in programming languages such as Java, and are the building blocks for bigger programs. We have Integer, String, Boolean, and also the primitive static array. These are all containers that meet our basic needs for storing data.
The static array in Java requires you to specify the type of object that it contains and also the size.
For example,
int[] intArray = new int[5];
// a variable called intArray that points to an array
// that will keep 5 integers.
When we don't assign values to the slots of the array, it will be assigned with a default value, in the above case, integer zero.
This is all fine and dandy if we know exactly what we would like to keep in the array. One of the facts that is surprising to me is that arrays only (generally) keep one type of object. In Python, we don't consider the item type that goes into the array. So, I have always thought that you can put whatever you want into it, maybe one of them being an integer and the other being a string. However, come to think of it, it is typical and reasonable to only put items of the same type into the same array.
How Does Dynamic Array Help?
Enough with the primitive array. Now one way to deal with the fact that we do not know what to set for the initial size of an array is to simply initialize a large enough array and keep track of the end of the actual space taken up by our items.
In the uppercase-letter-in-string example, I could have initialized an array to be the size of the entire string. This way, if every character in the string is in uppercase, I would fill the array just fine, without any spillage. Well, this is a strategy, but what if we have potentially a million items and also possibly as minimal as one or two items in one array?
For example, we might be looping over millions of rows of data and only pick those that are below a certain threshold. In this case, do we still want to initialize a million row that might eventually be a complete waste of space? Now we are onto something.
What we want isn't a specific type of container. What we want is a type of container plus a set of instructions that it must fulfill. In our case, we want a dynamic, adjustable, flexible container that supports fast insertion and query operations. This is the definition of List Abstract Data Type (ADT). It is abstract, meaning it is more of mold than an actual implementation.
Now, we know what we want, do we have an idea of how to achieve it? Close to what we were discussing, we can make use of the thinking that as long as I can allocate more than enough space for the array, not too much but always more, we can solve the issue here. There is an actual implementation, in Java, called ArrayList (to which I read as implementing the List ADT using static array).
Brief Overview
We can initialize an array that takes in say 10 items. If the next item exceeds the space limit, we simply initialize an array with double the space and move everything over. There is a slight problem if you have not yet noticed: If we keep copying over items when we run into shortage, then we may have performance issues due to all these movements of data, from one array into another. A reasonable solution would be doubling the size every time it requires fixing. A good place where exponential growth is working to our favor.
Adding items to the back of the static array (within limit) would just require us to move the pointer to an empty spot and place a new value into it. This is very much constant. However, for insertion, if you would like to insert an item into the very first slot, we need to push other existing items back 1 slot, not to mention it might even trigger a copying operation to fix the overflow. I would say the time complexity is, therefore, O(n) for insertion. The good side is that we can query items in O(1), thanks to the indexing.
Here are some of the useful operations and their complexity analysis:
// Query or Retrieval
empty() // O(1)
size() // O(1) if keeping an variable to record and update this property
contains() // O(n) check through every item to find the match
getItemAtIndex(int i) // O(1) indexing is constant time
getFirst() // O(1)
getLast() // O(1)
// Insertion
addAtIndex(int i, int item) // O(n) on average shifting n/2 items to the right
addFront(int item) // O(n) shifting items to the right
addBack(int item) // O(1) if no need to enlarge array
// Deletion
removeItemAtIndex(int i) // O(n) on average shifting n/2 items to the left
removeFront() // O(n) Worst case
removeBack() // O(1) Best case
As for the space complexity, in the best case, we use exactly n slots for n items. For the worst case, 2n space is allocated for n+1 items. Hence, the overall space complexity is O(n).
We can use another implementation that tries to fulfill the List ADT requirement with another twist, the linkedList. It has more variants such as circular linked list, tailed linked list, doubly linked list, hence warranting discussion at length.
Conclusion
The hierarchy goes like this
- Primitives
- Java Static Array
- List Abstract Data Type (ADT):
- A dynamic data structure with efficient create/query operations.
- Implementations:
- ArrayList & LinkedList (Java), Array (Python)
So, Python arrays are not equal to Java static arrays, but it is quite close to Java ArrayList/LinkedList.
Posted on August 2, 2020
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.