Java Under the Hood

sivantha96

Sivantha Paranavithana

Posted on July 12, 2020


This post offers a brief introduction to how code written in the Java language is executed under the hood.

Here are the topics I am going to explore:

1. Java Compiler
2. JVM

Java Compiler

Is Java a compiled-language or an interpreted-language?

Kinda both! The reason lies in the way Java is compiled.
In many other languages, the compiler converts the source code into machine-specific code, and the machine then executes the instructions that reside in that machine code.

But in Java, the Java compiler does not convert Java source code directly into machine code (i.e. binary). Instead, it converts the source code into an intermediate code called bytecode. The Java Virtual Machine (JVM) then executes that bytecode by interpreting it. In addition, the JVM uses a Just In Time (JIT) compiler to compile some of the bytecode into native code (machine code) at run time. Therefore, Java is both a compiled and an interpreted language.

javac, the Java compiler, is a component of the Java Development Kit (JDK).

The Java compiler transforms the source code located in .java files into .class files, which contain the bytecode for that source code.
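To make the idea of a .class file a bit more concrete, here is a minimal sketch that reads the first few bytes of a compiled class. Every class file starts with the magic number 0xCAFEBABE followed by the minor and major version. The class name ClassFileHeader is only an illustrative name, and the sketch assumes that the class file of java.lang.Object can be read as a resource through the system class loader, which should hold on common JDKs.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileHeader {

    /** Returns {magic, minorVersion, majorVersion} read from a class file resource. */
    static int[] readHeader(String resourcePath) {
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalStateException("Resource not found: " + resourcePath);
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // first 4 bytes of every class file
            int minor = data.readUnsignedShort();  // next 2 bytes: minor version
            int major = data.readUnsignedShort();  // next 2 bytes: major version
            return new int[] {magic, minor, major};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] header = readHeader("java/lang/Object.class");
        System.out.printf("magic=0x%08X major=%d minor=%d%n", header[0], header[2], header[1]);
    }
}
```

The major version that gets printed depends on the JDK you run it with, so no fixed value is shown here.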

This is not limited to Java: any language can implement a compiler that turns its source code into valid bytecode, which can then be executed on the JVM (Kotlin and Scala do this, for example).

If you have multiple classes in a single .java file, the compiler generates a separate .class file for each class.
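The "one .class file per class" behaviour can be checked with a small experiment. The sketch below writes a source file with two classes into a temporary directory, compiles it through the javax.tools API, and lists the resulting .class files. It assumes you are running on a JDK (not a bare JRE), and the names MultiClassCompileDemo, First and Second are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

class MultiClassCompileDemo {

    /** Compiles a two-class source file and returns the sorted names of the generated .class files. */
    static List<String> compileAndListClassFiles() {
        try {
            Path dir = Files.createTempDirectory("multi-class-demo");
            Path source = dir.resolve("First.java");
            String code = "class First {}\nclass Second {}\n";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
            if (compiler == null) {
                throw new IllegalStateException("No compiler available; a JDK is required");
            }
            // Same as running: javac -d <dir> <dir>/First.java
            int exitCode = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
            if (exitCode != 0) {
                throw new IllegalStateException("Compilation failed with exit code " + exitCode);
            }

            try (Stream<Path> files = Files.list(dir)) {
                return files.map(p -> p.getFileName().toString())
                        .filter(name -> name.endsWith(".class"))
                        .sorted()
                        .collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(compileAndListClassFiles()); // [First.class, Second.class]
    }
}
```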

Java Virtual Machine

After javac compiles the source code to bytecode, the JVM executes it. This is called the program run phase.

The JVM is divided into three main subsystems.

  1. Classloading Subsystem
  2. Runtime Data Areas
  3. Execution Engine

In addition, the JVM includes the Native Method Libraries, which are platform-specific executable code (written in C/C++) packaged in libraries such as DLLs, and the Java Native Interface (JNI), which is the interface that the Execution Engine uses to interact with the Native Method Libraries.
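Methods that are implemented in native libraries are declared with the native modifier, and reflection lets you see that modifier. The following is a rough sketch; NativeMethodCheck is an illustrative name, and which JDK methods turn out to be native is an implementation detail that may differ between JDK vendors and versions.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class NativeMethodCheck {

    /** Returns true if the given public method is declared with the native modifier. */
    static boolean isNative(Class<?> type, String methodName, Class<?>... parameterTypes) {
        try {
            Method method = type.getMethod(methodName, parameterTypes);
            return Modifier.isNative(method.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Results for the first two depend on the JDK implementation.
        System.out.println("Object.hashCode native? " + isNative(Object.class, "hashCode"));
        System.out.println("System.currentTimeMillis native? " + isNative(System.class, "currentTimeMillis"));
        // String.length is ordinary Java code, so this is expected to be false.
        System.out.println("String.length native? " + isNative(String.class, "length"));
    }
}
```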

Classloading Subsystem

The Classloading Subsystem is responsible for loading, linking, and initializing the .class files generated by javac.

Loading

Java classes aren't loaded into memory all at once. They get loaded when they are required by an application (dynamic loading). Classes are loaded with the help of three class loaders.

  1. Bootstrap Classloader - This loader is responsible for loading the core classes such as java.lang.Object, java.lang.Class and java.lang.ClassLoader from the bootstrap classpath, which is rt.jar up to Java 8 (from Java 9 onward, rt.jar was replaced by the module system). This Classloader sits at the top of the hierarchy: every other Classloader has it as a direct or indirect parent.

  2. Extension Loader - This loader continues the loading process by loading the classes that are an extension of the standard core Java classes. These classes are available to all applications running on the platform (i.e. JRE). Since Java 9, this role is played by the Platform Classloader.

  3. Application Loader - The loading ends with the user-defined classes, which reside on the application-level classpath. That classpath is specified with the CLASSPATH environment variable or the -cp command-line option.
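You can ask any class which loader loaded it. Here is a small sketch (LoaderNames is just an illustrative name). Note that on common JDKs getClassLoader() returns null for classes loaded by the Bootstrap Classloader, so the code maps null to the word "bootstrap".

```java
class LoaderNames {

    /** Returns a readable name for the class loader that loaded the given class. */
    static String loaderName(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // null conventionally represents the bootstrap class loader
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String -> " + loaderName(String.class));
        // The exact loader class name for application classes depends on the JDK version.
        System.out.println("LoaderNames      -> " + loaderName(LoaderNames.class));
    }
}
```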

These classloaders follow the Delegation Hierarchy Algorithm while loading classes.

What is Delegation Hierarchy Algorithm?

When a Classloader is requested to load a class, it first delegates the request to its parent Classloader.

For example, if the JVM is requested to load a class, the Application Classloader will delegate it to the Extension Classloader. Then the Extension Classloader will delegate it to the Bootstrap Classloader. If the Bootstrap Classloader is unsuccessful in loading the class, then the Extension Classloader will try to load it. Only if the Extension Classloader also fails to find the class will the Application Classloader try to load it.

If the class is not found even after the Application Classloader tries to load it, an error is thrown (typically a ClassNotFoundException or a NoClassDefFoundError).
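The parent chain that a request travels along can be printed with a few lines of code. This is a minimal sketch with an illustrative class name (DelegationDemo); it walks from the application (system) loader up to the bootstrap loader, and also shows what happens when no loader in the chain can find a class. The class name no.such.MissingClass is deliberately made up.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {

    /** Returns the loader class names from the application loader up to the bootstrap loader. */
    static List<String> parentChain() {
        List<String> chain = new ArrayList<>();
        for (ClassLoader loader = ClassLoader.getSystemClassLoader(); loader != null; loader = loader.getParent()) {
            chain.add(loader.getClass().getName());
        }
        chain.add("bootstrap"); // a null parent stands for the bootstrap loader
        return chain;
    }

    /** Returns true if the class can be loaded, false if every loader in the chain fails. */
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parentChain());
        System.out.println(canLoad("java.util.ArrayList"));  // true
        System.out.println(canLoad("no.such.MissingClass")); // false
    }
}
```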

Linking

Linking a class involves the following operations:

  1. Verification - Ensure the bytecode is structurally correct.

  2. Preparation - Memory is allocated for static variables and they are assigned their default values (for example 0, false or null).

  3. Resolution - Symbolic references in the bytecode are replaced with direct references to the actual classes, fields and methods.
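The default values from the Preparation step are normally invisible, but you can catch a glimpse of them if a static initializer reads a field before that field's own initializer has run. A small sketch, with PreparationDemo as an illustrative name:

```java
class PreparationDemo {

    // Runs first (textual order) and reads `value` before its initializer has executed.
    static int early = peek();

    // The default 0 from the Preparation step is replaced by 42 only during initialization.
    static int value = 42;

    static int peek() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println("early = " + early); // early = 0
        System.out.println("value = " + value); // value = 42
    }
}
```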

Initialization

This is the final phase of class loading. Here, all static variables are assigned the values given in the code and the static blocks are executed, in the order in which they appear in the class. Once the initial class is initialized, its main() method can be executed. As main() runs and refers to other classes, those classes are in turn loaded, linked, and initialized.
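Here is a minimal sketch of the initialization phase in action. The nested class Config (a made-up name, like InitOrderDemo itself) is not initialized until it is first used, and when that happens its static field initializer and static block run in the order they are written.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {

    static final List<String> LOG = new ArrayList<>();

    static class Config {
        static int size = record("field initializer", 10);

        static {
            LOG.add("static block");
            size *= 2;
        }

        static int record(String event, int value) {
            LOG.add(event);
            return value;
        }
    }

    /** Touches Config for the first time and returns a copy of the events recorded so far. */
    static List<String> run() {
        LOG.add("before first use");
        int size = Config.size; // first active use triggers initialization of Config
        LOG.add("size=" + size);
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        // On the first call: [before first use, field initializer, static block, size=20]
        System.out.println(run());
    }
}
```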

Runtime Data Areas

The JVM creates multiple runtime data areas. Some of them are created and destroyed with the JVM and some get created when a new thread is created and destroyed when the respective thread ends.

There are five major data areas in the JVM.

Method Area

The simplest type of memory to manage. This is a shared resource: there is only one Method Area per JVM. It holds class-level data, most of which is determined at compile time, such as static variables, constants, and the code of methods.

Heap Area

The least organized and most dynamic data area. This resource is shared among all threads. The Heap is used to dynamically allocate memory for class instances (objects) and arrays. Special operations such as new are needed to allocate heap storage. The memory assigned to objects is never explicitly deallocated; instead, the space is reclaimed by the garbage collector (discussed later). The memory assigned to the Heap does not need to be contiguous. Deallocation may leave "holes" in the heap (a.k.a. fragmentation).
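Because the heap is shared, two threads can reach the very same object, which is why shared objects need some form of synchronization. The sketch below (SharedHeapDemo is an illustrative name) lets two threads update one AtomicInteger that lives on the heap, while each thread's loop counter stays private to that thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedHeapDemo {

    /** Two threads increment the same heap object; returns the final count. */
    static int countWithTwoThreads(int incrementsPerThread) {
        AtomicInteger shared = new AtomicInteger(); // a single object on the heap, visible to both threads

        Runnable work = () -> {
            // `i` is a local variable, so each thread has its own copy
            for (int i = 0; i < incrementsPerThread; i++) {
                shared.incrementAndGet(); // atomic update of the shared object
            }
        };

        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithTwoThreads(1000)); // 2000
    }
}
```

With a plain int field instead of an AtomicInteger, the two threads could overwrite each other's updates and the result could come out lower.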

Stack Area

For every thread, a separate runtime stack is created. Therefore, data stored in the stack is thread-safe, unlike data in the Method Area and Heap Area. For every method call, one entry, called a Stack Frame, is pushed onto the stack. A Stack Frame is divided into three sub-entities.

  • Local Variable Array - stores local variables and their corresponding values.

  • Operand Stack - Acts as a runtime workspace for any intermediate operations that need to be performed.

  • Frame Data - All symbols corresponding to the method are stored here. The catch block information is also stored here.
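The "one Stack Frame per method call" rule can be observed from inside a program by looking at a stack trace, which has one entry per active frame. A small sketch, assuming the JVM reports complete stack traces (the default on common JDKs); StackFrameDemo is an illustrative name.

```java
class StackFrameDemo {

    /** Makes `extraCalls` additional nested calls, then reports how many frames are on the stack. */
    static int framesAt(int extraCalls) {
        if (extraCalls == 0) {
            // One stack trace element per active stack frame of the current thread
            return new Throwable().getStackTrace().length;
        }
        return framesAt(extraCalls - 1);
    }

    /** Returns how many frames were added by `extraCalls` nested calls. */
    static int extraFrames(int extraCalls) {
        return framesAt(extraCalls) - framesAt(0);
    }

    public static void main(String[] args) {
        System.out.println(extraFrames(3)); // 3
    }
}
```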

PC Registers

Each thread has a separate PC (program counter) Register, which holds the address of the JVM instruction that is currently executing.

Native Method Stacks

For each thread, a Native Method Stack is created to hold the information for native methods, that is, the methods provided by the Native Method Libraries.

Execution Engine

After the bytecode is loaded into memory and the Runtime Data Areas are allocated, the bytecode is executed by the Execution Engine. The Execution Engine consists of three subsystems.

Interpreter

The interpreter starts interpreting bytecode quickly, but executes it slowly. If a method is called multiple times, the interpreter interprets it again every time.

JIT Compiler

The Just In Time (JIT) compiler identifies the hotspots of the code, which are the parts that are executed (and therefore interpreted) repeatedly, and compiles them into native code (machine-specific code), which improves performance. The JIT compiler consists of the following components.

  • Intermediate Code Generator - Produces intermediate code for optimization.

  • Code Optimizer - Optimizes the intermediate code generated above, for example by eliminating common sub-expressions, translating stack operations into register operations, and reducing memory accesses through register allocation.

  • Target Code Generator - Generates machine code (native code).

  • Profiler - Finds hotspots in the bytecode.
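The JVM exposes a little information about its JIT compiler through the standard java.lang.management API. The following sketch calls a small method many times so that it is likely to become a hotspot, then prints what the JVM reports. The class and method names (JitInfo, sumOfSquares) are illustrative, the loop count is an arbitrary choice, and whether this particular method actually gets compiled is up to the JVM.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

class JitInfo {

    /** A small method that is cheap to call repeatedly. */
    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    /** Describes the JIT compiler as reported by the JVM. */
    static String describeJit() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null) {
            return "no JIT compiler reported";
        }
        String time = jit.isCompilationTimeMonitoringSupported()
                ? jit.getTotalCompilationTime() + " ms"
                : "unknown";
        return jit.getName() + ", total compilation time: " + time;
    }

    public static void main(String[] args) {
        long result = 0;
        for (int i = 0; i < 20_000; i++) {
            result = sumOfSquares(100); // repeated calls make this a candidate hotspot
        }
        System.out.println("result = " + result); // result = 338350
        System.out.println(describeJit());
    }
}
```

On HotSpot you can also run a program with the -XX:+PrintCompilation flag to see which methods get compiled.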

Garbage Collector

Collects and removes unreferenced objects (inaccessible objects / orphans). Garbage collection can also be requested manually by calling System.gc(), although the JVM treats this as a hint and is not required to run a collection immediately.
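A WeakReference is a handy way to watch the garbage collector at work, because it does not keep its target alive. In the sketch below (GcHintDemo is an illustrative name), an object that is still strongly referenced always survives a collection, while an object with no strong references is usually, but not necessarily, gone after System.gc().

```java
import java.lang.ref.WeakReference;

class GcHintDemo {

    /** An object that is still strongly referenced is never collected. */
    static boolean survivesWhileStronglyReachable() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // a request, not a command
        return weak.get() == strong; // `strong` is still in use here, so the object is reachable
    }

    /** An object with no strong references may be collected; this often returns true but is not guaranteed. */
    static boolean clearedAfterDroppingReference() {
        WeakReference<Object> weak = new WeakReference<>(new Object());
        System.gc();
        return weak.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("survives while referenced: " + survivesWhileStronglyReachable());
        System.out.println("cleared when unreferenced: " + clearedAfterDroppingReference());
    }
}
```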

Thanks for reading.

See you in the next post!
