iOS String to Kotlin ByteArray Performance Analysis

Arseni Kavalchuk

Posted on November 18, 2024

When working with Kotlin Multiplatform (KMP), interoperability between Kotlin and native code can introduce performance bottlenecks. One such case is converting a Swift String into a Kotlin ByteArray. In this article, we compare the performance of several approaches to this conversion. We write Swift String extension functions and Kotlin ByteArray factory functions that use native pointers and system functions like memcpy. The full source code is available on GitHub. For details on setting up a KMP library and integrating it with iOS, refer to How to set up KMP library in iOS.

Kotlin ByteArray API in iOS

The Kotlin ByteArray is exposed to iOS as the KotlinByteArray class, which provides only basic methods like get(index:) and set(index:value:), and constructors like init(size:). This API has no bulk operations, so it is inefficient when many bytes must be read or written at once.

KotlinByteArray Interface

__attribute__((objc_subclassing_restricted))
__attribute__((swift_name("KotlinByteArray")))
@interface KmpLibKotlinByteArray : KmpLibBase
+ (instancetype)arrayWithSize:(int32_t)size __attribute__((swift_name("init(size:)")));
- (int8_t)getIndex:(int32_t)index __attribute__((swift_name("get(index:)")));
- (void)setIndex:(int32_t)index value:(int8_t)value __attribute__((swift_name("set(index:value:)")));
@property (readonly) int32_t size __attribute__((swift_name("size")));
@end

Because the interface has no bulk-copy method, filling or reading an array element by element costs one call across the Swift/Kotlin boundary per byte, which is slow for large arrays.

Using Native APIs in KMP

KMP common code cannot access native APIs directly, but the platform-specific (native) source sets can call platform functions through the cinterop API. This lets us optimize byte array copying with native constructs.

Working with Native Pointers

In Kotlin/Native, the CPointer type is used to interface with raw memory through pointers. Understanding how byte pointers are typed is essential for efficient memory operations and interoperability with native libraries.

The type parameter of CPointer must be a CPointed subtype, so a pointer to bytes is written CPointer<ByteVar>. CPointer<Byte> is not a valid type, because Byte is a plain Kotlin value rather than a native variable. Kotlin/Native also has no separate read-only pointer type: the same CPointer<ByteVar> is used whether the memory is only read (parsing a buffer, reading data from a constant memory region) or also written.

For example:

@OptIn(ExperimentalForeignApi::class)
fun printByteArray(data: CPointer<ByteVar>, size: Int) {
    // Read each byte through the pointer; nothing is written.
    for (i in 0 until size) {
        println(data[i])
    }
}

In this case, data points to a sequence of bytes, and the function iterates over the memory to print each byte.

CPointer<ByteVar> is a pointer to a mutable byte variable. It is used for memory regions that can be written to, such as buffers for receiving data or memory blocks that are initialized and modified. The ByteVar type encapsulates a mutable Byte value in Kotlin/Native, allowing operations like setting new values or performing in-place modifications.

For example:

@OptIn(ExperimentalForeignApi::class)
fun setByteArray(data: CPointer<ByteVar>, size: Int, value: Byte) {
    // Write the same value to every byte in the block.
    for (i in 0 until size) {
        data[i] = value
    }
}

Here, data is a mutable pointer, and the function writes a specified value to each byte in the memory block.
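
To experiment with pointer reads and writes without any Swift code, a scratch buffer can be allocated on the Kotlin/Native side. The following is a minimal sketch, assuming a Kotlin/Native target with kotlinx.cinterop available; fillAndSum is a hypothetical name used only for illustration and is not part of the article's project.

```kotlin
import kotlinx.cinterop.ByteVar
import kotlinx.cinterop.CPointer
import kotlinx.cinterop.ExperimentalForeignApi
import kotlinx.cinterop.allocArray
import kotlinx.cinterop.get
import kotlinx.cinterop.memScoped
import kotlinx.cinterop.set

// Hypothetical helper: fills `size` bytes with `value`, then reads them back and sums them.
@OptIn(ExperimentalForeignApi::class)
fun fillAndSum(size: Int, value: Byte): Int = memScoped {
    // allocArray returns a CPointer<ByteVar>; the memory is released when memScoped exits.
    val buffer: CPointer<ByteVar> = allocArray<ByteVar>(size)
    for (i in 0 until size) buffer[i] = value // write through the pointer
    var sum = 0
    for (i in 0 until size) sum += buffer[i]  // read through the same pointer
    sum
}
```

Because the buffer lives only inside memScoped, the pointer must not escape the block; anything that should outlive it has to be copied into a Kotlin object first.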

Default ByteArray Handling

The Kotlin/Native readBytes extension on CPointer<ByteVar> is convenient but inefficient, because it copies one byte per loop iteration. A factory function based on it looks like this:

@OptIn(ExperimentalForeignApi::class)
fun byteArrayFromPtrReadBytes(data: CPointer<ByteVar>, size: Int): ByteArray =
    data.readBytes(size)

Under the hood, this call essentially ends up in the following Kotlin implementation:

fun getByteArray(source: NativePointed, dest: ByteArray, length: Int) {
    val sourceArray = source.reinterpret<ByteVar>().ptr
    for (index in 0 until length) {
        dest[index] = sourceArray[index]
    }
}

Optimizing with memcpy

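Instead of looping, we can pin the destination array and copy the whole block with a single call to memcpy from platform.posix:

import kotlinx.cinterop.ByteVar
import kotlinx.cinterop.CPointer
import kotlinx.cinterop.ExperimentalForeignApi
import kotlinx.cinterop.addressOf
import kotlinx.cinterop.usePinned
import platform.posix.memcpy

@OptIn(ExperimentalForeignApi::class)
fun byteArrayFromPtrMemcpy(data: CPointer<ByteVar>, size: Int): ByteArray {
    if (size == 0) return ByteArray(0) // addressOf(0) throws on an empty array
    return ByteArray(size).also {
        it.usePinned { pinned ->
            memcpy(pinned.addressOf(0), data, size.toULong())
        }
    }
}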
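
One way to gain confidence in the memcpy path is to check, on the Kotlin/Native side alone, that it produces the same bytes as readBytes. The sketch below assumes a 64-bit Kotlin/Native target (so that size_t maps to ULong); copiesMatch is a hypothetical name and not part of the article's project.

```kotlin
import kotlinx.cinterop.ExperimentalForeignApi
import kotlinx.cinterop.addressOf
import kotlinx.cinterop.readBytes
import kotlinx.cinterop.usePinned
import platform.posix.memcpy

// Hypothetical check: copies the UTF-8 bytes of `text` both ways and
// reports whether each copy equals the source bytes.
@OptIn(ExperimentalForeignApi::class)
fun copiesMatch(text: String): Boolean {
    val source = text.encodeToByteArray()
    if (source.isEmpty()) return true // addressOf(0) is invalid on an empty array
    return source.usePinned { pinnedSource ->
        val ptr = pinnedSource.addressOf(0)
        // Copy 1: element-wise copy via readBytes.
        val viaReadBytes = ptr.readBytes(source.size)
        // Copy 2: block copy via memcpy into a pinned destination array.
        val viaMemcpy = ByteArray(source.size).also { dest ->
            dest.usePinned { pinnedDest ->
                memcpy(pinnedDest.addressOf(0), ptr, source.size.toULong())
            }
        }
        viaReadBytes.contentEquals(source) && viaMemcpy.contentEquals(source)
    }
}
```

Pinning matters on both sides here: the pointer returned by addressOf is only valid inside the usePinned block, where the array is guaranteed not to be moved.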

Testing Approaches

We implemented five test cases to compare performance:

  1. LoopCopy: Copy the UTF-8 bytes of the Swift String one at a time with set(index:value:).
  2. DataPtrReadBytes: Pass a pointer to Array(String.utf8) to Kotlin and copy with readBytes.
  3. DataPtrMemcpy: Pass the same pointer to Kotlin and copy with memcpy.
  4. Utf8CStringReadBytes: Pass a pointer to String.utf8CString to Kotlin and copy with readBytes.
  5. Utf8CStringMemcpy: Pass a pointer to String.utf8CString to Kotlin and copy with memcpy.

Swift Implementations

Here are the Swift extension functions:

Loop Copy

func toKotlinByteArrayLoopCopy() -> KotlinByteArray {
    let utf8Bytes = Array(self.utf8)
    let kotlinByteArray = KotlinByteArray(size: Int32(utf8Bytes.count))
    for (index, byte) in utf8Bytes.enumerated() {
        kotlinByteArray.set(index: Int32(index), value: Int8(bitPattern: byte))
    }
    return kotlinByteArray
}

Data Pointer with readBytes

func toKotlinByteArrayDataPtrReadBytes() -> KotlinByteArray {
    var data = Array(self.utf8)
    let size = Int32(data.count)
    return data.withUnsafeMutableBytes { ptr in
        ByteArrayUtilKt.byteArrayFromPtrReadBytes(data: ptr.baseAddress!, size: size)
    }
}

Data Pointer with memcpy

func toKotlinByteArrayDataPtrMemcpy() -> KotlinByteArray {
    var data = Array(self.utf8)
    let size = Int32(data.count)
    return data.withUnsafeMutableBytes { ptr in
        ByteArrayUtilKt.byteArrayFromPtrMemcpy(data: ptr.baseAddress!, size: size)
    }
}

UTF8 CString with readBytes and memcpy

func toKotlinByteArrayUtf8CStringReadBytes() -> KotlinByteArray {
    var data = self.utf8CString
    return data.withUnsafeMutableBufferPointer { ptr in
        ByteArrayUtilKt.byteArrayFromPtrReadBytes(data: ptr.baseAddress!, size: Int32(strlen(ptr.baseAddress!)))
    }
}

func toKotlinByteArrayUtf8CStringMemcpy() -> KotlinByteArray {
    var data = self.utf8CString
    return data.withUnsafeMutableBufferPointer { ptr in
        ByteArrayUtilKt.byteArrayFromPtrMemcpy(data: ptr.baseAddress!, size: Int32(strlen(ptr.baseAddress!)))
    }
}

Benchmark Results

The results from running 1000 iterations of each method:

Method                  Time (ms)
LoopCopy                32.10
DataPtrReadBytes         2.60
Utf8CStringReadBytes     0.89
DataPtrMemcpy            0.06
Utf8CStringMemcpy        0.02
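
The article does not show the harness that produced these timings. As a rough illustration only, a Kotlin-side timing helper could look like the sketch below; timeRepeated and the warm-up count of 10 are assumptions for this example, not the author's setup.

```kotlin
import kotlin.time.Duration
import kotlin.time.measureTime

// Hypothetical harness: runs `block` a few times unmeasured as warm-up,
// then returns the total elapsed time for `iterations` measured runs.
fun timeRepeated(iterations: Int, block: () -> Unit): Duration {
    repeat(10) { block() } // warm-up runs, not measured
    return measureTime { repeat(iterations) { block() } }
}
```

A call such as timeRepeated(1000) { payload.copyOf() } returns a Duration, which can be reported with inWholeMicroseconds. Timings this small are sensitive to noise, so repeated runs and a release build give more trustworthy numbers.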

Insights

  • LoopCopy is the slowest, because it calls set(index:value:) once per byte.
  • Using readBytes is much faster (2.60 ms and 0.89 ms versus 32.10 ms) but is still not optimal.
  • memcpy is the fastest copy, because it moves the whole block in one call instead of one byte per loop iteration.
  • The combination of Swift's utf8CString and memcpy on the Kotlin side achieves the best result (0.02 ms).
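
To put the gaps in perspective, the speedup of each method over LoopCopy can be computed directly from the reported timings. The snippet below is a small sketch that uses only the numbers from the table; speedupOverLoopCopy is a hypothetical helper name.

```kotlin
// Timings in milliseconds, copied from the benchmark table in this article.
val timingsMs: Map<String, Double> = linkedMapOf(
    "LoopCopy" to 32.10,
    "DataPtrReadBytes" to 2.60,
    "Utf8CStringReadBytes" to 0.89,
    "DataPtrMemcpy" to 0.06,
    "Utf8CStringMemcpy" to 0.02,
)

// Hypothetical helper: how many times faster each method is than LoopCopy.
fun speedupOverLoopCopy(): Map<String, Double> {
    val baseline = timingsMs.getValue("LoopCopy")
    return timingsMs.mapValues { (_, ms) -> baseline / ms }
}
```

The ratios come out at roughly 12x and 36x for the two readBytes variants, and roughly 535x and 1605x for the two memcpy variants. Since the smallest timings have only one or two significant digits, these factors are indicative rather than precise.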

Conclusion

For optimal performance when converting a Swift String to a Kotlin ByteArray, use the following:

  • Kotlin: Implement a ByteArray factory using memcpy.
  • Swift: Use utf8CString with unsafe buffer pointers.

This combination had the lowest overhead of the approaches tested, which makes it a good default for performance-sensitive interoperability in KMP.

The full implementation is in the project on GitHub.
