FixedWidthInteger Array to FixedWidthInteger

If you are parsing a binary file from scratch, Data is your go-to Foundation type. For Asura, the toy Mach-O parser I am writing for the Mach-O internals series, it isn’t very different. If we read the documentation, Data conforms to the ContiguousBytes and MutableCollection protocols. And per its description, Data lets simple byte buffers take on the behavior of Foundation objects. So it’s a buffer, or an Array/Collection, that behaves like a single object. The next question to ask about this masquerading collection is: what are its ‘Elements’?

What are the ‘Elements’ of Data?

If we look around Data’s doc page, it is easy to conclude that the elements of this collection are UInt8 values. But let us gather more evidence. Our clue lies with ContiguousBytes! The name says it: “contiguous bytes”. 1 byte = 8 bits, so the type that stores it should be a FixedWidthInteger with 8 bits of storage. But we want to be sure, and probably ask more questions. The best place to confirm our answers is the Swift source code. A peek at ContiguousBytes confirms this for us.
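A quick way to check this without digging through the source is to ask the compiler. This is only a minimal sketch; the sample bytes are arbitrary and chosen purely for illustration.

```swift
import Foundation

// Arbitrary sample bytes, purely for illustration.
let sample = Data([0xCA, 0xFE, 0xBA, 0xBE])

// Data's Element type is UInt8, so subscripting yields a UInt8.
let first: UInt8 = sample[0]
let elementIsUInt8 = (Data.Element.self == UInt8.self)

print(first, elementIsUInt8)   // 202 true
```

If the annotation on `first` were changed to Int8, the snippet would no longer compile, which is the evidence we were after.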

But why not Int8?

Because Int8 is a signed integer. One bit of its storage is effectively spent on the sign, so it covers -128 to 127. That is still 2^8 = 256 different values, but only 2^7 = 128 of them (0-127) are non-negative. With UInt8, all 8 bits represent magnitude, i.e. 2^8 = 256 values from 0 to 255, which is how raw bytes are normally read.
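The difference is easy to see in the REPL. The sketch below prints the ranges of both types and reads the same bit pattern both ways; the variable names are mine.

```swift
// Ranges of the two 8-bit integer types.
let signedRange = (Int8.min, Int8.max)       // (-128, 127)
let unsignedRange = (UInt8.min, UInt8.max)   // (0, 255)

// The same bit pattern, 0xFF, read as unsigned and as signed.
let raw: UInt8 = 0xFF
let reinterpreted = Int8(bitPattern: raw)    // -1

print(signedRange, unsignedRange, raw, reinterpreted)
```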

Why do I need to convert a FixedWidthInteger Array to another FixedWidthInteger type?

We can combine multiple UInt8s together to create UInts of a bigger byte size. A lot of file formats use wider integers to represent structure fields or data values. Let us proceed with an example.

Consider the “😀” emoji. This string needs 4 bytes to represent its value in UTF-16: that is 4 UInt8s or one UInt32 (8 * 4 = 32). Don’t take my word for it. Fire up your Swift REPL and type in the code

import Foundation
let smile = "😀"
let smileData = smile.data(using: .unicode, allowLossyConversion: false)
let smile8s: [UInt8] = [UInt8](smileData!)

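And the Swift REPL should show you what the data looks like. There are 6 bytes rather than 4 because the .unicode encoding prepends a 2-byte byte order mark (255, 254).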

smile8s: [UInt8] = 6 values {
  [0] = 255
  [1] = 254
  [2] = 61
  [3] = 216
  [4] = 0
  [5] = 222
}
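To see where the last four of those values come from, it can help to look at the string's code units directly. This is a small sketch using only standard-library views; the variable names are mine. It splits each UTF-16 code unit into low byte and high byte, which matches the layout printed above.

```swift
let emoji = "😀"

// UTF-16 view: a surrogate pair, i.e. two 16-bit code units.
let utf16Units = Array(emoji.utf16)   // [0xD83D, 0xDE00]

// Split each code unit into (low byte, high byte), the little-endian layout.
let littleEndianBytes: [UInt8] = utf16Units.flatMap { [UInt8($0 & 0xFF), UInt8($0 >> 8)] }
// [61, 216, 0, 222]

// For comparison, the UTF-8 view also takes 4 bytes for this emoji.
let utf8Bytes = Array(emoji.utf8)     // [240, 159, 152, 128]

print(utf16Units, littleEndianBytes, utf8Bytes)
```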

Now let us combine 4 UInt8s into 1 UInt32, aka the point of this whole post. We will take a leap of faith into the code and then understand the mechanics of converting a FixedWidthInteger collection to another FixedWidthInteger.

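extension Array where Element: FixedWidthInteger {
    // Packs the elements into one integer; the first element ends up most significant.
    // Bits shifted past S's width are silently lost, and S($1) traps on values S cannot represent.
    func flatten<S>() -> S where S: FixedWidthInteger {
        return self.reduce(S(0)) { ($0 << Element.bitWidth) | S($1) }
    }
}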

So, what exactly does flatten do here?

Consider the number 500. If we try to assign it to a UInt8 variable, the compiler will throw an error

let x: UInt8 = 500
 error: integer literal '500' overflows when stored into 'UInt8'

An overflow happens when we try to fit in something that is bigger than there is space for. UInt8 can represent only values from 0-255. Anything bigger than 255 overflows its size.

But we can represent 500 with 2 UInt8s. 500 in binary is 111110100, which is 9 bits. It can be written as two 8-bit values (since we have no 1-bit type available 😜): 00000001, 11110100 = 1, 244

let x: [UInt8] = [1, 244]
let y: UInt16 = x.flatten()
y: UInt16 = 500
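The same arithmetic can be unrolled by hand, without the extension. This is a sketch of what a single reduce step amounts to for these two bytes; the names high, low and combined are mine.

```swift
let high: UInt8 = 1     // 00000001
let low: UInt8 = 244    // 11110100

// Widen first, then shift the high byte up by 8 bits and OR in the low byte.
let combined = (UInt16(high) << 8) | UInt16(low)   // 256 + 244 = 500

print(combined, String(combined, radix: 2))   // 500 111110100
```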

Now, let us try it with our “😀” emoji.

import Foundation
let smile = "😀"
let smileData = smile.data(using: .unicode, allowLossyConversion: false)
let smile8s: [UInt8] = [UInt8](smileData!)
let a: UInt64 = smile8s.flatten()
a: UInt64 = 281467424342238
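Printing that number in hexadecimal makes the packing easier to see. A small sketch, reusing the value shown above: each pair of hex digits is one of the six bytes, in the same order as the array (FF FE 3D D8 00 DE = 255, 254, 61, 216, 0, 222).

```swift
let packed: UInt64 = 281467424342238
let hex = String(packed, radix: 16, uppercase: true)

print(hex)   // FFFE3DD800DE
```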

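What’s the opposite?

Well, here is the code. It’s too late at night for me to write an explanation for it. But it’s quite easy, since it is in a way just a reversal of what we did. One thing to note: explode returns the least significant chunk first, the opposite of the order flatten expects.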

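extension FixedWidthInteger {
    // Splits self into bitWidth / S.bitWidth chunks.
    // Assumes S is unsigned and no wider than Self.
    func explode<S>() -> [S] where S: FixedWidthInteger  {
        var value = self
        return (0...self.bitWidth/S.bitWidth-1).map { _ in
            defer { value = value >> S.bitWidth }   // drop the chunk we just read
            return S(value & Self(S.max))           // keep only the low S.bitWidth bits
        }
    }
}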

And a simple example output!

let a: UInt32 = 1024
let b: [UInt8] = a.explode()
a: UInt32 = 1024
b: [UInt8] = 4 values {
  [0] = 0
  [1] = 4
  [2] = 0
  [3] = 0
}
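Notice the order of that output: the 4 sits at index 1, so the least significant byte comes first. To feed the result back into flatten, which treats the first element as most significant, the array has to be reversed. Here is a minimal round-trip sketch; the two extensions are repeated from this post so the snippet runs on its own, and the names original, chunks and restored are mine.

```swift
// The two extensions from this post, repeated so the snippet is self-contained.
extension Array where Element: FixedWidthInteger {
    func flatten<S>() -> S where S: FixedWidthInteger {
        return self.reduce(S(0)) { ($0 << Element.bitWidth) | S($1) }
    }
}

extension FixedWidthInteger {
    func explode<S>() -> [S] where S: FixedWidthInteger {
        var value = self
        return (0...self.bitWidth/S.bitWidth-1).map { _ in
            defer { value = value >> S.bitWidth }
            return S(value & Self(S.max))
        }
    }
}

let original: UInt32 = 1024
let chunks: [UInt8] = original.explode()                    // [0, 4, 0, 0], least significant first
let restored: UInt32 = Array(chunks.reversed()).flatten()   // [0, 0, 4, 0] packs back to 1024

print(chunks, restored)
```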

And let us try converting our smiley into a number and back into its original form!

let smile = "😀"
let smileData = smile.data(using: .unicode, allowLossyConversion: false)
let smile8s: [UInt8] = [UInt8](smileData!)
let a: UInt64 = smile8s.flatten()
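let exploded: [UInt8] = a.explode() // least significant byte first
let b: [UInt8] = Array(exploded.reversed()) // back to the order flatten consumed
let data: Data = Data(b[2...]) // It's a little trick we have to do since we don't have UInt48.
// We are only interested in the last 6 bytes here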
let smileBack = String(data: data, encoding: .unicode)