
Introduction to Metal Compute

2021-05-08

For a couple of years, I've been working as an iOS software engineer with a focus on GPGPU using Metal. It is a fascinating corner of iOS development that is still not widely explored. In this series of articles, I'm going to describe how to build a simple image-processing Metal app.

Importance of visual data processing

"It is better to see something once than to hear about it a thousand times." - Asian proverb.

Most of us have heard that the brain receives most of its information about the surrounding world through the eyes. The eye's retina, which contains 150 million light-sensitive rod and cone cells, is actually an outgrowth of the brain. In the brain itself, neurons devoted to visual processing number in the hundreds of millions and take up about 30% of the cortex, compared with 8% for touch and just 3% for hearing. Humans are visual creatures, and visual information such as paintings, photos, videos, and 3D games surrounds us everywhere.

The tech industry has always tried its best to keep up with our constantly growing need to create and process graphical data. With the spread of social networks, image- and video-processing apps have gained popularity in recent years. For an iOS software engineer, mastering modern ways of processing visual data is a promising investment. Currently, the most efficient way to work with images is to harness the horsepower of the device's GPU through a minimal abstraction layer. Luckily, Apple platforms provide exactly that with the Metal API. But first, let's take a brief look at how Metal came to life.

Road to Metal

First and foremost, it was all about drawing graphics on the screen as fast as possible. In the 1980s and early 1990s, this kind of work was often done by the CPU, which, due to its architecture, was not very efficient at the task. As with all very demanding and domain-specific workloads, this work was gradually offloaded to a dedicated processor: the graphics processing unit was born.

GPU

At first, GPU vendors focused on 2D acceleration for desktop systems as well as monitor resolution and the quality of the generated analog signal. However, a new branch gained popularity over time: 3D acceleration. Graphics APIs like DirectX, OpenGL and 3dfx Glide, developed for computer games and data visualization, were designed with hardware support in mind. More and more calculation steps of these APIs were moved to dedicated hardware.

As GPUs became faster at rendering basic computer graphics, demand for more advanced techniques grew. New pipeline stages for these effects were added to the APIs and quickly implemented in hardware. At that time most functionality was "fixed function": for each effect, dedicated API calls existed and were implemented in hardware. Each new effect required both API and hardware changes, greatly limiting the possibilities of graphics programmers.

Around 2001, general-purpose computing on GPUs started to become practical and popular. The early efforts to use GPUs as general-purpose processors required reformulating computational problems in terms of graphics primitives.

GPGPU

This limitation triggered a major architectural shift in GPUs. The highly specialized fixed-function units were replaced by arrays of small, moderately specialized, programmable execution units, which greatly increased the flexibility of the hardware.

The idea of ignoring the underlying graphical concepts in favor of more common high-performance computing concepts became the basis of frameworks such as Nvidia's CUDA, Microsoft's DirectCompute, and Apple/Khronos Group's OpenCL. Thanks to them, modern GPGPU pipelines can leverage the speed of a GPU without requiring full and explicit conversion of the data to a graphical form.

On Apple platforms, before 2014, the way to tap into the power of the GPU was OpenGL and OpenCL on macOS and OpenGL ES on iOS. Although OpenGL provides programmable access to GPU hardware for 3D graphics, it hides the communication between the CPU and the GPU, which leads to performance overhead. Moreover, on iOS there was no way to use the GPU for general-purpose computations at all.

Metal

Apple addressed all of these GPU-related demands by announcing a powerful new unified, low-level, low-overhead GPU programming API called Metal. It is unified because it covers both the 3D graphics and data-parallel computation paradigms. It is low-level because it gives programmers near-direct access to the GPU. Finally, it is low-overhead because it enables greater efficiency in CPU-GPU communication and allows resources such as shaders to be pre-compiled.
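To give a first taste of the API, here is a minimal sketch of setting up Metal in Swift. It only acquires the GPU handle, a command queue, and the app's pre-compiled shader library; everything we build in this series rests on these three objects.

```swift
import Metal

// Acquire the system's default GPU.
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not supported on this device")
}

// The command queue schedules batches of work (command buffers) on the GPU.
guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Unable to create a command queue")
}

// Shaders are compiled at build time into the app's default library,
// one of the "low-overhead" aspects of Metal.
let library = device.makeDefaultLibrary()
```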

Example App

Learning to work with Metal takes practice. In this series of articles, following step-by-step instructions, we are going to write a small image editor app that can make basic image adjustments.

[Demo app]

Here is our plan:

  • Begin with a starter project.
  • GPU side: write our image-editing kernel.
  • CPU side: write an encoder for the kernel.
  • CPU side: image-to-texture conversion and kernel dispatching (a rough sketch of this step follows the list).
  • Replace UIKit image drawing with Metal.
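As a preview of the CPU side, here is a hedged sketch of what dispatching an image-processing kernel can look like. The function name and parameters are hypothetical placeholders; `pipelineState` stands for a compiled kernel, and the textures are assumed to be created and filled elsewhere. The real names will appear in the following parts.

```swift
import Metal

// Placeholder encode function: launches a compute kernel over an image.
func encode(pipelineState: MTLComputePipelineState,
            commandQueue: MTLCommandQueue,
            source: MTLTexture,
            destination: MTLTexture) {
    guard let commandBuffer = commandQueue.makeCommandBuffer(),
          let encoder = commandBuffer.makeComputeCommandEncoder()
    else { return }

    encoder.setComputePipelineState(pipelineState)
    encoder.setTexture(source, index: 0)
    encoder.setTexture(destination, index: 1)

    // Launch roughly one GPU thread per pixel, grouped in 8x8 blocks.
    let threadgroupSize = MTLSize(width: 8, height: 8, depth: 1)
    let threadgroupCount = MTLSize(width: (source.width + 7) / 8,
                                   height: (source.height + 7) / 8,
                                   depth: 1)
    encoder.dispatchThreadgroups(threadgroupCount,
                                 threadsPerThreadgroup: threadgroupSize)
    encoder.endEncoding()
    commandBuffer.commit()
}
```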

See you in the next part!