The Software Optimisation Paradox – Why Should You Care?

Moore’s law is over. As it stands today, we’re pushing up against the laws of nature, the physical limits of how you can construct a transistor – atomic limits.

Currently, the best fabrication processes from Intel produce transistors on a 14nm node, and the 12nm process used for nVidia’s GPUs (fabricated by TSMC) goes slightly smaller still. Although these processes are expected to shrink to around 5-7nm over the next few years, that seems to be about as far as we can go. Beyond that point, transistors become so small that quantum effects – electrons tunnelling straight through barriers that should stop them – prevent them from switching reliably.

What does this mean for the software created to run on these processors? Computer scientists have known this day would arrive for some time and have been preparing by taking advantage of massively parallel processing and distributed computing. The emergence of cloud computing as an affordable commodity meant that software engineers could tap into huge amounts of distributed computing power, allowing their software to run at global scale.

But there remains a problem. In recent years, I’ve seen the term ‘premature optimisation’ wielded as a mantra for not optimising at all. This has bred complacency among certain rank-and-file engineers, and as a child of the home computing revolution of the early 80s, I find it doesn’t sit well with me. Optimisation was part of the game back then, when you had to scrimp and scrape for every byte – it’s amazing what you can do with 64KB.
Let’s take a look at the Node.js / npm landscape as a modern example of the point I’m trying to make. I came across a hilarious article recently that truly illustrates one of the problems at the heart of the Node ecosystem.

‘Code bloat’ has taken on a new meaning in this paradigm. Even the simplest “Hello World!” app takes up a huge 1.5MB of space! This can’t be right. I know I’m going to sound like an old codger here, but in the early days of web development we imposed a limit of 90KB – imagery included – for an entire web page, because otherwise we couldn’t guarantee a good experience on dial-up. (Gen-Z: yes, we used a telephone connection to dial into an Internet Service Provider – you could even hear the dial tone and the sound of the data transferring.)
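
To make that concrete, here is a minimal sketch of the kind of app in question. The use of Express and the file name hello.js are my own assumptions rather than the setup from the article, but the effect is the same: the server itself is one line of real logic, yet a single ‘npm install express’ drags in dozens of transitive packages.

    // hello.js – a "Hello World!" web server; assumes npm install express has been run.
    const express = require('express');

    const app = express();

    // One route and one line of real logic.
    app.get('/', (req, res) => res.send('Hello World!'));

    app.listen(3000, () => console.log('Listening on http://localhost:3000'));

Run ‘du -sh node_modules’ afterwards and compare it with the few hundred bytes of hello.js itself; the ratio makes the point about code bloat better than any argument I could write.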

So why is this a problem, and what exactly is the paradox here? Well, I believe we’re losing a valuable skill by not choosing to optimise when the opportunity arises. The paradox is that huge amounts of creativity can come from constraint. To illustrate this, I have three examples: the modern demoscene, the Oculus Rift, and Banksy.

The idea of creating ‘demos’ that push a particular device to its limits has been around since the early 80s, and arguably even earlier. The practice grew out of ‘cracking’ the copy protection on a game and sharing it with your friends (usually by passing around a cassette tape), with an added introduction screen showing off your skills as a coder. These intro screens became more and more complex as coders grew familiar with the intricacies of the processor and the other chips in the computer. Eventually, these demos took on a life of their own, no longer limited to a game’s intro or loading screen but standing as full audio-visual experiences in their own right.

The most prolific ‘scene’ of the 80s was the Commodore 64 scene and, believe it or not, it is still alive today. Coders are still finding ways to do crazy things with 64KB of RAM and a 1MHz 6510 processor. There are demos that show off video streaming – something that wasn’t deemed possible until the mid-nineties and more powerful multimedia PCs. Meanwhile, other demos show that the perceived limits of the hardware can be extended by clever use of undocumented opcodes (or ‘illegal opcodes’, as they were colloquially known). Techniques like ‘Any Given Screen Position’, ‘Flexible Line Interpretation’, and ‘Multiplexed Sprites’ (sketched conceptually below) wouldn’t have been invented had there not been a hard limit on the VIC-II chip and the 6510.
Physical constraints of the hardware forced coders to think again about what was possible and to get creative with solutions.
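
For anyone who has never met the term, here is a rough conceptual sketch of the idea behind sprite multiplexing, written in plain JavaScript rather than 6510 assembly and modelling only the bookkeeping (my own simplification, not how any particular demo does it). The VIC-II offers eight hardware sprites, but once the raster beam has drawn one, that slot can be repositioned further down the screen and reused within the same frame; the raster-interrupt timing and register writes that make this work on real hardware are left out.

    // Conceptual model of C64 sprite multiplexing (not real C64 code).
    const HW_SPRITES = 8;

    // Sixteen 'virtual' sprites: twice what the hardware can display at once.
    const virtualSprites = Array.from({ length: 16 }, (_, i) => ({
      id: i,
      x: (i * 21) % 320,
      y: (i * 13) % 200,
    }));

    // Sort top-to-bottom so hardware slots are freed, and reused, in raster order.
    virtualSprites.sort((a, b) => a.y - b.y);

    // Hand out the eight slots in rotation; on real hardware each reassignment
    // happens in a raster interrupt once the slot's previous sprite has been
    // fully drawn further up the screen.
    virtualSprites.forEach((sprite, i) => {
      const slot = i % HW_SPRITES;
      console.log(`virtual sprite ${sprite.id} -> hardware slot ${slot} at (${sprite.x},${sprite.y})`);
    });

The real trick, of course, is squeezing those reassignments into the handful of cycles available between raster lines, which is exactly the kind of constraint the scene thrives on.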

For a more modern example, look at some of the advances made by the Oculus team. Video latency is one of the major limiting factors in getting VR to a point where your brain doesn’t baulk and induce motion sickness. John Carmack (CTO at Oculus, and a personal hero of mine) has regularly pointed out that enormous effort has gone into reducing latency across the net, through innovations in routing, networking software, and switches (and, of course, hard infrastructure), whereas video latency on the local machine was never really an issue until now. Good VR needs to push beyond these limits to overcome the aforementioned problems, so Oculus has rightly invested a great deal of time in clever optimisations and techniques that can trick the brain into full immersion. After all, Carmack is the king of optimisation.

Finally, I want to talk about Banksy. His artwork leaves an indelible mark on all who see it and always conveys an important message, but what you don’t see is the amount of planning that goes into each piece.
The constraint for Banksy is the time available to carry out what many consider to be an illegal activity. There are only minutes, possibly even seconds, available to put up the work, and this constraint drives a great deal of unseen innovation behind the scenes to make sure it is executed perfectly. The detail in some of the pieces is exquisite, so the stencilling must be equally so. The effort that goes into planning the execution, sometimes using ‘workman’ disguises and other ruses, involves a certain amount of genius too.

The takeaway? Optimisation is a skill, and a fundamental one at that. It shouldn’t be thought of simply as the ‘right thing to do’, but also as a great way to bolster and support innovation through constrained creativity. As engineers, it’s therefore incumbent on us to make the effort to be better.