On software engineering and optimization

From Bryan Cantrill’s blog:

Adding a hook of this nature requires an understanding of the degree to which the underlying code path is performance-critical. That is, to contemplate adding this hook, we needed to ask: how hot is closef(), anyway? Historically in my career as a software engineer, this kind of question would be answered with a combination of ass scratching and hand waving.

Sadly, that’s too true in my experience as well. Today, when I’m reviewing a patch that claims a performance increase, I always ask for numbers and methodology. You’d think this would be the norm, since most of the advancement in our society over the last few hundred years rests on the scientific method, but the problem is that it’s just too damn easy to modify software. Why bother actually measuring when we can just make a change, find out it’s broken later, then change it again immediately?
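As a rough illustration of what "numbers and methodology" could look like in a patch description, here is a minimal Python sketch that times two versions of a function over several repeats and reports the spread rather than a single number. The functions `baseline` and `candidate` are hypothetical stand-ins for the code before and after a patch, and the repeat counts are arbitrary assumptions, not recommendations.

```python
import statistics
import timeit


def measure(func, repeats=5, number=1000):
    """Time func over several repeats; return per-call timings in seconds."""
    totals = timeit.repeat(func, repeat=repeats, number=number)
    return [total / number for total in totals]


def summarize(samples):
    """Report min, median, and spread so a reviewer can judge the noise."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples) if len(samples) > 1 else 0.0,
    }


def baseline():
    # Hypothetical "before" version: builds an intermediate list.
    return sum([i * i for i in range(200)])


def candidate():
    # Hypothetical "after" version: uses a generator expression instead.
    return sum(i * i for i in range(200))


if __name__ == "__main__":
    for name, func in (("baseline", baseline), ("candidate", candidate)):
        print(name, summarize(measure(func)))
```

Reporting the minimum and the standard deviation alongside the median makes it easier to tell whether a claimed improvement is larger than run-to-run noise. A micro-benchmark like this still says nothing about how hot the code path is in a real workload, which is the question the quote above is really asking.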

This gets to something that’s been on my mind lately, which is that we should only try to optimize for two things: latency and power usage. The nice thing about this is that “traditional” tradeoffs like space-time are neatly encapsulated by power usage, because RAM, CPUs/GPUs, and hard disks all consume power. Is it a good idea to cache that file in memory (parsing it once, but forcing the system to retain it in RAM at a constant power draw), or to re-parse it periodically when we need it and then discard the data (more periodic CPU draw, less constant RAM draw)? If you’re optimizing for power draw, measuring representative workloads would give you the answer. Even better, power usage is specific to particular machines, which is how real-world optimization works.
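To make the cache-versus-re-parse question concrete, here is a back-of-the-envelope Python sketch. It assumes a very simple model: holding the parsed data costs some marginal RAM power for as long as it stays resident, and re-parsing costs some CPU power for the duration of the parse. All of the wattage and timing figures are made-up placeholders; real values would have to come from measuring a representative workload on the actual machine.

```python
def cache_energy_joules(ram_watts, interval_s):
    """Energy to keep the parsed data resident in RAM for one interval."""
    return ram_watts * interval_s


def reparse_energy_joules(cpu_watts, parse_s):
    """Energy to re-parse the file once."""
    return cpu_watts * parse_s


def break_even_interval_s(ram_watts, cpu_watts, parse_s):
    """Interval between uses beyond which re-parsing costs less than caching."""
    return reparse_energy_joules(cpu_watts, parse_s) / ram_watts


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    ram_watts = 0.002   # marginal power attributed to holding the data
    cpu_watts = 5.0     # extra CPU power while parsing
    parse_s = 0.05      # time to parse the file once

    interval = break_even_interval_s(ram_watts, cpu_watts, parse_s)
    print(f"break-even interval: {interval:.0f} s")
```

With these placeholder numbers the break-even point is about 125 seconds: if the data is needed more often than that, caching wins, otherwise re-parsing does. The model deliberately ignores things like disk spin-up for the re-read and what else the RAM could have been used for, which is exactly why measuring the whole system matters.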


  1. I think that RAM is refreshed periodically. Whether the RAM is in use or not doesn’t change that, so as soon as the system is up, RAM is being refreshed.

    Even in suspend-to-ram, RAM is still refreshed (it’s nearly the only hardware still powered in that state).

    If an HDD is used, loading things into RAM could allow the HDD to be powered off entirely, which could be good power-wise.

    • Right, but other processes could be using that RAM, or it could be used to cache filesystem data to avoid spinning the hard disk more. You’re right though that my phrasing is suboptimal; the question of whether or not to use RAM would come down to how much it helps or hurts the total system.

      That all said, there is work on powering off RAM: http://lwn.net/Articles/446493/

  2. I’ve worked on a number of deeply embedded systems, both with an RTOS and on bare metal. After profiling, about 90% of the time I’ve seen that the biggest bang for the buck is reducing code size. It reduces icache misses and reduces power (i.e. by driving I/O pins less often). I usually just look at size reduction before even thinking about anything else now.

    The strangest bug I worked on was bisected to a very unrelated change, and for the longest time I couldn’t figure out why this unrelated change made a difference. Eventually it was determined that I had increased cache misses, which caused the current draw of the chip to go up and the power supply of the board to drop by 0.005 volts. Another component on the board was sitting right at a switching threshold (through resistor dividers), and this minor voltage drop caused another chip to mistakenly measure a fault and take corrective action, which rippled through the system in a non-obvious way.

  3. You forgot one thing: maintainability.

    Always optimize for code readability and understandability first.
