Microbenchmark for Cache Friendliness Benefits

I’ve always heard that being (CPU) cache friendly pays off. It always made sense to me, which isn’t surprising given that it’s a fairly easy concept to grasp in theory (applying it in practice tends to be a bit more complicated, though). However, I’d never had the chance to compare a cache-friendly version of some code with a cache-unfriendly version to see the difference. This is mostly because I’m not really someone who works on performance-intensive code.

I decided to write a microbenchmark to see this in practice. It simply goes through an array looking for an element; I have a cache-friendly version of it and a cache-unfriendly one.

The code uses the SDL2 library for its timing functions so I can measure how long things take. Here is the C++ code:

http://paste.ubuntu.com/9052389/

Running it typically gives me times about 20x faster for the cache-friendly version:

FIND1: 0.0142819 secs.
FIND2: 0.302028 secs.
RATIO (delta1/delta2): 0.0472868.
RATIO (delta2/delta1): 21.1476.

Interestingly, the difference is also visible in JavaScript code:

http://paste.ubuntu.com/9052441/

Populate: 0.757 secs.
D1: 0.026 secs.
D2: 0.137 secs.
Ratio D1/D2: 0.1897810218978102
Ratio D2/D1: 5.269230769230769

In there, though, the cache-friendly version is only ~5 times faster. Using a Float64Array instead of a plain Array makes population way faster (~0.03s), but the other numbers remain roughly the same. Here is a result I got after changing Array to Float64Array:

Populate: 0.033 secs.
D1: 0.026 secs.
D2: 0.127 secs.
Ratio D1/D2: 0.2047244094488189
Ratio D2/D1: 4.884615384615385

Tetris clone in SDL2 and C

I’ve started developing a tetris clone in SDL2 and C to practice both technologies and also game development.

The idea is to get some rough version going and then enhance with sounds and the particles-ish simulation I did last week.

Check it out, it’s on github:

What’s in there I did from Sunday night to about Monday 5 AM, so don’t expect much. It currently only opens the game and shows the menu screen.

After finishing the rough version, I’ll write some sort of report talking about the development. It seems like it’ll be an interesting experience.

jsrinth – Labyrinth Generator for JS

I’ve written a simple labyrinth generator for JS.

https://github.com/phao/jsrinth

It allows you to specify the width and height of the labyrinth, and also a callback to be called for each wall that is going to be removed in the construction of the labyrinth. If you imagine the labyrinth being constructed by removing walls from a grid of square-ish cells, the callback will be called for each removed wall with the cell x,y position and an identifier for the removed wall (‘left’, ‘right’, ‘up’ or ‘down’).

Some examples of labyrinths generated by it.

The code isn’t particularly fast. It takes 1.021s on my machine to generate a 100×100 labyrinth using nodejs v0.10.25, but it’s basically Prim’s algorithm for computing minimum spanning trees, so it can be optimized as such or be replaced with anything roughly equivalent. Maybe I’ll try to improve its performance later, but as far as I can tell I won’t need to. My intention is to use this for simple games, in which case I’ll be generating the labyrinth at the beginning of a game stage (for example), and I could put a web worker on the job to hide the latency.

Learning SDL – Second weekend

I recently started studying SDL2 (in C) on the weekends. On the first weekend, I started reading lazyfoo’s tutorials and didn’t do much more than follow along with what was there.

On the second weekend, as suggested by Icculus (Ryan Gordon), I did a simple pong clone on Saturday, and on Sunday I decided to redo something from back when I was learning a bit of OpenGL: a simple particle-ish thing.

Both projects are on github:

Icculus actually tweeted about my pong clone (=D), which got me some stars on github.

Some videos of the two apps:

If you don’t know who Ryan Gordon (Icculus) is, you should look up his talks on YouTube (https://www.youtube.com/results?search_query=icculus and also https://www.youtube.com/results?search_query=ryan+gordon). They’re very (very!) interesting to watch.

Microbenchmark – Go and C

Some time ago I read this post (http://call-with-hopeless-continuation.blogspot.com.br/2010/03/lies-damn-lies-and-benchmarks.html – it’s in Portuguese, btw) containing some microbenchmarks comparing Chicken Scheme (both interpreted and compiled, with ordinary and fixnum arithmetic) to C, Python and Ruby. The post really got me interested, especially when Mario Goulart (the author of the post) showed that it was actually possible for a Scheme version, compiled with Chicken and using fixnum arithmetic, to run faster than the C version he had compiled with GCC. You can read the post for more details.

The code tested in the post is the slow recursive version of Fibonacci. And a huge blinking flashy notice is called for here: when was the last time you had to run code like the recursive Fibonacci algorithm? That code isn’t representative of real-life project code. The benchmark merely shows that some things about performance are not the absolutes some people take them to be. It isn’t about giving absolute answers on the performance of those languages/implementations.

I recently decided to do the same thing for C and Go. And I got surprising results!

Here is the output of some commands that may be interesting to you:

[~]$ go version
go version xgcc (Ubuntu 4.9.1-0ubuntu1) 4.9.1 linux/amd64
[~]$ go build -x -gccgoflags '-O2 -march=native' fib.go
WORK=/tmp/go-build793938434
mkdir -p $WORK/command-line-arguments/_obj/
cd /home/phao
gccgo -I $WORK -c -g -m64 -fgo-relative-import-path=_/home/phao -o $WORK/command-line-arguments/_obj/main.o -O2 -march=native ./fib.go
ar cru $WORK/libcommand-line-arguments.a $WORK/command-line-arguments/_obj/main.o
cd .
gccgo -o fib $WORK/command-line-arguments/_obj/main.o -Wl,-( -m64 -Wl,-) -O2 -march=native
[~]$ gcc --version
gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[~]$ gccgo --version
gccgo (Ubuntu 4.9.1-0ubuntu1) 4.9.1
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Here is information about my CPU (the output of /proc/cpuinfo) as well: http://paste.ubuntu.com/8861913/.

Without using gccgo, the performance results are different. Gccgo is doing its magic behind the scenes.

Problem/Solution/Representation

Programming is about representing in a computer a solution to a problem. Fred Brooks said in his The Mythical Man-Month that representation is the essence of programming, so maybe that shines a light on the whole issue.

Immediately, we can start asking questions from that.

What is representing a solution?

How do we solve the problem in the first place?

Is solving the problem part of programming?

What representational tools do our programming languages provide us with?

Are we making the best use of those tools or are we just trying to program in X using the Y language?

This is the kind of argument people make by saying “You’re just writing C in Python.” or “You’re just writing Java in PHP.”.

How is representing your solution in an OO fashion better for your particular goals?

If you’re not doing OO, think of the other paradigms you’re using.

In our programs source code, should we emphasize the problem, the solution or the representation?

Can we emphasize one without sacrificing the other two?

How can raising the level of abstraction help us with better representing solutions?

Given a program, can we change how the solution is represented in such a way to get better results?

This is obviously true. We can very often change how we represent the solution to the problem in a way that leads to, for example, less memory usage at runtime. Another point of view here is refactoring, which can lead to programs that are easier to maintain.

Given you have to read some code, which of the three is the hardest to figure out?

Given you know the representation, can you “backtrack” the problem and the solution?

These last two are about code readability. When we program, we encode a solution into a representation. When we read a program, we do the reverse process: we decode the representation back into a solution.

The problem with “readability” is that it’s a terrible word. What we want is understandability, the ability to reason about the code, and related benefits. I believe the word “readability” causes issues for people and leads to significant misunderstanding. There is a lot of “code like English prose” out there that does more harm than good by hiding important aspects of the representation being used.

Answers…

No, I don’t really have the answers. But we can still think about the questions and come up with interesting and useful conclusions on how to become better programmers.