2023-01-20

spwnn - embarrassing

Spwnn is embarrassingly parallel.  Ooh - Wikipedia says I can call it "delightfully parallel."  I like it.

The spwnn code itself has zero critical sections (mutexes); the command line version adds one, around managing a list of dictionaries.  Each thread gets its own dictionary, which eliminates all access contention.
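Here is a minimal sketch of the one-dictionary-per-thread idea. It is not the spwnn source: the dictionary is modeled as a plain Go map, and `countMatches` and `runWorkers` are hypothetical names invented for the illustration. The point is only that when every goroutine owns its dictionary and writes to its own result slot, no mutex is needed.

```go
package main

import (
	"fmt"
	"sync"
)

// countMatches is a hypothetical stand-in for the real per-word work.
// It touches only the dictionary it is handed.
func countMatches(dict map[string]int, words []string) int {
	hits := 0
	for _, w := range words {
		dict[w]++ // dict is private to the calling worker
		hits++
	}
	return hits
}

// runWorkers splits words across n workers. Each worker builds its own
// dictionary and writes only to its own slot in results, so there is
// no shared mutable state and no lock.
func runWorkers(words []string, n int) []int {
	results := make([]int, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			dict := make(map[string]int) // one dictionary per worker
			var mine []string
			for j := id; j < len(words); j += n { // strided share of the work
				mine = append(mine, words[j])
			}
			results[id] = countMatches(dict, mine)
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	words := []string{"alpha", "beta", "gamma", "delta", "epsilon"}
	fmt.Println(runWorkers(words, 2))
}
```

The cost of this design is memory: every worker carries its own copy. The payoff is that the workers never wait on each other.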

Except ... the code runs on a computer, and computers have limited resources, so the contention, if there is any, will be visible in a lower part of the stack - the OS or the hardware.

The OS mostly doesn't get in the way of my code - I have run the code long enough that eventually some bookkeeping task fires up and interrupts my code.  But it's rare.

Ah, but the hardware!  There are two places where contention is visible:  on bullshit hyperthreaded machines, the hyperthreads contend with everything, because they are shit.  And then on computers with real cores only, there could be contention when it comes to memory access.

Let's start by measuring perf on the laptop I took on my programming retreat.



Time (s)      Thread Count
30.1493114    1
16.2259312    2
14.4654877    3
13.2778618    4
13.3141502    5
13.3918478    6
13.4732965    7
13.5409872    8

Each row is the time in seconds of a separate execution of the spwnn program with GOMAXPROCS set to a different number. If you don't set GOMAXPROCS, Go will use the core count (including evil hyperthreads) as the default. Luckily, it is super-easy to override.

The laptop has two real cores, and two phony-baloney cores. As you can see, using two threads cuts the execution time just about in half. With 3 threads the run would take about 1/3 of the single-thread time (roughly 10 seconds) if these were real cores; instead it takes about 14.5 seconds, only a small improvement over two threads. From 4 threads on, the execution times are almost identical.
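To put numbers on "just about in half," a small sketch like the following computes speedup (single-thread time divided by N-thread time) and efficiency (speedup divided by N) from the measurements above. The `speedup` helper is just a name picked for this example.

```go
package main

import "fmt"

// speedup returns how many times faster a run is than the
// single-thread baseline.
func speedup(t1, tn float64) float64 {
	return t1 / tn
}

func main() {
	// Times in seconds from the table above; index i holds i+1 threads.
	times := []float64{
		30.1493114, 16.2259312, 14.4654877, 13.2778618,
		13.3141502, 13.3918478, 13.4732965, 13.5409872,
	}
	for i, t := range times {
		n := i + 1
		s := speedup(times[0], t)
		fmt.Printf("%d threads: speedup %.2fx, efficiency %.0f%%\n",
			n, s, 100*s/float64(n))
	}
}
```

Efficiency near 100% means the extra thread landed on real hardware; a sharp drop is the hyperthreads showing their true colors.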

What's great about the Go scheduler is that it is crazy efficient - piling on extra threads (threads 5 through 8) that share the same hardware does not slow the code down!


Nice.


OK, enough about the hyperthreads.


The other place you can get contention is memory access. Sometimes a machine will have more cores than the memory bus can support. Way back in about 2010 I was testing a 32-[real]core (4x8) Intel behemoth and it could only keep about 26 cores busy before the memory saturated. Since these beasts were expensive, it was far better to buy three 12-core (2x6) machines. You got 36 real threads and plenty of memory bandwidth for way less cost. w00t.


In the next part we'll look at some machines that do a delightful job of parallel execution of spwnn.





