kplug at davidb.org
Sat Nov 28 09:12:27 PST 2009
On Sat, Nov 28, 2009 at 10:00:28AM +0000, SJS wrote:
>> compareAndSet is the real primitive here, the rest are just
>> implemented around that.
>I learned it as the M68k's Test and Set.
>First thing, you make a critical section. :)
But my point about the lock-free programming is that you _don't_ make
a critical section. The critical section is what kills you.
This isn't just a minor domain either. There are several core data
structures in the Linux kernel that are being used in a lock-free way.
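The no-critical-section point can be sketched with a compareAndSet retry loop. This is a hypothetical counter, not code from the kernel; it just shows that contention causes a cheap retry rather than blocking:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Lock-free increment: no critical section, no blocking.
    // A losing thread just retries; no thread ever holds a lock
    // that others must wait on.
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // Succeeds only if nobody else changed the value since
            // we read it; otherwise loop and try again.
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    public int get() {
        return value.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CasCounter c = new CasCounter();
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) c.increment();
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println(c.get()); // 40000: no updates lost, no locks taken
    }
}
```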
>> Why's that? It allows for much richer coordination of threads,
>> entirely lock-free. For low-thread-count programs, it's usually
>> higher overhead. For high-thread-count programs it's a big win.
>It's slow, and I've yet to see a solution to the tortoise-and-the-hare problem:
>You have two threads that take a value and update it after performing
>some computation on it. One thread is fast (the hare), the other slow
>(the tortoise). The tortoise will starve, as every time it finishes its
>computation and goes to update the value, it discovers that the value
>has changed, and it must start its computation over again.
Then you're using STM wrong. There shouldn't be computation inside of
the transactions. The transactions should be small and side-effect
free. Using lots of CPU to compute is a side-effect.
The STM protects the update of a data structure that holds the result,
not the complex computation.
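Java has no STM in the standard library, but the same discipline can be sketched with a CAS commit: do the slow computation once, outside the retried section, and retry only the tiny update. The names here (slowScore, recordScore) are made up for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

public class SmallCommit {
    private final AtomicLong best = new AtomicLong(Long.MIN_VALUE);

    // Stand-in for an expensive computation. It runs OUTSIDE the
    // retried section, so it is never re-run on contention.
    static long slowScore(long input) {
        long s = input;
        for (int i = 0; i < 1000; i++) s = (s * 31 + i) % 1_000_003;
        return s;
    }

    // Only this small, side-effect-free update is retried. Even a
    // slow (tortoise) thread pays just a compare-and-store on retry,
    // not the whole computation over again.
    public void recordScore(long input) {
        long score = slowScore(input); // slow part, done exactly once
        long seen = best.get();
        while (score > seen && !best.compareAndSet(seen, score)) {
            seen = best.get();         // cheap retry
        }
    }

    public long bestSoFar() {
        return best.get();
    }
}
```

The starvation scenario above disappears because a failed commit costs one read and one compareAndSet, not another full pass of the computation.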
>> With STM, just update both in the same transaction.
>...and pray that you don't get locked out by some other, faster, thread.
Again, if you're doing something "slow" inside of your transaction,
you're not doing STM right.
>> There are people who _can_ use Java if it has ForkJoin, otherwise they
>> have hardware with cores that would just sit idle while code slowly
>> grinds through problems.
>Good on them. Let them use C.
Because it's even harder to make large, complex programs in C that work correctly.
>> Then you need to read up on ForkJoin, since that's the whole point.
>> We have to solve these problems with multiple cores, since the speedup
>> per core is slowing down.
>For most problems, the overhead eventually overwhelms the speedup.
What I've read is that it's the other way around: only with large
enough data sets and enough parallelism do you get a gain from
computing in parallel rather than using the simple solution. One
example I read described using ForkJoin to search an array of 500k
items for the largest. The algorithm divided the array into
fixed-size pieces that each core would search sequentially. For
8 cores, they got the best results with pieces of around 5000 items
searched sequentially.
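That divide-until-a-threshold pattern looks roughly like this with the ForkJoin classes (which later shipped in java.util.concurrent). The 5000 threshold is the figure from the discussion above, not a universal constant; the best value depends on the machine:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

public class MaxFinder extends RecursiveTask<Integer> {
    // Below this size, searching sequentially beats the overhead
    // of splitting further.
    private static final int THRESHOLD = 5_000;

    private final int[] data;
    private final int lo, hi;

    MaxFinder(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Integer compute() {
        if (hi - lo <= THRESHOLD) {
            // Small enough: plain sequential scan.
            int max = Integer.MIN_VALUE;
            for (int i = lo; i < hi; i++) max = Math.max(max, data[i]);
            return max;
        }
        // Split in half; one half runs asynchronously, the other in
        // this thread, then combine the results.
        int mid = (lo + hi) >>> 1;
        MaxFinder left = new MaxFinder(data, lo, mid);
        MaxFinder right = new MaxFinder(data, mid, hi);
        left.fork();
        int rightMax = right.compute();
        return Math.max(rightMax, left.join());
    }

    public static void main(String[] args) {
        int[] data = new int[500_000];
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < data.length; i++) data[i] = rnd.nextInt();
        int max = new ForkJoinPool().invoke(new MaxFinder(data, 0, data.length));
        System.out.println(max);
    }
}
```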
The other subtlety is that running the same code on multiple cores can
help with cache/memory performance. However, writes on one core
followed by reads on another will force cache lines to be flushed and
refetched. In other words, the large array search is probably a
best-case example.