I am familiar with many of the mechanisms and idioms surrounding concurrency in Java. Where I am confused is with a simple concept: concurrent access of different members of the same object.
I have a set of variables which can be accessed by two threads, in this case concerning graphical information within a game engine. I need to be able to modify the position of an object in one thread and read it off in another. The standard approach to this problem is to write the following code:
private int xpos;
private object xposAccess;
public int getXpos() {
int result;
synchronized (xposAccess) {
result = xpos;
}
return result;
}
public void setXpos(int xpos) {
synchronized (xposAccess) {
this.xpos = xpos;
}
}
However, I'm writing a real-time game engine, not a 20 questions application. I need things to work fast, especially when I access and modify them as often as I do the position of a graphical asset. I want to remove the synchronized overhead. Even better, I'd like to remove the function call overhead altogether.
private int xpos;
private int bufxpos;
...
public void finalize()
{
bufxpos = xpos;
...
}
Using locks, I can make the threads wait on each other, and then call finalize() while the object is neither being accessed nor modified. After this quick buffering step, both threads are free to act on the object, with one modifying/accessing xpos and one accessing bufxpos.
I have already had success using a similar method where the information was copied in to a second object, and each thread acted on a separate object. However, both members are still part of the same object in the above code, and some funny things begin to happen when both my threads access the object concurrently, even when acting on different members. Unpredictable behaviour, phantom graphical objects, random errors in screen position, etc. To verify that this was indeed a concurrency issue, I ran the code for both threads in a single thread, where it executed flawlessly.
I want performance above all else, and I am considering buffering the critical data in to separate objects. Are my errors caused by concurrent access of the same objects? Is there a better solution for concurrency?
EDIT: If you are doubting my valuation of performance, I should give you more context. My engine is written for Android, and I use it to draw hundreds or thousands of graphic assets. I have a single-threaded solution working, but I have seen a near doubling in performance since implementing the multi-threaded solution, despite the phantom concurrency issues and occasional uncaught exceptions.
EDIT: Thanks for the fantastic discussion about multi-threading performance. In the end, I was able to solve the problem by buffering the data while the worker threads were dormant, and then allowing them each their own set of data within the object to operate on.
Answer
If you are dealing with just individual primitives, such as AtomicInteger
, which has operations like compareAndSet
, are great. They are non-blocking and you can get a good deal of atomicity, and fall back to blocking locks when needed.
For atomically setting accessing variables or objects, you can leverage non-blocking locks, falling back to traditional locks.
However, the simplest step forward from where you are in your code is to use synchronized
but not with the implicit this
object, but with several different member objects, one per partition of members that need atomic access: synchronized(partition_2) { /* ... */ }
, synchronized(partition_1) { /* ... */ }
, etc. where you have members private Object partition1;
, private Object partition2;
etc.
However, if the members cannot be partitioned, then each operation must acquire more than one lock. If so, use the Lock
object linked earlier, but make sure that all operation acquires the locks it needs in some universal order, otherwise your code might deadlock.
Update: Perhaps it is genuinely not possible to increase the performance if even volatile
presents an unacceptable hit to performance. The fundamental underlying aspect, which you cannot work around, is that mutual exclusion necessarily implies a tradeoff with the substantial benefits of a memory hierarchy, i. e. caches. The fastest per-processor-core memory cache cannot hold variables that you are synchronizing. Processor registers are arguably the fastest "cache" and even if the processor is sophisticated enough to keep the closest caches consistent, it still precludes keeping values in registers. Hopefully this helps you see that it is a fundamental block to performance and there is no magic wand.
In case of mobile platforms, the platform is deliberately designed against letting arbitrary apps run as fast as possible, because of battery life concerns. It is not a priority to let any one app exhaust battery in a couple of hours.
Given the first factor, the best thing to do would be redesign your app so that it doesn't need as much mutual exclusion -- consider tracking x-pos inconsistently except if two objects come close to each other say within a 10x10 box. So you have locking on a coarse grid of 10x10 boxes and as long an object is within it you track position inconsistently. Not sure if that applies or makes sense for your app, but it is just an example to convey the spirit of an algorithm redesign rather than search for a faster synchronization method.
No comments:
Post a Comment