2012-03-29

Gadgeteer timing concerns

Since I work with image processing for a living, it tickled my fancy to try running some image processing algorithms on the camera output. Attaching the camera and displaying its output on the screen are so easy that I was looking for a bit more of a challenge. My first test was going to be simple thresholding. With that in mind, I wanted to compute the average colour value of the pixels in the image.

When I ran my simple double for-loop code, iterating over 320 columns and 240 rows, I had to wait for what seemed at the time like an astonishingly long time. In fact, when I first ran the algorithm, I thought my board had frozen.

But it hadn't. I was aware, on a theoretical level, that NETMF interprets its code rather than compiling it to native instructions (unlike its bigger cousins). But I was not ready for the performance penalty this entailed. Curious about how long some basic operations take to execute, I devised some timing code. The essence of it looks like this:

// Simple for loop

start = Microsoft.SPOT.Hardware.Utility.GetMachineTime();
for ( x = 0; x < iterations; x++ )
{
    // empty loop
}
// Elapsed machine time in ticks (1 tick = 100 ns)
data.SimpleForLoop = Microsoft.SPOT.Hardware.Utility.GetMachineTime().Ticks - start.Ticks;
data.SimpleForLoop /= iterations; // ticks per iteration
data.SimpleForLoop /= 10; // microseconds per iteration (10 ticks = 1 microsecond)
accumulator.SimpleForLoop += data.SimpleForLoop;
data.SimpleForLoop = accumulator.SimpleForLoop / additions; // average over multiple runs

This basic empty-loop test gave me a baseline for the overhead in all the other tests. Note that 'iterations' is declared as a 'const int' rather than a variable. As a result, the loop ran about as fast as it does with a hard-coded integer limit, whereas a variable limit added approximately 30-40% extra time. The following table lists the test results, in microseconds per operation (rounded down):

Task                                Time (µs)
Simple for loop                     54
Assignment of const ( y = 5 )       7
Assignment of var ( y = z )         11
Multiply constants ( 3 * 5 )        0
Multiply with var ( 3 * z )         10
Compare two constants ( 3 > 5 )     11
Compare with var ( y > 5 )          41
Compare 2 vars ( y > z )            45

Now, the interesting thing about these results is that they were produced on a 72 MHz processor. That means a single iteration of an empty for loop takes approximately 4000 cycles (54 µs at 72 cycles per microsecond). That is pretty hefty overhead if you need any sort of time-critical operation, or need to process more than a few hundred data elements.
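To put those numbers in perspective, here is a minimal sketch of the arithmetic in C. The function names are mine, purely for illustration. It also assumes the image loop pays the measured empty-loop cost once per pixel and nothing else, so it gives a lower bound rather than a measurement.

```c
#include <stdint.h>

/* Clock rate quoted above: 72 MHz, i.e. 72 cycles per microsecond. */
#define CYCLES_PER_MICROSECOND 72u

/* Convert a measured duration in microseconds to CPU cycles. */
uint32_t microseconds_to_cycles(uint32_t microseconds)
{
    return microseconds * CYCLES_PER_MICROSECOND;
}

/* Loop overhead alone, in microseconds, for visiting every pixel once.
   Assumption: one empty-loop iteration per pixel, no per-pixel work. */
uint32_t frame_loop_overhead_us(uint32_t width, uint32_t height,
                                uint32_t us_per_iteration)
{
    return width * height * us_per_iteration;
}
```

With the measured 54 µs, microseconds_to_cycles(54) gives 3888 cycles, and frame_loop_overhead_us(320, 240, 54) gives 4,147,200 µs. That is roughly four seconds of loop overhead for a single frame before a single pixel has been read, which goes a long way towards explaining why the board looked frozen.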

As it turns out, not all is lost! In their infinite wisdom, GHI created a solution aptly named RLP (which stands for Runtime Loadable Procedures). In a nutshell, you can load a pre-compiled native code procedure into memory at run time and execute it from the NETMF environment. I'll let the coolness of that sink in for a second.
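To give a flavour of the kind of routine worth moving into native code, here is a minimal sketch in plain C of the averaging and thresholding I had in mind. It is not an actual RLP procedure: the entry-point wrapper and the buffer hand-off from NETMF are left out, the function names are hypothetical, and I am assuming an 8-bit greyscale buffer rather than the camera's real pixel format.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean of all samples in an 8-bit greyscale buffer (assumed format).
   Returns 0 for an empty buffer. */
uint8_t average_intensity(const uint8_t *pixels, size_t count)
{
    uint64_t sum = 0;
    size_t i;

    if (count == 0) {
        return 0;
    }
    for (i = 0; i < count; i++) {
        sum += pixels[i];
    }
    return (uint8_t)(sum / count);
}

/* In-place binary threshold: samples above the cut-off become 255,
   everything else becomes 0. */
void threshold_in_place(uint8_t *pixels, size_t count, uint8_t cutoff)
{
    size_t i;

    for (i = 0; i < count; i++) {
        pixels[i] = (pixels[i] > cutoff) ? 255 : 0;
    }
}
```

The idea would be to call average_intensity over the whole frame and then pass its result to threshold_in_place as the cut-off, so the managed side only has to make a call or two per frame instead of looping over every pixel.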

Generating these pre-compiled procedures isn't very straightforward, but if you follow the instructions on GHI's website (there are a few how-tos available), you'll be up and running in no time. In fact, my next post may just demonstrate some neat image processing performed on a Gadgeteer FEZ Spider board.
