2012-03-30

First foray into RLP

Got around to speed testing the RLP code today. The C/C++ code used for testing computes the average value of each colour channel, then combines the three averages back into a single colour value:
int AverageColour_c(unsigned char *generalArray, void **args, unsigned int argsCount, unsigned int *argSize)
{
 // total size of the array in bytes, not the pixel count (54 + pixels * 3)
 unsigned int arraySize = *(unsigned int*)args[0];

 int i = 0, red = 0, green = 0, blue = 0;
 if( generalArray == 0 || arraySize < 54 + 3)
  return -1;
 
 // iterate through colour values, stored as B, G, R starting at byte 54
 for(i = 54; i + 2 < arraySize;)
 {
  blue += generalArray[i++];
  green += generalArray[i++];
  red += generalArray[i++];
 }

 // work out how many pixels there were in total
 int total = (arraySize - 54) / 3;

 // compute average value for each colour
 red /= total;
 green /= total;
 blue /= total;

 // convert into final Color value
 total = (blue << 16) | (green << 8) | red;

 return total;
}
It assumes that the data passed into it is a Gadgeteer.Picture.PictureData byte array. In this byte array the colour data starts at byte offset 54 and is stored as consecutive blue, green and red bytes (BGRBGRBGR).
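To make that layout concrete, here is a minimal desktop sketch in plain C, handy for checking the averaging logic without a board attached. It assumes only the layout described above (54 bytes to skip, then B, G, R bytes for each pixel). The names fill_uniform_picture and average_bgr are made up for illustration and are not part of RLP or Gadgeteer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PIXEL_OFFSET 54u  /* bytes before the colour data, per the layout above */

/* Hypothetical helper: write a picture of `pixels` identical pixels into buf.
   buf must hold at least PIXEL_OFFSET + pixels * 3 bytes. */
static void fill_uniform_picture(unsigned char *buf, unsigned int pixels,
                                 unsigned char red, unsigned char green,
                                 unsigned char blue)
{
    unsigned int i;

    assert(buf != NULL);
    memset(buf, 0, PIXEL_OFFSET);   /* the first 54 bytes are not inspected here */
    for (i = 0; i < pixels; i++) {
        unsigned char *p = buf + PIXEL_OFFSET + 3u * i;
        p[0] = blue;
        p[1] = green;
        p[2] = red;
    }
}

/* Hypothetical stand-in for the averaging routine: returns the per-channel
   average packed as (blue << 16) | (green << 8) | red, or -1 on bad input. */
static int average_bgr(const unsigned char *data, unsigned int size)
{
    unsigned int i, pixels;
    unsigned long red = 0, green = 0, blue = 0;

    if (data == NULL || size < PIXEL_OFFSET + 3u)
        return -1;

    pixels = (size - PIXEL_OFFSET) / 3u;
    for (i = 0; i < pixels; i++) {
        const unsigned char *p = data + PIXEL_OFFSET + 3u * i;
        blue  += p[0];
        green += p[1];
        red   += p[2];
    }
    return (int)(((blue / pixels) << 16) | ((green / pixels) << 8) | (red / pixels));
}
```

Filling a four-pixel picture with red 30, green 20 and blue 10, then calling average_bgr on it, gives back (10 << 16) | (20 << 8) | 30, which is the same packing the RLP routine returns.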

The RLP call takes a total of 214 ms. The exact same code, wrapped in a NETMF function, takes a total of 16335 ms. In other words, the native version is approximately 76 times faster (16335 / 214 ≈ 76.3), and that is for a very simple piece of code.

The .NET side of the code is pretty simple in itself:

RLP.Procedure AverageColour;

// This method is run when the mainboard is powered up or reset.   
void ProgramStarted()
{
    // Use Debug.Print to show messages in Visual Studio's "Output" window during debugging.
    Debug.Print( "Program Started" );

    InitializeRLP();

    // setup Gadgeteer modules
    button.ButtonReleased += new Button.ButtonEventHandler( button_ButtonReleased );
    camera.PictureCaptured += new Camera.PictureCapturedEventHandler( camera_PictureCaptured );
    camera.DebugPrintEnabled = false;
    camera.CurrentPictureResolution = Camera.PictureResolution.Resolution320x240;
}

void InitializeRLP()
{
    // personal unlock code
    RLP.Unlock( "...", new byte[] { ... } );

    // fetch, load and initialize our RLP
    byte[] elf_file = Resources.GetBytes( Resources.BinaryResources.RLP_test );
    RLP.LoadELF( elf_file );
    RLP.InitializeBSSRegion( elf_file );

    // extract the procedures
    AverageColour = RLP.GetProcedure( elf_file, "AverageColour_c" );

    // dispose of the loaded binary data we no longer need
    elf_file = null;
    Debug.GC( true );
}

void camera_PictureCaptured( Camera sender, GT.Picture picture )
{
    const int pixels = 320 * 240;

    int colour = AverageColour.InvokeEx( picture.PictureData, 54 + pixels * 3 );
    Color average = (Color)colour;
    Debug.Print( "Average colour: Red - " +
                 ColorUtility.GetRValue( average ) +
                 " Green - " +
                 ColorUtility.GetGValue( average ) +
                 " Blue - " +
                 ColorUtility.GetBValue( average ) );
}

void button_ButtonReleased( Button sender, Button.ButtonState state )
{
    camera.TakePicture();
}

2012-03-29

Gadgeteer timing concerns

Since I do image processing for a living, it tickled my fancy to try running some image processing algorithms on the camera output. Attaching a camera and displaying its output on the screen is so easy that I was looking for a bit more of a challenge. My first test was going to be simple thresholding. With that in mind, I wanted to compute the average colour value of the pixels in the image.

Upon executing my simple double for-loop, iterating over 320 columns and 240 rows, I had to wait for what seemed, at the time, like an astonishingly long time. In fact, when I first ran the algorithm, I thought that my board had frozen.

But it hadn't. I was aware, on a theoretical level, that NETMF code is wholly interpreted (unlike that of its bigger cousins). But I was not ready for the performance penalty this entails. Curious about how long some basic operations take to execute, I devised some timing code. The essence of it looks like this:

// Simple for loop

start = Microsoft.SPOT.Hardware.Utility.GetMachineTime();
for ( x = 0; x < iterations; x++ )
{
    // empty loop
}
data.SimpleForLoop = Microsoft.SPOT.Hardware.Utility.GetMachineTime().Ticks - start.Ticks;
data.SimpleForLoop /= iterations; // get single iteration value in ticks
data.SimpleForLoop /= 10; // get single iteration value in microseconds
accumulator.SimpleForLoop += data.SimpleForLoop;
data.SimpleForLoop = accumulator.SimpleForLoop / additions; // average over multiple runs

This basic empty-loop test gave me a baseline for the overhead in all the other tests. Curiously enough, it matters that 'iterations' is declared as a 'const int' rather than a variable: with the constant, the loop ran about as fast as with a hard-coded integer value, whereas a variable added approximately 30-40% extra time. The following table lists the results of the tests, in microseconds (rounded down):

Task: Time (microseconds)
Simple for loop: 54
Assignment of const ( y = 5 ): 7
Assignment of var ( y = z ): 11
Multiply constants ( 3 * 5 ): 0
Multiply with var ( 3 * z ): 10
Compare two constants ( 3 > 5 ): 11
Compare with var ( y > 5 ): 41
Compare 2 vars ( y > z ): 45

Now, the interesting thing about these results is that they were measured on a 72 MHz processor. That means a single empty for-loop iteration takes roughly 3,900 cycles (54 microseconds × 72 cycles per microsecond). That is pretty hefty overhead if you need any sort of time-critical operation, or need to process more than a few hundred data elements.
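The arithmetic behind those numbers can be sketched in a few lines of C. This assumes 100 ns ticks (which is what the division by 10 in the timing code implies) and the 72 MHz clock mentioned above; the function names are made up for illustration. Unlike the timing code, the sketch uses floating point rather than integer division, so nothing gets rounded down.

```c
/* Hypothetical helper: average time per loop iteration in microseconds,
   given an elapsed time in 100 ns ticks (10 ticks = 1 microsecond). */
static double us_per_iteration(long long elapsed_ticks, long iterations)
{
    return (double)elapsed_ticks / 10.0 / (double)iterations;
}

/* Hypothetical helper: number of clock cycles that fit into a duration
   given in microseconds, for a clock frequency given in MHz. */
static double cycles_for(double microseconds, double clock_mhz)
{
    return microseconds * clock_mhz;
}
```

For example, 5,400,000 ticks over 10,000 iterations comes to 54 microseconds per iteration, and 54 microseconds at 72 MHz is 3,888 cycles.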

As it turns out, not all is lost! In their infinite wisdom, GHI created a solution aptly named RLP (which stands for Runtime Loadable Procedures). In a nutshell, you can load a pre-compiled native code procedure into memory at run-time and execute it from the NETMF environment. I'll let the coolness of that sink in for a second.

Generating these pre-compiled procedures isn't very straightforward, but if you follow the instructions on GHI's website (there are a few how-tos available), you'll be up and running in no time. In fact, my next post may just demonstrate some neat image processing performed on a Gadgeteer FEZ Spider board.

Gadgeteer introduction

I recently learned of the existence of the Microsoft Gadgeteer initiative. On paper, it sounded very exciting. Having a practical project in mind, I acquired a FEZ Spider Starter Kit from GHI Electronics. After a few weeks' wait for the back-ordered components, the kit arrived without any problems.

After unpacking it and installing VS 2010 Express (at work) and VS 2010 Pro (at home), I followed the easy instructions on GHI's support page. With minimal guidance, I set up my first project. After plugging in a button and an LED, it took 5-7 mouse clicks and 14 key presses to end up with:

void ProgramStarted()
{
    button.ButtonPressed += new Button.ButtonEventHandler( button_ButtonPressed );
}

void button_ButtonPressed( Button sender, Button.ButtonState state )
{
    led.TurnWhite();
}

F6 to compile. F5 to deploy. 10-20 seconds later, pushing the button turns on the LED! Holy smokes. This is even more amazing than I had hoped.