Wednesday, January 30, 2008

while(exception) { doSomething(); }

Xerces-C throws an `EndOfEntityException` internally when it parses XML that contains "XML entities". So if you use them in your XML (and you do, don't you?), many exceptions are thrown and caught under the hood. And not only "under the hood": in MS Dev Studio's debugger you get a popup.

It is widely accepted that using exceptions as a regular tool for control flow in C++ programs is a bad idea:
  • installing an exception handler (having a `try`-`catch` clause in your program) is usually cheap and takes close to no time. If no exception is thrown, it probably costs no more than an additional pointer pushed onto a stack.
  • but if an exception is thrown, a lot of work is involved. (Want details? Let me know.)
  • therefore throwing exceptions on a regular basis is time-consuming and hurts performance.
  • following the control flow through throw-catch statements is also not easy, and there is probably (no, surely) a better way to do it.
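To make the contrast concrete, here is a minimal sketch of the same loop written twice. It is not Xerces-C code: `TokenReader`, `EndOfInput` and both sum functions are hypothetical names. The first version ends its loop only through a thrown exception (the `while(exception)` pattern); the second ends it through an ordinary return value.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical exception used only to signal "no more tokens".
struct EndOfInput {};

// Hypothetical reader over a fixed list of integer tokens.
class TokenReader {
public:
    explicit TokenReader(const std::vector<int>& tokens) : tokens_(tokens), pos_(0) {}

    // Style 1: the (perfectly normal) end of input is reported by throwing.
    int nextOrThrow() {
        if (pos_ >= tokens_.size()) throw EndOfInput();
        return tokens_[pos_++];
    }

    // Style 2: the end of input is reported through the return value.
    bool next(int& out) {
        if (pos_ >= tokens_.size()) return false;
        out = tokens_[pos_++];
        return true;
    }

private:
    std::vector<int> tokens_;
    std::size_t pos_;
};

// while(exception): the loop can only be left through the catch clause.
int sumWithExceptions(const std::vector<int>& tokens) {
    TokenReader reader(tokens);
    int sum = 0;
    try {
        while (true) sum += reader.nextOrThrow();
    } catch (const EndOfInput&) {
        // Normal end of input, disguised as an error.
    }
    return sum;
}

// The plain loop: the end of input is an ordinary loop condition.
int sumWithReturnValue(const std::vector<int>& tokens) {
    TokenReader reader(tokens);
    int sum = 0;
    int token = 0;
    while (reader.next(token)) sum += token;
    return sum;
}
```

Both functions compute the same sum; the second one simply never pays for a throw on the entirely expected event of running out of input, and its exit condition is visible in the `while` itself.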

C'ish Local Variables

In the olden days (think C89) you had to declare all the variables you planned to use at the top of a block, which in practice usually meant the top of the function. Nowadays... you might think this is still the case, judging from what one very often encounters in current C++ code.

ThisObject::ThisObject()
    : m_thingy()
{
    // a) declare a POD
    int iRet;
    // b) declare a POD-Array
    char szError[BUFLEN];
    // c) declare an object instance
    std::string sError;

    // Step 1: pre initialization
    initializeForeignLibrary();

    // takes a c-string with buf-length
    if (getForeignError(szError, sizeof(szError)))
        // takes a c-string
        throw MyException(szError);

    // Step 2: this object's own stuff
    iRet = m_thingy.init();
    if (iRet != 0)
    {
        // returns a std::string
        sError = m_thingy.getError();
        // copy into c-string
        strcpy(szError, sError.c_str());
        // ...because MyException takes a c-string
        throw MyException(szError);
    }
}
Note that this code works quite nicely. There is nothing really wrong with it. But...

...it has a strong C-ish flavor. All the variables are declared at the top, and most of them are not needed on every path through the constructor. C++ allows you to declare a variable where it is used for the first time. You could write:

const int iRet = m_thingy.init();
if (iRet != 0)
{
    const std::string& sError = m_thingy.getError();
    char szError[BUFLEN]; // see below
    strcpy(szError, sError.c_str());
    throw MyException(szError);
}
which is slightly better from a couple of viewpoints. Declaring variables as late as you can often gives the compiler a better picture of the scope in which a variable is actually used. Also, as in the example, you can make the variable `const`, giving the compiler even more hints for optimization and taking a step towards protecting yourself against some programming errors. Note that the `string` has also become a reference. That is safe whatever `getError()` returns: if it returns a `std::string` by value, binding that temporary to a const reference extends its lifetime to the end of the enclosing block; if it returns a reference to a member, you even save a complete string copy.
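For readers who want to compile the late-declaration version, here is a self-contained sketch. `Thingy`, `MyException` and `BUFLEN` are hypothetical stand-ins for the types in the post, the foreign-library step is left out, and the `strcpy` is swapped for a bounded `strncpy` into a zero-initialized buffer so that a long message cannot overflow it.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical buffer size; the post does not give one.
const std::size_t BUFLEN = 64;

// Hypothetical stand-in for MyException: takes a C string, as in the post.
struct MyException : std::runtime_error {
    explicit MyException(const char* msg) : std::runtime_error(msg) {}
};

// Hypothetical stand-in for m_thingy's type:
// init() returns 0 on success, getError() returns a std::string by value.
class Thingy {
public:
    explicit Thingy(bool failOnInit) : failOnInit_(failOnInit) {}
    int init() const { return failOnInit_ ? 1 : 0; }
    std::string getError() const { return "thingy init failed"; }
private:
    bool failOnInit_;
};

class ThisObject {
public:
    explicit ThisObject(bool failOnInit) : m_thingy(failOnInit) {
        // Step 2 only: declared where it is first needed, and const.
        const int iRet = m_thingy.init();
        if (iRet != 0) {
            // The temporary returned by getError() lives as long as sError.
            const std::string& sError = m_thingy.getError();
            char szError[BUFLEN] = {0};
            // Bounded copy; the last byte stays '\0' even for long messages.
            std::strncpy(szError, sError.c_str(), sizeof(szError) - 1);
            throw MyException(szError);
        }
    }
private:
    Thingy m_thingy;
};
```

On the success path only `iRet` is ever created; the string reference and the buffer exist solely inside the error branch.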

Pre-Guideline: Declare variables as late as you can.
Is it worth mentioning that declaring a variable costs no run time? No, because that is only half true! Writing `int iRet;` at the top or in the middle of the code does not use up any CPU cycles, but `std::string sMessage;` does. If you want to dive deeper into this, feel free. Suffice it to say that this is a slight C++ inconsistency between PODs ("Plain Old Data" types) and C++ classes. For a local POD no initialization code is run, to stay compatible with C, so `iRet` is born uninitialized: undefined, dirty, ...dangerous! Not so `sMessage`, which is an object and is therefore initialized by a call to its default constructor, namely `string::string()`. Confusing? Yes.
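A small sketch can make the difference visible. The hypothetical `Counted` type below counts its default-constructor calls, so a plain declaration shows up as exactly one call. For `std::string sMessage;` the constructor call is observed indirectly, through the well-defined empty state it leaves behind. Reading an uninitialized `int` is undefined behavior, so the sketch deliberately does not try to demonstrate that side.

```cpp
#include <string>

// Hypothetical instrumented type: counts how often its default constructor runs.
struct Counted {
    static int constructions;
    Counted() { ++constructions; }
};
int Counted::constructions = 0;

// Declaring a local object of class type runs its default constructor.
int constructionsForOneDeclaration() {
    const int before = Counted::constructions;
    Counted c;  // Counted::Counted() runs here, at the declaration
    (void)c;    // silence "unused variable" warnings
    return Counted::constructions - before;
}

// std::string sMessage; runs string::string(), which leaves the object in a
// well-defined empty state (unlike a local int declared without an initializer).
bool defaultStringIsEmpty() {
    std::string sMessage;
    return sMessage.empty();
}
```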

Anyway, you don't need to know this, because you will never rely on it! You will initialize your PODs. You will initialize your PODs. *brainwash*

When your code grows older (with wine one would say "matures", with code one says "gets longer"), whoever cares for that code (you, a fellow programmer, and/or someone you like) may find that an empty declaration has scrolled off the screen! Dangerous! Maybe someone uses that `int` hanging around, maybe someone writes a value into it, maybe someone reads it in an `if`, and maybe someone compiles it again. All will work out nicely, because the programmer tests the code (all the unit tests he can think of, all the regression tests that happen to be there) and all is fine. Another year passes and the code gets a bit longer again. Or the compiler changes. Or the machine. Now it breaks! Damn, and you (or your successor) will spend hours and days tracking down the uninitialized variable used in that damn `if`. This is because the variable's memory happened to hold a stable, seemingly "well defined" value left behind by the surrounding code, and the `if` worked on that. But "well defined" only with exactly these lines of code, on that machine, with this compiler, using exactly these compiler switches. At a different point in time the neutrons scatter differently and you have to find them!

Guideline: Initialize your PODs!
This especially means:

 char szError[BUFLEN] = {0};
'nuff said. No, not really. Interestingly, a C++ programmer will not notice anything unusual about this line, because he writes it all day long. A C programmer will not notice anything either, but he will ask: "and what is that good for?" Ah, and therein lies the peculiarity!

The C++ specification says that this is an initialization of an array of PODs (an aggregate initialization). The `= {0}` looks like an assignment, but it is an initializer: if you left it out altogether, you would only have a declaration, which would leave the array dirty. With the initializer, the first element gets the explicit 0. But only one "0" for BUFLEN elements? Yes, because the standard says that if there are fewer initializers than array elements, the remaining elements are initialized with 0. *sigh*... and all is well.
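The zero-filling rule is easy to check. A minimal sketch, assuming a hypothetical `BUFLEN` of 16; the second function shows that the rule is not specific to a single `0`:

```cpp
#include <cstddef>

// Hypothetical buffer size for illustration.
const std::size_t BUFLEN = 16;

// Returns true if every element of a "= {0}"-initialized char array is zero.
bool bufferIsAllZero() {
    char szError[BUFLEN] = {0};  // first element gets the explicit 0,
                                 // the remaining BUFLEN - 1 are zero-initialized
    for (std::size_t i = 0; i < BUFLEN; ++i) {
        if (szError[i] != 0) return false;
    }
    return true;
}

// Fewer initializers than elements: the rest are zero-initialized.
// Returns element i (i must be < 5) of an array initialized with {1, 2}.
int partiallyInitializedAt(std::size_t i) {
    int values[5] = {1, 2};  // becomes {1, 2, 0, 0, 0}
    return values[i];
}
```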

No rule without exceptions. Or at least without a "but": the two guidelines together can do a lot for you. Initializing POD variables costs next to nothing and is always a good idea. With objects, keep in mind that initialization might not be free, and when you write code that is meant to "mature", you never know today what will happen to it in the future. So place your object declarations as near as possible to the place where they are needed, but as far out of loops as you can afford.

Guideline: Declare variables as late as you can. But as early as performance dictates.

Initializing an object inside a loop costs a constructor call (and a destructor call) on every pass. Maybe it is a better idea to declare it (and initialize it!) at an outer level and just assign to it inside the loop. In many cases the performance might be almost the same, but often it is not. With refcounting `string` implementations you will probably not notice any difference, but with most other classes you might. References? Good idea, but tricky things sometimes; we'll look into that another time.
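Counting calls rather than measuring time shows the trade-off independently of compiler and machine. A sketch with a hypothetical instrumented `Tracked` type wrapping a `std::string`: the first loop declares the object inside, the second declares it once outside and only assigns inside.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical instrumented type wrapping a std::string:
// counts constructions and assignments from a std::string.
struct Tracked {
    static int constructions;
    static int assignments;
    std::string value;

    Tracked() { ++constructions; }
    explicit Tracked(const std::string& s) : value(s) { ++constructions; }
    Tracked& operator=(const std::string& s) {
        value = s;  // may reuse the buffer that value already owns
        ++assignments;
        return *this;
    }
};
int Tracked::constructions = 0;
int Tracked::assignments = 0;

void resetCounters() {
    Tracked::constructions = 0;
    Tracked::assignments = 0;
}

// Declared inside the loop: one construction (and destruction) per pass.
std::size_t totalLengthInside(const std::vector<std::string>& words) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        const Tracked t(words[i]);
        total += t.value.size();
    }
    return total;
}

// Declared (and initialized) once outside: one construction, then assignments.
std::size_t totalLengthOutside(const std::vector<std::string>& words) {
    std::size_t total = 0;
    Tracked t;
    for (std::size_t i = 0; i < words.size(); ++i) {
        t = words[i];
        total += t.value.size();
    }
    return total;
}
```

Whether the outer-level version is actually faster depends on what assignment costs for the class in question. A `std::string` assignment can typically reuse a buffer it already owns, which is where the savings usually come from, but that is an implementation detail, not a guarantee.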