OGB Appreciation Day : How I failed to appreciate NULL
On the annual OGB Appreciation Day (#ThanksOGB has been renamed multiple times, see Tim Hall's blog post) I am going back in time to when I was a newbie developer, relating a story about me making a mistake that had bad consequences in the customer ERP system. Luckily I had experienced colleagues to repair the damage, and later in my career I learned about NULL and how it could have been so different back then.
My first job as developer (in the last half of the 1990's) was as consultant, customizing the danish ERP application called Concorde XAL for our customers.
It had it's own data dictionary and own programming language, which the XAL engine then transformed to SQL DDL and DML. There were engines for a simple native database, for DB2, for Sybase and for Oracle (MS Sql Server came later.) But whichever engine, I just programmed in XAL and the underlying database was a black box.
One of our customers needed the system to be customized, so that invoices for some of their customers were electronically transmitted via EDI. This called for me to add a new column to the invoice table to contain the EDI invoice number for those invoices. So in the XAL dictionary I added this new column and an index on the column as well. Since duplicates of such EDI invoice numbers were not allowed, I marked the index to be unique. (And I implemented all of the EDI code, but that is irrelevant to this story.)
All good and it looked like it worked - or so I thought...
The day after the customer said it looked weird - invoices seemed to be missing in the system? They had been created and matching posts in the ledger and customer data were there, but the invoices themselves were simply missing. Why so?
Well, what I hadn't considered was, that the black box XAL engine that created SQL DDL and DML did not want to work with NULLs. Every single column was created as NOT NULL and the engine used default values (0 for numbers, January 1st 1900 for dates, CHR(2) for strings) for all "empty" values.
As I had marked the index to be unique, that meant that only one (1) single invoice was allowed to have an empty EDI invoice number (which would be 0 instead of NULL.) So all of the invoices (except the first one) for customers that should not have electronic invoices, failed with a constraint error. Which even I as newbie would have discovered in my simple testing (well, obviously I hadn't tested enough), if it had not been for the fact that the XAL engine for some reason silently caught those exceptions and ignored them.
So the solution was to make the index non-unique. But a lot of invoices were missing from production system. All of the other tables (ledger, customer log, inventory posts, etc.) inserted/updated/deleted as part of the same transaction were OK, but the invoice itself just wasn't there.
That's where I appreciated experienced colleagues, who were able to re-create the data from the other tables. (The missing data was for a large part data that basically was a snapshot of other data at the time of invoicing, like for example customer address.)
So I was lucky to be helped, and it taught me a lesson.
Experienced colleagues can be very much appreciated, and so can peers in a developer community be very much appreciated. A newbie like I was then needed a helping hand, not a sarcastic put-down. I try to remember this now that I am (in many situations) the experienced guy. And I very much appreciate that this is happily something I have experienced quite a lot in the Oracle community. (Not 100 percent of the time, even we aren't perfect, but sufficiently often.)
Now what has this got to do with NULL, you ask?
Well, I happened to see a thread on Twitter the other day on how NULL works, and more than one comment was along the lines of: "NULL causes unnecessary pain and should be avoided" or "NULL got introduced because there was a need for it once. Since then programming languages evolved and we don't need NULL at all."
Looking back on my "failure" back then, I can see that in essence it was caused by the XAL engine developers having decided NULL should be avoided. If NULL had been allowed, my new column would only have real values for those invoices that truly had EDI numbers - other invoices would have NULL in that column. Which would have allowed my unique index to work - and even to be very efficient, only indexing the necessary rows (with the default values used, the index contained maybe 80% rows with the value 0 in this column - which would never be queried.)
The discussion about NULL should not be so black-or-white, it should be grey-scale. Yes, there are cases where NULL causes unnecessary pain. But yes, there are cases where avoiding NULL causes unnecessary pain too.
Think about NULL. Understand NULL. Use it when appropriate. Avoid it when not.
Appreciate it is there as a tool for you - like anything else in SQL.
#ThanksOGB
My first job as developer (in the last half of the 1990's) was as consultant, customizing the danish ERP application called Concorde XAL for our customers.
It had it's own data dictionary and own programming language, which the XAL engine then transformed to SQL DDL and DML. There were engines for a simple native database, for DB2, for Sybase and for Oracle (MS Sql Server came later.) But whichever engine, I just programmed in XAL and the underlying database was a black box.
One of our customers needed the system to be customized, so that invoices for some of their customers were electronically transmitted via EDI. This called for me to add a new column to the invoice table to contain the EDI invoice number for those invoices. So in the XAL dictionary I added this new column and an index on the column as well. Since duplicates of such EDI invoice numbers were not allowed, I marked the index to be unique. (And I implemented all of the EDI code, but that is irrelevant to this story.)
All good and it looked like it worked - or so I thought...
The day after the customer said it looked weird - invoices seemed to be missing in the system? They had been created and matching posts in the ledger and customer data were there, but the invoices themselves were simply missing. Why so?
Well, what I hadn't considered was, that the black box XAL engine that created SQL DDL and DML did not want to work with NULLs. Every single column was created as NOT NULL and the engine used default values (0 for numbers, January 1st 1900 for dates, CHR(2) for strings) for all "empty" values.
As I had marked the index to be unique, that meant that only one (1) single invoice was allowed to have an empty EDI invoice number (which would be 0 instead of NULL.) So all of the invoices (except the first one) for customers that should not have electronic invoices, failed with a constraint error. Which even I as newbie would have discovered in my simple testing (well, obviously I hadn't tested enough), if it had not been for the fact that the XAL engine for some reason silently caught those exceptions and ignored them.
So the solution was to make the index non-unique. But a lot of invoices were missing from production system. All of the other tables (ledger, customer log, inventory posts, etc.) inserted/updated/deleted as part of the same transaction were OK, but the invoice itself just wasn't there.
That's where I appreciated experienced colleagues, who were able to re-create the data from the other tables. (The missing data was for a large part data that basically was a snapshot of other data at the time of invoicing, like for example customer address.)
So I was lucky to be helped, and it taught me a lesson.
Experienced colleagues can be very much appreciated, and so can peers in a developer community be very much appreciated. A newbie like I was then needed a helping hand, not a sarcastic put-down. I try to remember this now that I am (in many situations) the experienced guy. And I very much appreciate that this is happily something I have experienced quite a lot in the Oracle community. (Not 100 percent of the time, even we aren't perfect, but sufficiently often.)
Now what has this got to do with NULL, you ask?
Well, I happened to see a thread on Twitter the other day on how NULL works, and more than one comment was along the lines of: "NULL causes unnecessary pain and should be avoided" or "NULL got introduced because there was a need for it once. Since then programming languages evolved and we don't need NULL at all."
Looking back on my "failure" back then, I can see that in essence it was caused by the XAL engine developers having decided NULL should be avoided. If NULL had been allowed, my new column would only have real values for those invoices that truly had EDI numbers - other invoices would have NULL in that column. Which would have allowed my unique index to work - and even to be very efficient, only indexing the necessary rows (with the default values used, the index contained maybe 80% rows with the value 0 in this column - which would never be queried.)
The discussion about NULL should not be so black-or-white, it should be grey-scale. Yes, there are cases where NULL causes unnecessary pain. But yes, there are cases where avoiding NULL causes unnecessary pain too.
Think about NULL. Understand NULL. Use it when appropriate. Avoid it when not.
Appreciate it is there as a tool for you - like anything else in SQL.
#ThanksOGB
Comments
Post a Comment