Thursday, December 28, 2006

Know Thy Data


Data is often thought of as numbers, letters, binary settings - Wikipedia has an interesting, in-depth description. Data is mapped, reported on, used, converted, imported and transformed.

What most people (IT professionals - I'm assuming most are people) forget is what the data is and why it exists, and because of this, systems are often incorrectly constructed. The biggest issues that I have seen in system development are due to a lack of understanding of the information (aka data) being manipulated. Programmers often see data in terms of counts, types, relationships and uses - what they don't see is what the data represents and the value of it. A programmer will mangle thousands of records, normalizing and de-normalizing as needed to get the best performance from a generic view of the data. What MOST lack is an understanding of what the data is. They see it as letters and numbers: it could be a '1', it could be a 'z'. All they need to know is whether it's part of a field being used in an index or a field that will be updated or calculated. Why should they care... right?

Well, let's start with a very basic point that many miss: DATA is often the bulk of the value of any system, not the program code, not the server, not even the developer's time. Without data, what value would a customer database have? What value would an inventory system have? NONE! An empty inventory system is a cost; one that is full of data reflecting a business's stock is priceless. Keep that in perspective.

Let's move on.

If a programmer (DBA, DBD or whatever you would like to be called) does not understand the data they are manipulating, how could they effectively design, store or program against it? A two-char field? What could it be? A state? A prefix? A medical code? Better know it. Let's assume it's a state. Let's normalize it: add a state table and then associate it with a numeric key for better normalization - right? Yep, you never know when another state will be added or removed, happens all the time, and there could be 2 or there could be 1,000 of them in the database. But what if the data could only ever reflect information for up to 2 states? They are the only states the company does business in, and perhaps the company might grow into another state within 3-5 years. Does it still make sense to separate the state out into its own table? What about all that SQL code you built to handle the generic possibilities? Could you optimize better if you knew there could be only 2, or at most 3 in the next 3-5 years? MOST LIKELY. Many programmers will retort that they should not be focused on what the data means because generic solutions are the best - that's just BS - generic solutions are the simplest. Do what you're trained to do and design systems based on the elements that bring the most value - in most cases the DATA.
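To make the state example concrete, here is a rough sketch of the two designs side by side, using Python's built-in sqlite3 module. Everything in it is an assumption made for illustration: the table and column names, the helper functions, and the two states ('NY' and 'NJ') are hypothetical, not taken from any real system. It only shows the shape of the trade-off, not a recommendation for every database.

```python
import sqlite3

# Hypothetical: the only two states this company does business in.
ALLOWED_STATES = ("NY", "NJ")


def build_generic(conn):
    """Generic design: a state lookup table referenced through a numeric key."""
    conn.executescript("""
        CREATE TABLE state (
            state_id INTEGER PRIMARY KEY,
            code     TEXT NOT NULL UNIQUE
        );
        CREATE TABLE customer_generic (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            state_id    INTEGER NOT NULL REFERENCES state(state_id)
        );
    """)
    conn.executemany("INSERT INTO state (code) VALUES (?)",
                     [(code,) for code in ALLOWED_STATES])


def build_data_aware(conn):
    """Data-aware design: keep the two-char code, constrain it to known states."""
    allowed = ", ".join(f"'{code}'" for code in ALLOWED_STATES)
    conn.execute(f"""
        CREATE TABLE customer_simple (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL,
            state       TEXT NOT NULL CHECK (state IN ({allowed}))
        )
    """)


def add_customer_generic(conn, name, code):
    # Looks up the numeric key first; an unknown code inserts no row at all.
    conn.execute(
        "INSERT INTO customer_generic (name, state_id) "
        "SELECT ?, state_id FROM state WHERE code = ?", (name, code))


def add_customer_simple(conn, name, code):
    # An unknown code violates the CHECK constraint and raises IntegrityError.
    conn.execute(
        "INSERT INTO customer_simple (name, state) VALUES (?, ?)", (name, code))


def customers_in_state_generic(conn, code):
    # Every read by state needs a join back to the lookup table.
    rows = conn.execute(
        "SELECT c.name FROM customer_generic c "
        "JOIN state s ON s.state_id = c.state_id "
        "WHERE s.code = ? ORDER BY c.name", (code,)).fetchall()
    return [row[0] for row in rows]


def customers_in_state_simple(conn, code):
    # The state code is stored on the row, so no join is needed.
    rows = conn.execute(
        "SELECT name FROM customer_simple WHERE state = ? ORDER BY name",
        (code,)).fetchall()
    return [row[0] for row in rows]
```

Both versions answer the same question. The second one just bakes in what we know about the data: there are two states today, and adding a third in a few years means changing one constraint.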
