Data Quality – Will we ever get this right?

Last week we heard about Apple vs. the FBI in the fight over a locked iPhone presumably containing valuable data about the San Bernardino attackers, who killed 14 innocent people. Last night we heard about a gunman, who happens to be a driver for Uber, randomly shooting people in Kalamazoo, MI. There is no direct connection between the topic of this blog and these two incidents, though some indirect link exists, and I will leave it to your imagination.

Regardless of our individual positions on Apple’s stand, I would be curious to know what they find in the iPhone that they cannot find elsewhere. In this well-connected, cloud-driven world, where every vendor seems to want you to sync all of your information with their cloud services, you must be pretty deliberate and careful to avoid syncing your data with other cloud-based systems. A bigger question I have is: with such vast amounts of available data and sophisticated analysis tools, what prevented law enforcement from picking up something like this? Impure data? Inconclusive evidence?

With respect to the Uber driver who allegedly shot several random bystanders, apparently he passed Uber’s background check. This goes to show that data alone (such as the background check at the time of hire) cannot pick up everything about a person, including whatever it is that drove the person to do such a horrible thing.

During a couple of recent occurrences at the College, it became clear yet again that we face some serious data quality issues. I am proud to say that we have better control over this in certain areas, such as student information, than in others. These issues come to the fore when we are dealing with very high-profile events, such as the announcement of our new president.

In a highly distributed environment like ours, where the maintenance of data is distributed, quality control can be challenging. In many cases, the underlying data, such as a valid email address, may be accurate, but other metadata, such as “has this person chosen not to receive emails from the College”, gets very tricky. For example, when an alumna chose this option sometime in the past, was she aware that this really meant “I choose not to receive ANY email from the College”? Should the College remind the constituents to review these settings on a regular basis because circumstances change (please note that we can reach the constituents in ways other than email)?

Should we actually have a more granular set of questions when it comes to communications, such as “I would like to receive only general announcement emails about the College” or “I would like to know about online course offerings from the College”? If so, who gets to decide which choices we should provide so that they remain manageable (the governance question)?
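To make the difference between a single blanket opt-out and more granular choices concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the category names, the `EmailPreferences` record, and its fields are hypothetical and do not describe any system the College actually runs.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Category(Enum):
    # Hypothetical categories, taken from the two example questions above.
    GENERAL_ANNOUNCEMENTS = "general_announcements"
    ONLINE_COURSES = "online_courses"


@dataclass
class EmailPreferences:
    # Hypothetical record; field names are illustrative only.
    email: str
    opted_out_of_all: bool = False  # the blanket "no email at all" flag
    subscribed: Set[Category] = field(default_factory=set)

    def should_receive(self, category: Category) -> bool:
        # A blanket opt-out overrides every granular choice.
        if self.opted_out_of_all:
            return False
        return category in self.subscribed


# An alumna who wants only general announcements.
alumna = EmailPreferences(
    "alumna@example.edu",
    subscribed={Category.GENERAL_ANNOUNCEMENTS},
)
```

With a record like this, the announcement of a new president would still reach the alumna who was tired of irrelevant emails, as long as she had kept general announcements switched on. Which categories to offer remains the governance question.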

Regardless of what the technologies can do, the quality of the data is in the eyes of the constituents. For example, when they do not get an email about the new president, the first question is “why did I not get it?” If we do our research and find out that it is either because we have the wrong email address for the person or because the person has chosen not to receive any emails, in the eyes of the constituent these explanations generally don’t make sense. From their perspective, it is “I didn’t know where to go to update my email address” or “No one told me that my email address is wrong” or “Oh, at some point the College was sending me so many irrelevant emails that I said don’t bother me any more. I didn’t realize that it meant that I won’t receive anything”.

In each of these cases, in the interest of keeping our relationship with the constituents strong, we should be the ones thinking about how to solve some of these issues. This will require us to build easy-to-use systems that do everything from allowing easy updates, to automatically collecting current and relevant information, to sending users friendly reminders to verify their information.

And that requires a lot of work, discipline, alignment (my new mantra) on the part of various departments, and good governance. You know the scoop!

Even the mightiest of corporations don’t seem to get this right. For whatever reason, some companies want to sell me cigarettes though I have never smoked in my life (all the existing data will prove this) and have no intention of doing so. Or Stop & Shop sends me coupons for Coke and Pepsi despite the fact that I hardly buy soda, and even when I do, I only buy Diet Coke.

It is so easy to collect huge amounts of data, but transforming them into good quality data is a monumental task that we have not yet mastered. This is truly something that will be “in the works” for a long time to come.

Ravi's Blog
