Data integrity and privacy

One of my major concerns about privacy is data integrity. By data integrity, I mean the correctness of the data being used. Today, much of this data is used to serve you ads, but plenty of startups are working on using it for other purposes, such as issuing loans or offering insurance.

For instance, imagine that startup X develops an algorithm to determine house valuations. It sells this software to various lenders, who use it to decide whether to approve mortgages and at what rates. The whole algorithm is data-driven. It takes all kinds of data into account: past sales in the area, household incomes in the neighborhood, crime rates, the quality of the school system, and the list goes on. The benefit is that the lender can determine the valuation of the house more or less without “bias” (a topic for a whole different post, not this one). But then startup X makes an error. Somehow, it applies the wrong crime rate data to one particular zip code. It is not a major error; it just skews the crime rate by a couple of percentage points. That is still enough to lower the valuation of the property by 15%. Nobody notices the problem, because most lenders use the same software and all of them come to the same wrong conclusion. Obviously, this is an imaginary example, but the risk is very real.

There are a couple of reasons why data integrity is potentially a big problem:

  • Errors are virtually impossible to correct
  • Errors are mostly invisible

There are a few ways to solve this. The first is transparency. It is not every company’s favorite topic, but it helps us understand the parameters that determined an outcome. That matters because transparency allows for auditing, which is the second way of solving this problem. We’re used to auditing the financial results of public companies, verifying what a company reports to its shareholders, who do not have the time, resources, or legal rights (think about that for a minute) to verify the results themselves. Transparency enables auditing, and auditing by itself increases trust. The last solution is having multiple, disparate systems. This is the world we live in today, and it is the reason we haven’t seen this problem yet: an error in one system is not repeated by all the others. But as in any industry, everything eventually consolidates into a few suppliers, as it has with credit rating agencies.

I am confident, though, that we will see this problem crop up in the future, with people being denied services because of incorrect data. It’s unfair, and without transparency and auditing rights to the data, it is going to be tough to identify and correct the problem.

This is also one of the reasons why privacy and data ownership are so important. If we do not own our data and do not have a properly enforced right to privacy, we cannot control where our data ends up and how it’s being used. You might text someone about your fear of dying because your father just passed away from cancer, and those text messages could end up in someone’s database with you mislabeled as a “cancer patient”. It may sound far-fetched today, but without any rights and restrictions, this is the world we are heading into.