Overview
In my last post, I talked a bit about the importance of data cleanup, and I want to expand on that here. Can you imagine a high-stakes world like healthcare or financial services in which the data isn’t clean and we make decisions based on that data? We see that AI isn’t perfect even when we are just trying to polish a blog post! That’s why we have to review its output before posting.
User-Data validation
A few more points beyond the obvious:
1. Look at your older customers (i.e. those who have been with you for, say, over 5 years; older doesn’t mean older by age). If you have created new features for new users, you’d have to determine whether they apply to the older ones. Null values in required new fields for old customers are a good example.
2. Some customers are listed as active when they are inactive. Do you have processes in place to update this?
3. Many data migrations are LONGGGG overdue. These correct data as a bulk operation. Let’s say you have an insurance business and you received claims today. Unfortunately, they weren’t processed due to technical issues, so we need to correct these claims in bulk. Many processes like these are skipped, and as a result we receive customer complaints.
4. In big companies, we have also seen duplicate copies of data, and the resulting inconsistency across different pages causes issues for users.
5. Customer needs change over time, and I have seen them use existing fields for purposes other than what they were meant for. For instance, one time a customer wanted a new category of users, to identify their vendor users. Instead of requesting a new type from us, they added it to the vendor name eg: “
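Points 1, 2 and 4 above can be turned into simple automated checks. Here is a minimal sketch in Python, assuming customer records are available as dictionaries. The field names (`joined`, `status`, `last_activity`, `loyalty_tier`), the 5-year and 365-day thresholds, and the sample records are all made up for illustration and are not taken from any real system.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical required field that was introduced for newer users.
REQUIRED_NEW_FIELDS = ("loyalty_tier",)


def missing_new_fields(customers, today, min_years=5):
    """IDs of long-tenured customers with a null required new field."""
    cutoff = today - timedelta(days=365 * min_years)  # approximate years
    return [
        c["id"]
        for c in customers
        if c["joined"] <= cutoff
        and any(c.get(field) is None for field in REQUIRED_NEW_FIELDS)
    ]


def stale_active(customers, today, max_idle_days=365):
    """IDs of customers marked active but idle longer than max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return [
        c["id"]
        for c in customers
        if c["status"] == "active" and c["last_activity"] < cutoff
    ]


def duplicate_emails(customers):
    """Emails (compared case-insensitively) found on more than one record."""
    counts = Counter(c["email"].strip().lower() for c in customers)
    return sorted(email for email, n in counts.items() if n > 1)


# Made-up sample data, only to exercise the checks.
today = date(2024, 6, 1)
customers = [
    {"id": 1, "email": "ana@example.com", "joined": date(2015, 3, 1),
     "status": "active", "last_activity": date(2024, 5, 1),
     "loyalty_tier": None},
    {"id": 2, "email": "ben@example.com", "joined": date(2016, 7, 15),
     "status": "active", "last_activity": date(2022, 1, 20),
     "loyalty_tier": "gold"},
    {"id": 3, "email": "Ana@Example.com", "joined": date(2023, 2, 1),
     "status": "inactive", "last_activity": date(2023, 3, 1),
     "loyalty_tier": None},
]

print(missing_new_fields(customers, today))  # [1]
print(stale_active(customers, today))        # [2]
print(duplicate_emails(customers))           # ['ana@example.com']
```

Checks like these only flag suspicious records. Deciding whether an old customer really needs the new field, or whether an idle account should be deactivated, is still a business decision.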
Category/Company based analytics
When the data isn’t correct at the user level, the category or company level analytics aren’t correct either. For instance, let’s say 50 users bought “Ferrero Rocher” chocolates in the last 7 days but the database shows only 30; your company analytics for those 7 days are then wrong. The same logic applies to the other use cases.
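One way to catch a gap like this is to reconcile the database against an independent source of truth. The sketch below assumes such a source exists (for example, a payment export); the row layout (`user_id`, `product`, `date`) and the generated sample rows are hypothetical and simply reproduce the 50 versus 30 numbers from the example.

```python
from datetime import date, timedelta


def distinct_buyers(rows, product, today, window_days=7):
    """Count distinct users who bought `product` in the last `window_days` days."""
    start = today - timedelta(days=window_days)
    return len({
        r["user_id"]
        for r in rows
        if r["product"] == product and start < r["date"] <= today
    })


def reconcile(source_rows, db_rows, product, today, window_days=7):
    """Compare buyer counts between a trusted source and the database."""
    expected = distinct_buyers(source_rows, product, today, window_days)
    recorded = distinct_buyers(db_rows, product, today, window_days)
    return {
        "expected": expected,
        "recorded": recorded,
        "missing": expected - recorded,
    }


# Made-up data: 50 buyers in the trusted source, only 30 reached the database.
today = date(2024, 6, 1)
source_rows = [
    {"user_id": u, "product": "Ferrero Rocher",
     "date": today - timedelta(days=u % 7)}
    for u in range(1, 51)
]
db_rows = source_rows[:30]

report = reconcile(source_rows, db_rows, "Ferrero Rocher", today)
print(report)  # {'expected': 50, 'recorded': 30, 'missing': 20}
```

A non-zero `missing` value says the user-level data needs attention before anyone trusts the company-level numbers built on top of it.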
Final Thoughts
Whether you use Google Sheets, SQL Server, or something else, you cannot get correct analytics when the data at the user (individual) level is incorrect! Regular data cleanup is therefore required for proper analytics.