Part 1 of:
The Purple Cow approach to Data Quality
Or: How to have fun while trying to jump data quality sponsorship hurdles
Or: How to use innovative communication tactics to reach your Data Quality objectives.
‘Communivate’ is a combination of the words communicate and innovate, and it means to communicate in an innovative way. Our team uses it a lot to describe how we get our message across. We are one of those insane teams (aka Sneezers) who constantly push the boundaries of ‘appropriate’ tactics to get the job done, and are always coming up with new terms to describe our approaches. (Makes me wonder if coining new terms is a DQ thing?)
I am responsible for implementing a Data Quality (DQ) program and I have no business sponsor. As a result, my team and I put an enormous amount of effort into achieving the following:
· Raising awareness
· Communicating poor-DQ issues
· Stalking (did I say stalking?) I meant to say identifying and engaging business stakeholders
· Developing business cases to educate business and IT on best practices
· Getting buy-in
Essentially, we collect a lot of data and share it with whoever will listen. And because we don’t have that essential business sponsor, we need to communicate over and over (and over) the same messages to various stakeholders. It can get tiresome [insert shot of Dracula sucking enthusiasm out of lifeless body here] after a while…
So, here is one example of how we communivate:
Goal 1: Raise Awareness
The Strategy? Find a Captive Audience.
Since we don’t have a business sponsor we don’t have the same corporate tools to spread the word. Internal intranets, team portals and corporate newsletters are all off limits, so instead we took the message to the people. Because we’re sneezers, we wanted to push the boundaries and have fun. The team printed off screenshots of seriously bad data and posted them (under covert secrecy – more sneezer fun) on the doors of washroom stalls. You could not get a more captive audience than that.
The results? We ran this campaign 4 times over a 1-year period, and by the end of the year our communivative strategy AND the message we were trying to convey were mentioned by a Senior VP in a corporate communication, we received 57 positive (and 1 tree-hugging negative) comments, and another Senior VP asked us when our next campaign was going to start. (Ok, they still are not sponsoring us, but they do like us, so one thing at a time…)
Awareness raised, goal achieved.
First, let me apologize for the delay in getting this post out. The team was busy working on a big data clean up project that took most of the summer and a lot of resources. The good news is that the effort provided enough statistics to make a business case for process and system changes to improve the data going forward. One small step at a time!
In order to share what types of resources I believe are important for a successful data quality program, I’ll need to provide a brief overview of what makes up our Data Quality Program.
Our program is based on recommendations that came out of an initial assessment of the maturity level of our data quality and include the following:
Data profiling is the most important part of a data quality program. It allows you to see where your data problems are AND track the progress of any improvements. Once you start measuring the quality of your data, you should continue to measure it on an on-going basis.
You’ll need an analyst with a strong analytical mind. I use the generic term ‘analyst’ only because data quality resource profiles are so new; the role could just as easily be filled by a business analyst, data analyst or information analyst. They need to be someone who likes to get to the bottom of things and who sees every problem as a challenge to be overcome. They also need to be good at developing and writing business cases, as the results of your data profiling generally lead to a case for making changes to a process or system. They should be an excellent communicator (ok, everyone should be an excellent communicator!) and have good influencing skills…they need these to get the data extracts that some groups don’t like to give up 🙂
Data definition is a simple idea: define your data so that creators, users and stakeholders understand the meaning and purpose of the data. Remember the definition of data quality: “the data is in a state fit for its intended purpose”? Although the logic behind defining your data is simple, the effort to do it is sometimes the most challenging and time consuming. Have you ever tried to get everyone to agree to a definition of a ‘customer’? Not so easy. And yet without a definition, you have nothing to work towards in a data quality program. You simply cannot understand, improve, consolidate or convert the data without understanding its meaning.
How did we do this? We started with the data that most concerns everyone, and that is the basic customer information: name, address, segmentation, etc. We obtained the definitions from the data warehouse, put them all together in a Word document, posted it on a shared directory and sent links to everyone who we thought would be interested in this information. The results were as follows:
-People contacted us to advise us that a definition was out of date – good! We ensured the definition was updated.
-Business Analysts began to refer to the definitions as part of their requirements documentation – good!
-Business users began to refer to the definitions when discussing potential changes – good!
-New employees used the information as part of their training – good!
We then purchased a simple wiki tool, added the definitions and published this information on the corporate web site. That was in late 2007. Today we have a corporate wiki which contains over 1000 terms (definitions, business rules and purpose) and over 60 articles and help guides. It gets, on average, over 800 hits per month and has 16 contributors. The contributors are business users of the information who care about its accuracy and have agreed to participate in the upkeep of the information – they are our future Data Stewards.
Our goal – to have every corporate term published in the wiki and have the information managed by stewards.
We have 2 types of resources to manage this function:
One is an excellent writer and editor – they should be excellent communicators and should write from a business perspective. Why? Because you need to use simple, everyday language that everyone can understand. For example, if your term is called ‘blue sky’, your definition or business rule information should say “why the sky is blue” rather than “diffuse sky radiation and its impact on colour perception”. Their main responsibility is to review the information provided or contributed by others and ensure it is clear, easy to understand and formatted correctly. Kind of like a wiki editor.
The second is a business analyst with really good (and I mean REALLY good) sales skills. It’s their job to extract the information out of people’s heads, emails, documents, user guides, folders and systems. They also need to get users of the information to become contributors – a big change for non-social media types.
Those are the main functions of our program – and the most important. We also implement data quality improvement projects (see my apology at the top of the page), provide data quality related support, develop help guides and tips and tricks and perform manual and automated cleansing and enrichment using a data management software tool. For the next post, and I promise not to leave it so long, I’ll identify some of our biggest challenges with not having executive sponsorship and how we overcame them using some purple cow techniques 🙂
1/ Our program is based on Industry Best Practices
2/ Our methods for communicating and engaging others are very ‘Purple Cow’. In his book Purple Cow: Transform Your Business by Being Remarkable, Seth Godin says that the key to success is to find a way to stand out: to be the purple cow in a field of monochrome Holsteins.
The title, ‘Data Quality – From the Ground Up’, is just that. Implementing a successful Data Quality Program from the ground up, without an executive sponsor or a mandate, CAN work, and my goal in writing this is to share these strategies in the hope that what has worked for us will help others achieve the same success.
Getting Started – The Basics
This week I’ll start with the basics; those industry best practices that are logical and do-able.
1/ Identify the important data
For us it was Customer type data and we started with the basics; name, address, city, province/state, country, phone, fax, email, website.
2/ Profile the important data
Data profiling is just another word for data analysis. Get an extract of the data and start with the most basic analysis. Is it complete? How much of it is blank?
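The completeness check above can be sketched in a few lines of code. This is a minimal illustration only; the field names and sample records are made up, and the original team did this same kind of analysis in Excel rather than in code.

```python
# Minimal completeness profiling sketch: what fraction of each field is blank?
# Sample records and field names are illustrative, not real customer data.
from collections import Counter

def blank_rate(records, fields):
    """Return the fraction of blank (empty or whitespace-only) values per field."""
    blanks = Counter()
    for row in records:
        for field in fields:
            value = (row.get(field) or "").strip()
            if not value:
                blanks[field] += 1
    return {field: blanks[field] / len(records) for field in fields}

sample = [
    {"name": "Acme Ltd", "email": "info@acme.example", "phone": ""},
    {"name": "Globex",   "email": "",                  "phone": "555-0101"},
    {"name": "",         "email": "ops@glbx.example",  "phone": "555-0102"},
    {"name": "Initech",  "email": "hi@init.example",   "phone": ""},
]

rates = blank_rate(sample, ["name", "email", "phone"])
# rates now maps each field to its blank fraction, e.g. phone -> 0.5
```

Even a simple table like this, run against a real extract, is usually enough to start the "find someone who cares" conversation in the next step.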
3/ Find someone who cares
For us it meant someone in IT, as IT has known about data quality issues for a long time. Better would be someone on the business side, but take what you can get.
4/ Communicate your results
Find a way, any way to communicate what you’ve found. Post the results on your intranet, send them in an email (interesting profiling results tend to get forwarded), or post them by the printers or water cooler. More information coming later on some of the ‘purple cow’ methods that worked for us.
5/ Define the data
Start gathering and documenting the basic definitions and business rules for the important data and share it. You would not believe how people will thank you.
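To make the documentation step concrete, here is one hedged sketch of what a single definition entry might capture. The structure and field names ("term", "definition", "business_rules", "steward") are my own illustration, not a standard, and a Word document or wiki page works just as well as code.

```python
# A hypothetical shape for one data definition entry.
# The field names and the example content are illustrative only.
definition = {
    "term": "Customer",
    "definition": (
        "A person or organization that has purchased, or holds an active "
        "agreement to purchase, a product or service."
    ),
    "business_rules": [
        "A prospect becomes a Customer only after a first completed order.",
        "Each Customer must have exactly one primary mailing address.",
    ],
    "steward": None,  # the business user who agrees to keep this entry current
}
```

Whatever the format, the point is the same: capture the meaning, the rules, and who owns keeping them up to date, then share it widely.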
This was our approach, and we only used some basic Microsoft tools: Excel for the profiling and Word for the definitions. Today we have an enterprise data management tool for the profiling and a wiki with over 900 corporate definitions. Not bad for 2 1/2 years.
There are a lot of other things that can and need to be done as well, but the methods described above are a good start and you can’t go wrong.
In the coming blogs I’ll talk about the results of each of the above and explain how we logically progressed to where we are today. I’ll also share some fun ‘purple cow’ stuff that we’ve done with amazing results.
Next Week: What did we find when we profiled our important data and what did we do about it?