This week's assigned reading discusses the organizational issues associated with "dirty data" and offers recommendations for four ways to overcome common data prep barriers.

For this discussion, assume you work for a small company (100 employees) that is quickly growing (both in terms of revenue and number of employees). As the organization is growing, issues with "dirty data" are beginning to arise – for all the reasons mentioned in the reading.

As a tech-savvy data analyst, you can see that something must be done to keep dirty data from becoming a problem for the company as it grows. You've been granted an audience with the CIO to make the case for putting sound data preparation practices in place. In making the case, you want to explain why such practices are important, and also offer some recommendations for next steps. What will you say? Use the reading to help you make your case. Outside references are also acceptable (please link to them in your response).

Dirty data is costing you: 4 ways to overcome common data prep barriers


Introduction

If you’ve ever analyzed data, you know the pain of digging into your data only to find that the data is poorly structured, full of inaccuracies, or just plain incomplete. You’re stuck adapting the data in Excel or writing complex calculations before you can answer a simple question.

Data preparation is the process of getting data ready for analysis, including data discovery, transformation, and cleaning tasks, and it’s a crucial part of the analytics workflow. A recent Harvard Business Review study reports that people spend 80% of their time prepping data, and only 20% of their time analyzing it. And this statistic isn’t restricted to the role of the data stewards. Data prep tasks have bled into the work of analysts and even non-technical business users.

Even those who aren’t directly performing data preparation tasks feel the impact of dirty data. The amount of time and energy it takes to go from disjointed data to actionable insights leads to inefficient ad-hoc analyses and declining trust in organizational data. These slower processes can ultimately lead to missed opportunities and lost revenue. In fact, Gartner research shows that the “average financial impact of poor data quality on organizations is $9.7 million per year.”*

Table of contents

Why dirty data happens
Issue 1: Rigid and time-consuming processes don’t keep up with demand
Issue 2: Data preparation requires deep knowledge of organizational data
Issue 3: “Clean data” is a matter of perspective
Issue 4: The hidden reality of data prep silos

Why dirty data happens

Enterprises are taking steps to overcome dirty data by establishing data catalogs and glossaries. But even with these practices, some dirty data is likely to slip through the cracks of day-to-day operations. Dirty data commonly happens due to:

1. Human error

Human error is the most common cause of dirty data, according to Experian. Errors can pop up in a variety of ways, from variability in data entry practices to employees manually inputting values into spreadsheets. Even a simple spelling error could pose challenges down the line when someone needs to analyze the data.

2. Disparate systems

Organizations often store data in several disparate systems that have different structures and requirements. When it comes time to integrate this data, analysts are left with duplicate or missing fields or inconsistent labels. Data fields or values might also have the same meaning, but use different names or values across systems. These issues get even trickier when companies need to bring in data from external sources like partners or vendors, where it could be encoded differently or aggregated at different levels.

3. Changing data requirements

Businesses evolve, and as a result, data administrators and engineers need to make changes to the data: changing its granularity, deprecating unused fields, or introducing new fields as needed. Although these changes are necessary, they are not always widely communicated across a business, and analysts may not even know about them until they bring the data into a self-service BI or data prep tool.
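To make the "same meaning, different names or values" problem concrete, here is a minimal Python sketch of harmonizing records from two systems. The field names, alias tables, and sample rows are all hypothetical and are not taken from the reading; a real mapping would come from whoever owns each system.

```python
# Hypothetical aliases: each source system's field name -> one canonical name.
FIELD_ALIASES = {
    "cust_name": "customer_name",
    "Customer": "customer_name",
    "st": "state",
    "State": "state",
}

# Hypothetical aliases for a value that is encoded differently per system.
STATE_ALIASES = {"ca": "CA", "calif.": "CA", "california": "CA"}


def harmonize(record: dict) -> dict:
    """Return a copy of `record` with canonical field names and cleaned values."""
    clean = {}
    for field, value in record.items():
        name = FIELD_ALIASES.get(field, field)
        if isinstance(value, str):
            value = " ".join(value.split())  # trim and collapse stray whitespace
        clean[name] = value
    state = clean.get("state")
    if isinstance(state, str):
        clean["state"] = STATE_ALIASES.get(state.lower(), state)
    return clean


crm_row = {"cust_name": "  Acme  Corp ", "st": "california"}
billing_row = {"Customer": "Acme Corp", "State": "CA"}

# After harmonizing, the two systems describe the customer identically.
print(harmonize(crm_row) == harmonize(billing_row))  # True
```

Keeping the aliases in one shared table, rather than in each analyst's spreadsheet, is one way to apply the same rule everywhere.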

Four common data prep issues (and how to solve them)

Issue 1: Rigid and time-consuming processes don’t keep up with demand

Analysts report that the majority of their job is not analysis, but cleaning and reshaping data. This can occur with an ETL process, in a self-service data prep tool like Alteryx or Trifacta, or in spreadsheet tools like Microsoft Excel. Every time new data is received, analysts need to repeat manual data preparation tasks to adjust the structure and clean the data for analysis. This ultimately leads to wasted resources and an increased risk of human error.

Beyond the frustration of messy data, both analysts and business users struggle to even access the data they need. Traditionally, data preparation has lived within IT, and only certain teams have the ability to bring new data sources into a centralized data warehouse. Those who don’t have this ability either conduct their own data prep in programs like Excel or wait for another team to do it for them. Cathy Bridges, Tableau Developer at SCAN Health Plan, noted that “When we need to make changes to a data set, it can take weeks at a minimum and often months.”

Solution: Develop agile processes with the right tools to support them

Many organizations are adopting self-service data preparation solutions for exploration and prototyping. Self-service data preparation tools put the power in the hands of the people who know the data best, democratizing the data prep process and reducing the burden on IT. “The added value of a self-service data prep tool is that everyone can become a master of the data,” said Venkatesh Shivanna, Senior Data Analytics Manager and Architect at a popular gaming company. “Analysts can do the ad-hoc data cleansing tasks themselves instead of waiting in a queue.”

Every organization has specific needs and there is no ‘one-size-fits-all’ approach to data preparation, but when selecting a self-service data preparation tool, organizations should consider how the tool will evolve processes towards an iterative, agile approach instead of creating new barriers to entry. Jason Harmer, consultant in IT process management at Nationwide Insurance, explained that “you can’t really democratize data without letting people understand the full data prep process. Visual data prep allows people to see the full end-to-end process, seeing potential flags earlier on, like misspellings in the data, extra spaces, or incorrect join clauses. It also increases confidence in the final analysis.” People will have a greater desire to prepare and understand their data if they can see the impact of their data prep steps.
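The flags mentioned in the quote (extra spaces, incorrect joins) can also be checked in a few lines of code. The following Python sketch is only an illustration, and the function names and sample customer IDs are made up for this example. It reports text values with stray whitespace, plus join keys that would silently drop rows or multiply them.

```python
from collections import Counter


def whitespace_flags(values):
    """Return values with leading, trailing, or repeated internal spaces."""
    return [v for v in values if v != " ".join(v.split())]


def join_flags(left_keys, right_keys):
    """Report keys that would drop out of, or fan out in, an inner join."""
    right_counts = Counter(right_keys)
    return {
        # Left rows with no partner on the right disappear from an inner join.
        "unmatched_left": sorted(set(left_keys) - set(right_keys)),
        # Keys repeated on the right duplicate every matching left row.
        "duplicated_right": sorted(k for k, n in right_counts.items() if n > 1),
    }


orders = ["C001", "C002", "C004"]             # customer IDs on an orders table
customers = ["C001", "C002", "C002", "C003"]  # customer IDs on a customers table

print(whitespace_flags(["Acme Corp", "Acme  Corp", " Globex"]))
print(join_flags(orders, customers))
```

Running checks like these before the analysis, rather than after a number looks wrong, is the same "see the flags earlier" idea expressed in code.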

Issue 2: Data preparation requires deep knowledge of organizational data

Before preparing data, it is crucial to understand its location, structure, and composition, along with granular details like field definitions. Some people refer to this process as “data discovery” and it is a fundamental element of data preparation. You wouldn’t start a long journey without a basic understanding of where you’re going, and the same logic applies to data prep.

The emergence of self-service BI and its drag-and-drop functionality has made data discovery easier for business users, providing them with a deeper knowledge of the existing structure and contents of their data sets. But because of information silos, these users often have less insight into the entire data landscape of their organization: what data exists, where it lives, and how it is defined. Confusion around data definitions can hinder analysis or, worse, lead to inaccurate analyses across the company. For example, someone who wants to analyze customer data may find that the marketing team defines the term “customer” differently than the finance team does.

Solution: Create company standards for data definitions

Visual, self-service data prep tools allow analysts to dig deeper into the data to understand its structure and see relationships between tables. Because they can understand the profile of their data, analysts can easily spot unexpected values that need cleaning. Although this technology brings clarity to the data, people will still need support from others in their company to understand details like field definitions.
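As a rough illustration of what "understanding the profile of the data" means in practice, the Python sketch below summarizes a single column and lists values outside an expected set. The column contents and the allowed values are invented for the example; visual prep tools present the same kind of summary graphically.

```python
from collections import Counter


def is_missing(value):
    """Treat None and blank or whitespace-only text as missing."""
    return value is None or str(value).strip() == ""


def profile(values):
    """Summarize one column: row count, missing count, and value frequencies."""
    present = [v for v in values if not is_missing(v)]
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "counts": Counter(present),
    }


def unexpected(values, allowed):
    """Return the distinct non-missing values that are not in `allowed`."""
    return sorted(set(profile(values)["counts"]) - set(allowed))


region = ["East", "West", "east", None, "Wset", "West", ""]

summary = profile(region)
print(summary["rows"], summary["missing"])     # 7 2
print(unexpected(region, {"East", "West"}))    # ['Wset', 'east']
```

The profile shows what is in the column, but deciding that "east" should become "East" still depends on an agreed definition of the field.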

One way to standardize data definitions across a company is to create a data dictionary. A data dictionary helps analysts understand how terms are used within each business application, showing which fields are relevant for analysis versus the ones that are strictly system-based. Brian Davis, Project Engineer at an energy company, calls data dictionaries “invaluable.” Brian says, “I regularly combine data from accounting with data from field technicians. Defining the initial data along with calculated fields drives more accurate analyses and reduces the amount of time spent determining which field or table to use.”

Developing a data dictionary is no small task. Data stewards and subject matter experts need to commit to ongoing iteration, checking in as requirements change. If a dictionary is out of date, it can actually do harm to your organization’s data strategy. Communication and ownership should be built into the process from the beginning to determine where the glossary should live and how often it should be updated and refined.
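A data dictionary does not have to start as a large system. The Python sketch below treats it as a small structure with an owner and a review date per field, which is one possible way to build in the ownership and update cadence described above. Every entry, field name, and threshold here is a hypothetical example, not a recommended schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FieldDefinition:
    name: str
    definition: str
    owner: str                 # team accountable for keeping the entry current
    last_reviewed: date
    system_only: bool = False  # True for fields that exist only for the database


# Hypothetical entries; real ones would be maintained by data stewards.
DICTIONARY = {
    "customer_id": FieldDefinition(
        "customer_id", "Unique ID for an account with at least one paid invoice",
        "Finance", date(2024, 1, 15)),
    "row_hash": FieldDefinition(
        "row_hash", "Checksum used by the load process",
        "Data Engineering", date(2023, 3, 1), system_only=True),
}


def undocumented_fields(columns, dictionary=DICTIONARY):
    """Columns that appear in a data set but have no dictionary entry."""
    return [c for c in columns if c not in dictionary]


def analysis_fields(columns, dictionary=DICTIONARY):
    """Documented columns that are meant for analysis, not system bookkeeping."""
    return [c for c in columns if c in dictionary and not dictionary[c].system_only]


def stale_entries(today, max_age_days=365, dictionary=DICTIONARY):
    """Names of entries not reviewed within `max_age_days` of `today`."""
    return [f.name for f in dictionary.values()
            if (today - f.last_reviewed).days > max_age_days]


columns = ["customer_id", "row_hash", "signup_channel"]
print(undocumented_fields(columns))       # ['signup_channel']
print(analysis_fields(columns))           # ['customer_id']
print(stale_entries(date(2024, 6, 1)))    # ['row_hash']
```

The stale-entry check is there because, as noted above, an out-of-date dictionary can do more harm than good.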

Issue 3: “Clean data” is a matter of perspective

Different teams have different requirements and preferences regarding what makes for “well-structured” data. For example, database administrators and data engineers prioritize how data is stored and accessed, and columns may be added that are strictly for databases to leverage, not humans. When an engineer builds a data warehouse specifically for analysis, they prioritize the core business metrics that answer the majority of questions. If the information that data analysts need isn’t already in the data set, they may need to adjust aggregations or bring in outside sources. This can lead to silos or inaccuracies in the data.

Cathy Bridges, Tableau Developer at SCAN Health Plan, explained how analysts often have to go back and update a data set that has already been cleaned by another team. “Bringing in additional columns can be a long and arduous process. For example, if I need totals versus breakout, I need to duplicate the data source, and it can be a pain.”
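The "totals versus breakout" problem is a question of granularity. As a small sketch, assuming the breakout rows are available, totals at any coarser level can be derived from them rather than kept as a second copy. The rows and column names below are invented, and this shows the general idea only, not how any particular BI tool handles it.

```python
from collections import defaultdict

# Hypothetical breakout rows: one row per region and product line.
breakout = [
    {"region": "East", "product": "Basic", "revenue": 120},
    {"region": "East", "product": "Plus", "revenue": 80},
    {"region": "West", "product": "Basic", "revenue": 150},
]


def totals(rows, group_by, measure):
    """Sum `measure` for each distinct value of `group_by`."""
    result = defaultdict(int)
    for row in rows:
        result[row[group_by]] += row[measure]
    return dict(result)


print(totals(breakout, "region", "revenue"))   # {'East': 200, 'West': 150}
print(totals(breakout, "product", "revenue"))  # {'Basic': 270, 'Plus': 80}
```

The reverse is not possible: a data set stored only as totals cannot be broken back out, which is why the level of detail a team chooses to keep shapes what other teams can do with it.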

Solution: Put the power in the hands of the data experts

Self-service data prep gives analysts the power to polish data sets in a way that matches their analysis, leading to faster, ad-hoc analyses and allowing them to answer questions as they appear. It also reduces the burden on IT to restructure the data whenever an unanticipated question arises. This can also reduce duplicated effort because other analysts can reuse these models. If the datasets are valuable on a wide scale, you can combine them into a canonical set in the future.

“A data prep tool should equip the one-off questions from the analysts and also be repeatable,” says Gordon Strodel, Information Management and Analytics Consultant at Slalom. “When I build out the logic, it’s saved in a file somewhere. And the next time, I can reopen that same file, repoint at the same data sources and start from where I left off in that workflow.”
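The repeatability described in the quote can be sketched in plain Python: the order of the cleaning steps is saved to a file, and the same file is reopened to process the next batch of data. The step names, file layout, and sample rows are assumptions made for illustration and do not reflect how any specific data prep product stores its workflows.

```python
import json
import os
import tempfile


def trim_text(rows):
    """Strip leading and trailing whitespace from every text value."""
    return [{k: v.strip() if isinstance(v, str) else v for k, v in r.items()}
            for r in rows]


def drop_blank_ids(rows):
    """Remove rows whose `id` is missing or empty."""
    return [r for r in rows if r.get("id")]


# Registry of named steps, so a saved workflow only needs to store names.
STEPS = {"trim_text": trim_text, "drop_blank_ids": drop_blank_ids}


def save_workflow(step_names, path):
    """Write the ordered list of step names to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"steps": step_names}, f)


def run_workflow(path, rows):
    """Reload the saved steps and apply them, in order, to `rows`."""
    with open(path, encoding="utf-8") as f:
        step_names = json.load(f)["steps"]
    for name in step_names:
        rows = STEPS[name](rows)
    return rows


path = os.path.join(tempfile.mkdtemp(), "workflow.json")
save_workflow(["trim_text", "drop_blank_ids"], path)

january = [{"id": " A1 ", "name": " Acme "}, {"id": "", "name": "Unknown"}]
print(run_workflow(path, january))  # [{'id': 'A1', 'name': 'Acme'}]
```

When February's extract arrives, the same `run_workflow(path, ...)` call applies identical logic, so the cleanup is not redone by hand.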

Issue 4: The hidden reality of data prep silos

Advanced data preparation tools can be complex, which means this capability is often restricted to a select number of power users. But even if analysts and business users don’t have access to data preparation tools, it doesn’t mean that they aren’t already performing these tasks in other applications. Self-service business intelligence tools have opened up data analysis capabilities to every level of user, but in order to get insights into their data, these users still need to rely on IT for well-structured data. Instead of waiting days or months for the data, users extract data from systems and prepare it in spreadsheets. The result is a newly structured data set that serves a single purpose, and departments often duplicate efforts without even knowing it. This process leads to an abundance of data silos, which aren’t efficient, scalable, or governed.

Solution: Create consistency and collaboration within the data prep process

Combatting silos starts with collaboration. Survey research from the Business Application Research Center (BARC) showed that the companies that were most satisfied with their data prep processes were the ones that “made data preparation a shared task between IT and business departments.” Jonathan Drummey, Consultant at DataBlick and Data Visualization Specialist at PATH, explained that throughout this process, there should be “people downstream and someone (or multiple people) upstream. The upstream people are taking feedback from the downstream people to do cleanup, usually around data quality issues and availability of supplemental data sets.”

Adopting self-service data prep across an organization requires users to learn the ins and outs of the data. Since this knowledge was historically reserved for IT and data engineering roles, it is crucial that analysts take time to learn about nuances within the data, including the granularity and any transformations that have been done to the data set. Scheduling regular check-ins or a standardized workflow for questions allows engineers to share the most up-to-date way to query and work with valid data, while empowering analysts to prepare data faster and with greater confidence.

Tableau and Tableau Software are trademarks of Tableau Software, Inc. All other company and product names may be trademarks of the respective companies with which they are associated.

About Tableau

Tableau is the enterprise analytics platform that helps people see and understand data. Give people access to intuitive visual analytics, interactive dashboards, and limitless ad-hoc analyses that reveal hidden opportunities and eureka moments alike. Get the security, governance, and management you require to confidently integrate Tableau into your business application and deliver the power of embedded analytics at scale.
