Data Quality & Reliability

Building Great Data Products Starts with Data Quality

December 13, 2022
10 Min Read

Almost every article about data begins with a statement of incontrovertible fact: there’s a lot of it. Numbers in the petabytes and zettabytes get thrown around to demonstrate that data is growing at a multiplicative rate, but what these stories often neglect is the most important aspect of data: what can be done with it. For enterprises that have invested in creating data environments, the most valuable use of that data is creating data products that can be deployed to develop competitive advantages at a massive scale.

We don’t want to discount the “data is everywhere” trope. Every enterprise on the planet – irrespective of industry, size, or stage of maturity – relies on data for everything from internal processes to critical business functions that handle global logistics, go-to-market activities, and product delivery. As an example of the importance of data volume and strategy, consider that a company’s social media accounts provide all types of data points, and a good analyst can take a month’s worth of retweet information and help you improve your audience targeting by 80%. Insights into product usage can not only improve your product development plans; they can also reveal a product/market mismatch that can be rectified. A lot of data, applied intelligently, can deliver great value to your organization. But what we’re talking about here is pairing the ability to get that data with the willingness to build a marketable product out of it, one that can deliver incredible economic gains.

Now, we aren’t just talking about taking a data-driven approach to building new types of products. Of course, it’s important to have good data analysis so you can make better decisions. But if you’re sitting on data that is unique to your organization, your customers, and your industry, you can use that data to inform new digital products that have probably never been considered before. Zillow took publicly available data about residential housing sales to provide home buyers and homeowners with insights that were previously only available to licensed realtors. PhonePe, a Walmart subsidiary that operates a payment app in India, is using data to create a massive financial payments ecosystem that processes more than $1 trillion in transactional value.

In many instances, data gives you better efficiency, and that’s great. But what differentiates enterprises in the data economy is the ability to identify and capture unique data that fuels your product strategies so you can actually package and deliver solutions – solutions born from that data – that no one else in the market can. Enterprises that can do that are essentially operating with an annuity derived from their data investment. Even Warren Buffett, who claims to not understand technology, would salivate at the prospect of an enterprise that could exploit that type of economic dynamic.

Creating data products demands accurate, high-quality data that you can rely on and that gives you the confidence to make business-critical decisions. Organizations that take care to do this are the ones that quickly become adept at identifying when data can give them an advantage, and can then rapidly productize that data. Every enterprise has the basic ingredients at its disposal, but building great data products requires the continuous application of insights and processes, and a mindset that is relentless about its data, especially its quality, accuracy, and overall health. Here are the most important requirements for creating innovative, marketable new data products that can help your organization derive new financial value from your existing data investments.

What does your data tell you?

The first step in this process is to identify the “fit.” In other words, it’s how you align your data (based on the sources you have, the pipelines that feed your systems, and the interaction among those sources) with an identifiable need in your market. This requires collaboration among data practitioners and data business leaders to explore the intersection of data investments with opportunities. This collaboration should question whether available data delivers insights that are not available anywhere else in the market, and explore how to rapidly build it back into your existing products or create new ones. 

Once you know what you have access to and how it can be used, you and your data team can begin building a product requirements roadmap, just as you would for any product. The difference now, however, is that you are in total control of every aspect of the product because you own the data.

Be obsessive about having accurate, reliable data

At the risk of putting too fine a point on it, there’s no better way to explain this than with a quote from a recent First Round Review article: “If you’re not thinking about how to keep your data clean from the very beginning, you’re f***ed. I guarantee it.”

Insightful commentary, indeed, and there’s a great deal of truth in it, even if you are not currently sitting on reliably clean data. As sophisticated as your data stack might become, you may not always have the most accurate, clean data from the get-go, and that’s actually okay. It’s a great goal, and yes, some data teams are able to build their stacks from the ground up. Yet, irrespective of when data reliability and accuracy become a focus in your efforts, enterprises building great data products are typically doing so on the back of data observability, which provides a continuous way to ensure reliability, keep data pipelines efficient, eliminate data blind spots, and manage spend and performance.

We all know about the negative consequences of “garbage in, garbage out,” and when building data products, this should be considered a guiding principle. Your data is valuable, but only if the people using your data products can trust it.

Do you know what users want to know?

More is not always better. You may be able to build a data product that delivers massive amounts of information, but your users may only want specific information. If you overwhelm them with too much data, they won’t know where to start, and in an economy of short attention spans, you should know your users well enough to get them the data they need in an elegant, easy-to-digest fashion.

Consider PubMatic, an Acceldata customer and a digital advertising leader. Driving their comprehensive ad platform is an engine that processes almost six petabytes of data every single day. Within all of that data is an overwhelming amount of information that could undoubtedly be put to use by advertisers across all their different digital channels. But the goal for PubMatic is to get their customers to execute, and to execute with the most up-to-date, reliable information. Armed with that information, their customers can make better, faster decisions that lead to more clicks. However, the genius is not just in the data. It’s in the fact that PubMatic shows restraint in what they surface to their customers so they are not overwhelmed and paralyzed. Instead, they are presented with actionable dashboards that enable them to act.

Iterate fast, continuously, AND accurately

The beauty of a data product is that every new piece of data can be applied to improve the product. Each new insight adds an important perspective, giving enterprises a continuously improving product and making their offering increasingly unique (and valuable). With DevOps and similar agile development and delivery methods, enterprises can ship new, improved products without disrupting customers.

But there’s a dark side to this type of continuum. If you can’t trust the quality of your data, then you could be giving your customers an inaccurate, useless product. And if you operate with a continuous delivery model, there’s a good chance that you’re adding more and more bad data into a product without customers realizing it – until it’s too late and they’ve made poor decisions because of your product.

This is why data teams have to ensure the reliability of their data pipelines and bake quality into their data operations. Data observability gives them comprehensive visibility across their entire data stack, so they know when anomalous behavior, schema drift, degraded processing performance, or any of a number of other issues occur. Those issues can then be remediated immediately, before bad data leads to bad outcomes.
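To make one of those signals concrete, here is a minimal sketch of a schema drift check. It is an illustration only, not how any particular observability product works: it assumes a schema can be represented as a plain mapping of column name to type name, and the names SchemaDrift and detect_schema_drift, along with the sample columns, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDrift:
    """Differences between an expected schema and an observed one."""
    added: dict = field(default_factory=dict)    # columns only in observed
    removed: dict = field(default_factory=dict)  # columns only in expected
    changed: dict = field(default_factory=dict)  # column -> (expected type, observed type)

    @property
    def has_drift(self) -> bool:
        return bool(self.added or self.removed or self.changed)


def detect_schema_drift(expected: dict, observed: dict) -> SchemaDrift:
    """Compare two {column_name: type_name} mappings and report the differences."""
    drift = SchemaDrift()
    for column, dtype in observed.items():
        if column not in expected:
            drift.added[column] = dtype
        elif expected[column] != dtype:
            drift.changed[column] = (expected[column], dtype)
    for column, dtype in expected.items():
        if column not in observed:
            drift.removed[column] = dtype
    return drift


# Hypothetical example: the schema a pipeline expects vs. what a new batch contains.
expected_schema = {"order_id": "int", "amount": "float", "created_at": "timestamp"}
observed_schema = {"order_id": "int", "amount": "string", "channel": "string"}

drift = detect_schema_drift(expected_schema, observed_schema)
if drift.has_drift:
    print("Schema drift detected:", drift)
```

A check like this would typically run before a batch is loaded, so that a renamed column or a changed type stops the pipeline instead of quietly flowing into the product.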

Effectively building a great data product requires the collaborative, innovative work of smart data teams, but it also demands the operational foundation that only data observability can provide. With good, reliable data, the potential to create new economic channels for data already in your enterprise is unlimited. Data quality is not just the starting point; it is the overarching principle to which all data products must adhere.

Learn more about how data observability can help your enterprise build great data products.

Photo by Dan Asaki on Unsplash
