Data Observability Glossary

apm tools

Technology used to monitor an organization's application layer for API failures and potential compute issues. Typically, APM tools are not built to monitor data or infrastructure layers, nor do they validate the quality of data pipelines.

application performance monitoring

Process of monitoring an organization's application layer within an enterprise infrastructure for potential issues and downtime.

cold data

Data that has not been used recently (or ever) by an organization. Creates internal confusion and leads to higher data storage costs.

data catalog

Centralized inventory of data assets across technologies and environments. Enables users to easily navigate and search for data assets.

data classification

Process of identifying and categorizing similar data assets in order to provide greater context about an organization's data. For example, classifying sensitive data or business data.

data complexity

Problems that arise from an organization's expanding data volumes and related processes, technology, users, and use cases. Failure to mitigate data complexity can lead to inefficient data operations, inflated costs, unreliable data, and system downtime.

data discovery

Process of finding, exploring, and validating data. Data discovery may involve the use of a data catalog, especially for organizations with many data assets and data sources.

data drift

Unexpected content changes that can negatively impact an organization's processes, especially its AI/ML workloads. A consistent increase in missing values within a data set, for example, may indicate data drift.
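As an illustrative sketch of the missing-values example, a simple check might track a column's null rate across successive data loads and flag a consistent increase (the function names and the 5% threshold here are hypothetical, not a standard):

```python
def missing_rate(values: list) -> float:
    """Fraction of null values in a column sample."""
    return sum(v is None for v in values) / len(values)

def is_drifting(rates: list[float], threshold: float = 0.05) -> bool:
    """Flag drift when the missing rate rises in every successive load
    and the latest rate exceeds a baseline threshold."""
    strictly_increasing = all(a < b for a, b in zip(rates, rates[1:]))
    return strictly_increasing and rates[-1] > threshold

# Null rate climbing from 1% to 8% across three loads suggests drift
rates = [missing_rate(s) for s in ([1, 2, None] * 33 + [3],
                                   [1, None, None] * 33 + [3],
                                   [None] * 8 + [1] * 92)]
```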

data efficiency

Utilizing data and data infrastructure in a productive way that aligns with the organization's specific needs. Achieved by minimizing redundant and underutilized data, ineffective data management practices, and resource contention.

data observability platform

Software application that leverages analytics and ML/AI to improve reliability, scalability, and costs across an organization's data, pipelines, and workloads. Provides visibility into the health and performance of all aspects of enterprise data systems.

data outage

Period of downtime during which users and/or downstream applications are unable to access certain data assets. May be caused by a variety of situations, such as resource contention, structural changes, or system health issues.

data pipeline observability

End-to-end visibility into the flow and cost of data across an organization's interconnected systems.

data pipelines

Processes and technology used to ingest data from source systems into an organization's data ecosystem.

data profiling

Process of crawling, analyzing, and summarizing data in a way that helps users understand the organization's data.
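As a minimal sketch of what a profiling step produces, the summary below covers a single numeric column; real profilers also cover distinct counts, patterns, and distributions (the function name and output fields are illustrative):

```python
def profile_column(values: list) -> dict:
    """Summarize a numeric column: row count, nulls, min, max, mean."""
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "min": min(non_null),
        "max": max(non_null),
        "mean": sum(non_null) / len(non_null),
    }
```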

data quality

Measurement of a data set's overall health for its intended use. Minimizing or eliminating missing and incorrect data are key aspects of ensuring data quality.

data reconciliation

Process of ensuring that data has arrived as expected during its movement from point A to point B.
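A basic reconciliation check might compare row counts and an order-independent content checksum between source and target; this is a sketch under the assumption that rows fit in memory and are comparable tuples (the function names are illustrative):

```python
import hashlib

def reconcile(source_rows: list[tuple], target_rows: list[tuple]) -> dict:
    """Compare row counts and a content checksum between point A and point B."""
    def checksum(rows: list[tuple]) -> str:
        # Sort so row order does not affect the digest
        digest = hashlib.sha256()
        for row in sorted(rows):
            digest.update(repr(row).encode())
        return digest.hexdigest()

    return {
        "counts_match": len(source_rows) == len(target_rows),
        "content_match": checksum(source_rows) == checksum(target_rows),
    }
```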

data reliability

Ensuring the dependable delivery of quality data in an uninterrupted, on-time schedule. Data reliability is essential for building trust with business users.

data roi

Return on investment realized by leveraging an organization's data. Calculated by subtracting data's total cost from its estimated return; total cost includes the sum of costs pertaining to data storage, compute, pipelines, and related systems. The net return is then divided by the total cost and multiplied by 100 to express it as a percentage.
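The calculation above can be sketched directly (the function name and dollar figures are illustrative):

```python
def data_roi(estimated_return: float, total_cost: float) -> float:
    """Data ROI as a percentage.

    total_cost is the sum of storage, compute, pipeline,
    and related system costs.
    """
    net_return = estimated_return - total_cost
    return net_return / total_cost * 100

# Example: a $500k estimated return on $200k of total data costs
roi = data_roi(500_000, 200_000)  # 150.0 (%)
```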

data success

Effective use of data to achieve an organization's business goals and support its use cases, such as BI reporting, data applications, embedded analytics, and AI/ML workloads.


data swamp

Derogatory term used to describe an organization's data when it is siloed or generally disorganized. Guaranteeing data quality and accessibility becomes difficult—if not impossible—when data swamps exist, thereby eroding end user confidence in organizational data and leading to an even murkier data swamp.

data validation

Process of ensuring that data follows and conforms to the schema definition, follows business rules, and is accurate and usable.
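As a sketch of the two layers named above, the check below validates schema conformance (required fields and types) and then a business rule; the record shape, field names, and rule are all hypothetical:

```python
def validate_order(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means valid."""
    errors = []

    # Schema conformance: required fields and expected types
    schema = {"order_id": int, "amount": float, "currency": str}
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")

    # Business rule: order amounts must be positive
    if isinstance(record.get("amount"), float) and record["amount"] <= 0:
        errors.append("amount must be positive")

    return errors
```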

data waste

Tangible and intangible costs that can be attributed to a company's inefficient storage and utilization of data.

data lineage

A historical accounting of data's journey from its original data source to present day usage, including any dependencies and connected assets.

mttr

An abbreviation for "mean time to resolution," a metric that measures a team's responsiveness in resolving issues. (Sometimes referred to as "mean time to recovery.") Calculated by dividing the sum of all time required to resolve issues during a given period by the total number of incidents during the same period. Data teams should strive for low MTTRs.
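The calculation described above, sketched in Python (the function name and hour figures are illustrative):

```python
def mttr(resolution_times_hours: list[float]) -> float:
    """Mean time to resolution: total resolution time divided by
    the number of incidents in the same period."""
    if not resolution_times_hours:
        raise ValueError("no incidents in the period")
    return sum(resolution_times_hours) / len(resolution_times_hours)

# Three incidents resolved in 2.0, 4.5, and 1.5 hours
avg = mttr([2.0, 4.5, 1.5])  # about 2.67 hours
```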

observability data

Data that helps an organization understand the reliability, scale, and cost of its data, processing, and pipelines. Used to predict, prescribe, prevent, troubleshoot, optimize, and contextualize.

overprovisioning

Acquiring or deploying more of a particular resource (storage, compute, etc.) than what is actually necessary to support an organization's current needs. Often occurs as a safeguard to protect against unexpected changes in demand.

schema drift

Structural changes to schemas and tables, such as the addition or deletion of a column, that can break pipelines or impact downstream applications.
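A simple way to detect the changes described above is to diff two schema snapshots, represented here as column-to-type mappings (the function name and schema representation are illustrative):

```python
def schema_diff(old: dict, new: dict) -> dict:
    """Detect added, removed, and retyped columns between two snapshots."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "retyped": sorted(c for c in set(old) & set(new) if old[c] != new[c]),
    }

# A dropped column and a type change, either of which can break pipelines
drift = schema_diff(
    {"id": "int", "name": "str", "amount": "float"},
    {"id": "bigint", "amount": "float", "email": "str"},
)
```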

service level indicator (sli)

KPIs that measure a service provider's adherence to targets set forth in a company's SLA. Examples of data-related SLIs include data pipeline uptime percentage and average response time.
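As an illustration of the uptime-percentage example, the SLI can be computed over a reporting window (the function name and minute figures are hypothetical):

```python
def uptime_sli(total_minutes: int, downtime_minutes: int) -> float:
    """Data pipeline uptime percentage over a reporting window."""
    return (total_minutes - downtime_minutes) / total_minutes * 100

# A 30-day month is 43,200 minutes; about 43 minutes of downtime
# corresponds to roughly 99.9% uptime
sli = uptime_sli(43_200, 43)
```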

service level objective (slo)

Specific targets that are defined by an SLA and agreed to by key stakeholders within an organization. Data-related SLOs commonly relate to system availability and service provider responsiveness.
