This blog is adapted from a session at the recent Enterprise Data Summit, Addressing Productivity Black Holes for Data Analytics and AI Teams, presented by Sandeep Uttamchandani, Ph.D., VP, Analytics, AI & Data @ Intuit. We invite you to watch the presentation as well.
Over the past few years, the market has seen remarkable growth in the number of tools and technical solutions available to data teams. However, it's important to recognize that the landscape is constantly evolving, and keeping up with the rapid pace of innovation across the board can be challenging. Despite the availability of tools, technical solutions, and innovations, there are still operational obstacles and gaps in data coverage that prevent even the most sophisticated organizations from realizing the full value of their enterprise data.
The issue is not simply that there’s too much data and we can’t sort through it all. Volume of data, and the corresponding effort to make sense of it all, is certainly an issue. But if you dig deeper, you discover that we actually know what we want to know - we just don’t totally understand how to get it. Multiple studies highlight that data teams are not getting the variety of insights they want, are not achieving widespread adoption of models, and are not fully democratizing the value of data through analytics and AI. So, what impedes progress?
While technical solutions and tools are essential, we need to look at the blind spots that hinder data and AI teams throughout the data journey. These blind spots are issues that slow down progress in various stages, such as data production, problem definition, preparation, development, operationalization, and the final mile. These patterns of problems present challenges for teams working with data and AI. They prevent full transparency of data and reduce the efficacy of the data that gets used (when it’s incomplete).
While the market often emphasizes technical solutions or tools as the means to address these slowdowns, it's important to recognize that it requires a holistic approach. Overcoming the blind spots that impede the pace of insight development involves a combination of processes, mindsets, team design, and data maturity. Merely possessing the best or most efficient tools will not suffice if a holistic perspective is not adopted. The impact can be measured through metrics such as time to insights, quality of insights, data reliability, and scalability.
In conversations with multiple data teams, we’ve discovered some interesting issues and concerns that either are well-established blind spots, or are issues that lead to blind spots. These are not all necessarily technical issues, but they all hinder productivity for technical teams. We look more closely at these below:
The data analytics and AI teams often face challenges related to accessing the right data sets, defining key performance indicators (KPIs), conducting experiments, creating common dashboards, deploying models, and establishing analysis patterns. These are examples of process blind spots where essential knowledge exists within the team but is not properly documented. As a result, team members are required to gather this information from others, leading to inefficiencies and difficulties.
We know all too well that as teams evolve and attrition occurs, critical knowledge can be lost. There are instances where certain models or metrics become orphans, lacking information about the rationale behind specific data sets, definitions, or logic. This absence of documentation creates significant bottlenecks that impede the progress of teams.
The important takeaway from this scenario is that while building deliverables in the AI and analytics space, it is equally vital to prioritize the development of knowledge, documentation, and sharing practices within the team. This ensures that valuable insights and expertise are not solely reliant on individual team members, but are collectively captured, documented, and accessible to all.
Imagine trying to build out a model where one of the metrics or input parameters of the model does not match the actual source data. Unfortunately, it’s a scenario that you’re likely very familiar with. If so, you’ve probably checked the transformation logic, checked various attributes of data quality, and started to question whether something else could be causing these issues.
When considering the process of transforming source data and making it available for AI models and analytics, the creation of data pipelines becomes essential. These pipelines are responsible for converting the data into the required format, performing necessary transformations, and integrating it with other data sources to enable downstream insights. However, this process has become increasingly complex.
The term "jungle of pipelines" describes the situation where numerous transformations are built just to ensure that the data becomes consumable. When issues arise, teams often find themselves spending a significant amount of time debugging and understanding what went wrong.
To overcome this blind spot, it is important to leverage tools and technologies that simplify data transformations and facilitate unit testing and quality checks. However, an even more impactful approach is to collaborate closely with the source teams to design data in a way that simplifies or eliminates the need for extensive transformations. The ultimate goal should be to generate and use data in the appropriate format from the beginning, reducing the reliance on complex pipelines and enhancing overall productivity.
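As a rough illustration of unit testing and quality checks on a transformation, here is a minimal sketch in plain Python. The `Order` record, the `to_usd_dollars` transformation, and the `quality_issues` checks are all hypothetical names invented for this example; they are not from the session, and a real pipeline would likely use a dedicated testing or data quality tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical source record, used only for illustration."""
    order_id: str
    amount_cents: int
    currency: str


def to_usd_dollars(order: Order, usd_rate: dict[str, float]) -> float:
    """Hypothetical transformation: convert an order amount to US dollars."""
    if order.currency not in usd_rate:
        # Fail loudly rather than silently emitting a wrong value downstream.
        raise ValueError(f"no rate for currency {order.currency!r}")
    return round(order.amount_cents / 100 * usd_rate[order.currency], 2)


def quality_issues(orders: list[Order]) -> list[str]:
    """Return a readable list of problems found in a batch of orders."""
    issues = []
    seen = set()
    for order in orders:
        if order.order_id in seen:
            issues.append(f"duplicate order_id: {order.order_id}")
        seen.add(order.order_id)
        if order.amount_cents < 0:
            issues.append(f"negative amount: {order.order_id}")
    return issues
```

Keeping each transformation a small, pure function makes it cheap to test in isolation, so a mismatch between a metric and its source data can be traced to one step rather than to the whole pipeline.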
Let's consider a scenario where improper data understanding becomes an issue. In this example, think about a data scientist who is engaging with a team member who is responsible for producing one of the data sets consumed by our data scientist. Like a lot of collaborations, this one operates off of an incorrect assumption of certain definitions related to a specific attribute.
This common situation highlights the importance of understanding the precise meaning of attributes when data is consumed downstream. Questions arise about the composition of the data set, potential biases within it, and whether outliers were excluded based on the producer's discretion. These assumptions and details must be communicated to ensure the development of accurate insights downstream. Data maturity plays a crucial role here, with one key aspect being the effectiveness of data documentation in promoting understanding.
The key takeaway is that data documentation is exceptionally critical. Although it may not yield immediate returns on investment, teams should prioritize documenting data and ensuring it remains up to date. Utilizing appropriate automation and methodologies is essential to address this pattern effectively. Improper understanding of data can lead to teams spending an extensive amount of time generating insights that are ultimately incorrect due to a lack of understanding regarding attribute meanings. Frequently, teams rely solely on attribute names to infer their significance.
To mitigate this issue, emphasis should be placed on comprehensive and accurate documentation, enabling teams to have a clear understanding of data attributes and their meanings.
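One way to automate part of this is a check that flags attributes with no written description. The sketch below assumes a hypothetical data dictionary stored as a simple mapping from column name to description; the function name and the example columns are illustrative only.

```python
def undocumented_columns(columns: list[str], data_dictionary: dict[str, str]) -> list[str]:
    """Return columns whose description is missing or blank, in input order."""
    return [c for c in columns if not data_dictionary.get(c, "").strip()]


# Hypothetical example: "status" has a blank description and "ltv" has none.
example_columns = ["user_id", "status", "ltv"]
example_dictionary = {
    "user_id": "Stable internal identifier for a customer account.",
    "status": "   ",
}
missing = undocumented_columns(example_columns, example_dictionary)
```

Run as part of a review or build step, a check like this keeps documentation from drifting behind the data, although it can only confirm that a description exists, not that it is accurate.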
Related to data maturity is the topic of data source changes that are not communicated to, or coordinated with, the teams that consume the data. This blind spot manifests when incorrect values are reflected in metrics and dashboards. The root of the issue is usually that changes have been made to the source data without any prior notice, leading to discrepancies.
This scenario highlights a common challenge where teams face issues such as escalations and information mismatches. They are compelled to delve into extensive digging and debugging to identify the root cause. Often, it becomes apparent that changes made to the source data are causing the impact.
To address this challenge, it is crucial to treat data as code. In the realm of code, APIs have defined structures and undergo unit testing and verification. Similarly, there is a growing concept of data contracts that emphasizes the importance of treating data with similar rigor. This approach ensures data hygiene and promotes the handling of data changes in a structured and controlled manner.
Treating data as code is vital to prevent ad hoc or unplanned work that analysts and AI teams encounter when rectifying changes that occurred at the source. These unexpected tasks can be counterproductive as they divert valuable time and resources from other essential activities.
Adopting data-as-code principles and implementing data contracts helps streamline the handling of source changes, reduces inconsistencies, and enhances overall productivity for analysts and AI teams.
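To make the idea of a data contract concrete, here is a minimal sketch in plain Python. It is not a specific data contract tool or standard: `FieldSpec` and `contract_violations` are hypothetical names, and the contract covers only field presence, type, and nullability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """One field in a hypothetical contract agreed between producer and consumer."""
    name: str
    type_: type
    nullable: bool = False


def contract_violations(record: dict, contract: list[FieldSpec]) -> list[str]:
    """Compare one record against the contract and list every mismatch."""
    violations = []
    for spec in contract:
        if spec.name not in record:
            violations.append(f"missing field: {spec.name}")
            continue
        value = record[spec.name]
        if value is None:
            if not spec.nullable:
                violations.append(f"null not allowed: {spec.name}")
            continue
        if not isinstance(value, spec.type_):
            violations.append(
                f"wrong type for {spec.name}: "
                f"expected {spec.type_.__name__}, got {type(value).__name__}"
            )
    # Fields the producer added without updating the contract.
    unexpected = sorted(set(record) - {spec.name for spec in contract})
    violations.extend(f"unexpected field: {name}" for name in unexpected)
    return violations
```

A producer could run a check like this before publishing, so that a renamed or retyped field is caught at the source instead of surfacing later as a wrong number on a dashboard.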
This blind spot may be one of the more truly human scenarios, as it deals with team design.
Different roles bring different perspectives to the various tasks in front of a data team. This variety has huge advantages - different viewpoints and experiences help to inform a more inclusive and usable data environment. But quite often, assumptions are made about the capabilities and responsibilities of people in different roles. For example, it’s often thought that a data analyst possesses comprehensive end-to-end domain knowledge, including transformation logic, and that the data engineer becomes well-acquainted with this logic. While that may typically be the case, it isn't always. And what’s more, there’s a lot of nuance to what the people in each role know and don’t know.
There are a variety of established titles in the data industry, but these titles do not necessarily imply the same responsibilities or scope of work they once did. When recruiting and building data teams, individuals may have different assumptions based on their past experiences regarding the meaning of these titles and the expectations associated with them. Therefore, it becomes crucial to document roles and responsibilities explicitly. It’s unwise to presume that a role automatically implies certain tasks or responsibilities.
Every team is unique, with varying levels of data and tool maturity, which directly influences the specific requirements of individual roles. Clarity and a shared understanding of roles and responsibilities within the team are essential to prevent incorrect assumptions and avoid dropped balls. Having upfront and transparent discussions about roles and responsibilities can save significant time and prevent misunderstandings.
In every data team, there are a variety of people making decisions about technology and business, with the repercussions of those decisions being far-reaching. But without the benefit of clear understanding of what the data provides, team members might make decisions based solely on their skill set and neglect to account for how those decisions impact other parts of the organization.
Data teams require clear and accurate interpretation of insights. Whether in the analytics world, collaborating with business teams, or in the AI domain where features are built for products or customers at large, it is crucial to ensure that the meaning behind data insights is crystal clear. As we strive for self-service and democratization of data, literacy becomes paramount. Data literacy provides guardrails to ensure that insights are correctly understood and applied in a manner that aligns with the business's needs.
When it comes to literacy, we can’t assume a single, universally understood definition. Metrics can be notorious for having multiple definitions, and different business stakeholders may interpret the same information differently, leading to varying conclusions. From a productivity perspective, this scenario represents the last mile, the final step in the process. If we fail to accurately convey insights in this last mile, it can raise doubts about the impact and value of the entire data exercise.
To ensure the effectiveness and value of insights, it is crucial to prioritize clear communication, provide comprehensive explanations, and establish a shared understanding of metrics and their interpretations. Bridging the gap in this last mile is essential for driving meaningful and impactful decision-making within the business.
Productivity blind spots can hinder progress, drain resources, and lead to suboptimal outcomes for data and AI teams. These blind spots arise from various challenges, such as improper data understanding, complex data pipelines, uncommunicated source changes, assumptions about roles and responsibilities, and misinterpretation of insights. To address these blind spots and enhance productivity, data teams have to think beyond just goals. They must foster a culture of comprehensive data documentation, treat data as code, establish clear roles and responsibilities, promote literacy and shared understanding, and prioritize effective team design. By tackling these productivity blind spots head-on, data analytics and AI teams can unlock their true potential, maximize efficiency, and drive impactful outcomes that propel businesses forward in the data-driven era.
Watch the original session, Addressing Productivity Black Holes for Data Analytics and AI Teams: