The growing number of data-driven projects is having a major impact on companies across industrial and technological sectors. These projects are driven by the need to improve process automation, optimize resources, and extract valuable information that improves decision making.

The technical and business teams that develop these solutions focus on exploiting the information available in their “Data Lakes”: building dashboards that reflect relevant business KPIs, ingesting and transforming large volumes of data, or implementing ML models that enable inference for specific use cases.

This technological transformation brings challenges for the rapid development and scaling of data projects. A common difficulty is the bottleneck produced by centralized Data and ML teams, which must collaborate both with functional teams closer to the data domain and with the consumers of the information themselves.

When starting a new project, the Data team usually performs a feasibility study, exploring the new datasets provided and trying to determine the target business metrics to be covered. It is at this point that valuable time is invested in understanding the data, transforming it, and capturing requirements, which often delays projects in this initial phase and demands an extraordinary effort to acquire the business knowledge associated with the data.

Faced with this situation, in which those responsible for a data domain do not take into account how their data may be used, the need arises to move away from data projects as we traditionally conceive them and toward the development of Data Products.

A Data Product must be implemented, developed and maintained by a team responsible for a data domain. It therefore belongs to exactly one domain.

A Data Product can be a published dataset, a dashboard reflecting different KPIs, or an ML model, accessible from other data domains through an interface or API. It must provide not only the data but also the information necessary to understand it: structure, metadata, interfaces to consume it, and maintenance or life cycle.
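As a minimal sketch of what such an interface could bundle together (the class and field names below are our own illustration, not a standard), the “output port” of a Data Product might look like this:

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch of a Data Product's output port: the access
# point plus the metadata a consumer needs to understand and trust it.
# All names and fields here are illustrative, not a standard.
@dataclass
class DataProductPort:
    name: str              # e.g. "customer-orders"
    domain: str            # the single owning data domain
    version: str           # version of the data contract
    schema: dict           # structure: column name -> type
    description: str       # business meaning of the data
    endpoint: str          # interface/API where the data is served
    refresh_cadence: str   # life-cycle information, e.g. "daily"
    owner_team: str        # team accountable for the product
    last_updated: date = field(default_factory=date.today)

# A consumer in another domain can discover everything needed to use it:
orders = DataProductPort(
    name="customer-orders",
    domain="sales",
    version="1.2.0",
    schema={"order_id": "int", "amount": "float", "order_date": "date"},
    description="Confirmed customer orders, one row per order",
    endpoint="https://data.example.com/sales/customer-orders",
    refresh_cadence="daily",
    owner_team="sales-data",
)
```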

The objective of a Data Product is to be a reusable asset defined to provide reliable data for a specific purpose aligned with business needs.

Zhamak Dehghani, in her book “Data Mesh: Delivering Data-Driven Value at Scale”, describes the main characteristics that define a Data Product; they can be summarized as follows.

For a Data Product to be useful, it requires at least the following qualities:

  • Designed to be upgradable: it must be versionable or extensible, so new functionality can be added in the future (see the versioning sketch after this list).
  • Designed to scale: to cope with the growing volume of available data, the number of data sources in a domain, and the diversity of users.
  • Designed to provide value: focused on providing consumers with reliable, high-quality data in an understandable way.
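To illustrate the “upgradable” quality, here is a hedged sketch, assuming the product is exposed as a REST API built with FastAPI (the routes, payloads, and helper functions are invented for the example), of how two contract versions could be served side by side while consumers migrate:

```python
from fastapi import FastAPI

app = FastAPI()

def load_orders_v1():
    # Placeholder for reading from the domain's storage.
    return [{"order_id": 1, "amount": 100.0}]

def load_orders_v2():
    # v2 extends each record with a new field.
    return [{"order_id": 1, "amount": 100.0, "currency": "EUR"}]

# v1 keeps serving existing consumers unchanged.
@app.get("/v1/customer-orders")
def orders_v1():
    return {"schema_version": "1.0", "orders": load_orders_v1()}

# v2 adds the new field; both versions run side by side until
# v1 consumers have migrated at their own pace.
@app.get("/v2/customer-orders")
def orders_v2():
    return {"schema_version": "2.0", "orders": load_orders_v2()}
```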

To better understand this concept, let’s look at some examples.

Is Gmail a Data Product? It is not, since its primary purpose is to enable asynchronous written communication between users. However, its spam detection, which is based on the application of natural language processing techniques, can itself be considered one.

Another example is Instagram, which likewise cannot be considered a Data Product itself; however, it is composed of them, such as notifications, search, or the browse option.

Finally, is Google Analytics a Data Product? Yes, it is a product whose purpose is to provide information about user behavior on websites.

In the same way, Google’s search engine and Netflix’s recommender system are highly scalable Data Products.

Developing new Data Products is not trivial for a company currently implementing traditional data projects. It requires transforming the operational strategy to build an environment with standardized templates and data pipelines that can accelerate the launch of new products.

It also requires teams that take ownership of the different data domains in which these products will be developed.

Many aspects must be taken into account when defining new Data Products, among others:

  • Defining the product’s metadata.
  • Establishing the requirements that new data incorporated into the domain must meet (see the ingestion sketch after this list).
  • Determining the different ways in which the data will be accessible.
  • Establishing data profiling, versioning, and the life cycle of the data.
  • Deciding the level of granularity at which applications, domains, or components will be separated.
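As an example of the second point, here is a rough sketch, assuming a tabular “customer-orders” product ingested with pandas (the columns and thresholds are invented for illustration), of how requirements on incoming data could be checked at ingestion time:

```python
import pandas as pd

# Illustrative ingestion checks for a hypothetical "customer-orders"
# product; column names and thresholds are invented for the example.
REQUIRED_COLUMNS = {"order_id", "customer_id", "amount", "order_date"}

def validate_incoming(df: pd.DataFrame) -> list:
    """Return a list of violations; an empty list means the batch passes."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    violations = []
    if df["order_id"].duplicated().any():
        violations.append("duplicate order_id values")
    if (df["amount"] < 0).any():
        violations.append("negative order amounts")
    # Simple profiling rule: reject batches with too many null customers.
    null_ratio = df["customer_id"].isna().mean()
    if null_ratio > 0.01:
        violations.append(f"customer_id null ratio {null_ratio:.2%} exceeds 1%")
    return violations
```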

Keepler has built its offering around a full-stack analytics service on public cloud infrastructure, applying best practices in data engineering, cloud, data governance, data science, and data visualization. This approach, together with an Agile methodology, allows for the efficient identification, definition, development, and deployment of new data products for its customers.

Our Data Products proposal involves the creation or evolution of Data Lakes focused on extracting value from the descriptive analysis of information. Additionally, it incorporates AI/ML capabilities that enable more sophisticated analysis and the generation of new, relevant information to improve decision making and reduce uncertainty.

Author

  • Javier Pacheco

Data Scientist at Keepler Data Tech: "'Live full, die empty' defines my state. It has become my lifestyle, taking me out of my comfort zone and driving my voracious appetite for learning about different aspects of Data Science. I love learning by teaching and am always open to new challenges that push my comprehension further."