The Data Mesh concept is not as new as it seems: it emerged around 2019 at the hands of Zhamak Dehghani, who can be identified as the founder of Data Mesh (as she defines herself).
The idea behind the concept is to eliminate, or at least minimize, the constraints of the monolithic, centralized approaches that have dominated Data Platform architectures, Data Management and data teams, namely Data Warehouses and Data Lakes managed by a central team. Data Mesh proposes a decentralized model based on a distributed architecture, in which the business areas (domains) take responsibility for their own data (a decentralization of governance roles). Essentially, it means breaking Data Lakes and Data Warehouses down into smaller, more decentralized portions.
Data Mesh builds on top of four principles:
- The first principle is domain-oriented, decentralised data ownership and architecture, meaning that organisational/business functions need to own their data. Central to this idea is domain-driven design (DDD).
- The second principle is to think of data as a product and to expose the domain’s products in a form that is usable to others.
- The third principle defines the data infrastructure as a platform that offers different capabilities in a self-service manner.
- The fourth focuses on federated computational governance, striking a balance: just enough centralised control to ease the work, while keeping decision making as local as possible.
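To make the "data as a product" principle concrete, a domain team might publish its dataset behind an explicit, versioned descriptor that other domains can discover and consume. The sketch below is purely illustrative: the `DataProduct` class and its fields are our own invention for this article, not part of any Data Mesh standard.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal "data product" descriptor. Each domain team owns
# one of these and publishes it to a shared catalog, so other domains can
# find and consume the data without going through a central team.
@dataclass
class DataProduct:
    name: str                 # e.g. "customer-churn-scores"
    domain: str               # owning business domain
    owner: str                # accountable data product owner
    output_port: str          # where consumers read it (table, topic, API)
    schema_version: str       # versioned contract, for interoperability
    tags: list = field(default_factory=list)  # aids discoverability

churn = DataProduct(
    name="customer-churn-scores",
    domain="marketing",
    owner="jane.doe@example.com",
    output_port="s3://marketing/products/churn/v1/",
    schema_version="1.0.0",
    tags=["churn", "scores"],
)
# The marketing domain, not a central data team, owns this product:
print(churn.domain)  # → marketing
```

The point of the sketch is the ownership metadata: the descriptor names an accountable owner and a versioned contract, which is what turns a raw dataset into a product.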
Put simply, with Data Mesh you organise the data the way you organise your business and people, which is great for accountability.
Technically, Data Mesh addresses the shortcomings of data warehouses and data lakes by allowing greater flexibility and autonomy in data ownership. This translates into greater scope for data experimentation and innovation, since the burden is taken off the hands of a select few experts. The self-serve infrastructure-as-a-platform opens up avenues for a far more universal yet automated approach to data standardization, collection and sharing. Potentially, then, you get more contributions to your data strategy by incentivising a sharing mindset.
Let’s stop here for a moment. Data Mesh seems cool, but it is not for everyone
Companies must realize that this is not a recipe that fits all situations. Data Mesh should not be adopted just because it is a trend. An assessment must be done, and business gains must come out of a decision like this.
A Data Mesh strategy could benefit organizations that have a decentralized model with several domains and source complexity. It can help organizations that are highly decentralized as the Data Mesh structure allows different teams to manage their own data and only make quality data available to the rest of the organization as a product.
Here Conway’s law applies. Conway’s law states that: “Organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations.”
In larger organisations, with more complex communication structures, a centralised approach contradicts Conway’s law. Hence, we need a decentralised data management architecture like Data Mesh.
At this point, there are four aspects to analyze in order to consider a Data Mesh approach:
- Organizational stability
- Company cultural readiness
- A solid business case for a Data Mesh approach
- And obviously, budget to invest in the change!
For a perspective on when an organization is not ready for the Data Mesh approach, I recommend reading Thinh Ha’s article on Medium, “10 reasons why you are not ready to adopt data mesh”. The 10 reasons are:
- You are not operating at a scale where decentralisation makes sense.
- You do not have a strong business-case for how adopting Data Mesh will deliver business value for individual business units.
- You treat Data Mesh as a technical solution with a fixed target rather than an operating model that continuously evolves over time.
- Your organisational culture does not empower bottom-up decision-making.
- You do not have clearly established roles & responsibilities and incentive structure for distributed data teams.
- You do not have a critical mass of data talent.
- Your data teams have low engineering maturity.
- You expect to find off-the-shelf software to help you adopt Data Mesh.
- You do not have buy-in to “shift-left” security, privacy, and compliance.
- You do not consider Data Governance to be a core activity to be prioritised against other activities in every data team’s backlog.
Main challenges of Data Mesh implementation
Let’s assume that we are talking about an organization that fits the characteristics mentioned above for a Data Mesh approach. Here we summarize the main challenges to be addressed:
- Technological fit is one of the major considerations in any organization’s effort to adopt and implement a Data Mesh-based data management strategy. To implement a Data Mesh architecture successfully, organizations need to restructure their data platforms, redefine the roles of data domain owners, overhaul structures to make data product ownership feasible, and transition to developing their analytical data as a product.
- The implementation of a federated Data Governance model: interoperability and standardization of communications, governed globally, is one of the foundational pillars for building distributed systems. A crucial part of moving toward a decentralized data architecture is understanding that federation is about decentralized ownership, which requires well-understood disciplines.
- Domain data cross-functional teams: domains that provide data as products need to be augmented with new skill sets: (a) the data product owner and (b) data engineers. We need data skills in all domains; to build and operate the domains’ internal data pipelines, teams must include data engineers.
- A self-serve data platform: building a platform that supports and provides the technology the domains need to capture, process, store and serve their data products. This platform must hide all the underlying complexity and provide the data infrastructure components in a self-service manner.
- A significant level of change management: adapting to Data Mesh’s decentralized data operations requires a significant change effort.
- Cross-domain analytics: it is a challenge to guarantee all the rules that enable cross-domain analytics. This shift toward distributed data ownership only works if we apply a common set of standards to our data products. Without enterprise-wide standards and rules for distribution and connectivity, an ecosystem of disorder, disorganization and incompatibility is created, and cross-domain initiatives become impossible.
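The standards argument in the last bullet can be made tangible with a small sketch: a federated governance check that validates every domain’s data product against a handful of global rules. The rule set and the product shape below are hypothetical, chosen only to illustrate the idea of computational governance.

```python
# Hypothetical federated-governance check: domains own their products,
# but a small set of global, machine-checkable rules is enforced
# uniformly so cross-domain analytics stays possible.
REQUIRED_FIELDS = {"owner", "schema_version", "pii_classification"}

def violations(product: dict) -> list:
    """Return the global rules this data product descriptor breaks."""
    return sorted(REQUIRED_FIELDS - product.keys())

ok_product = {
    "owner": "sales-team",
    "schema_version": "2.1",
    "pii_classification": "none",
}
bad_product = {"owner": "ops-team"}  # missing interoperability/privacy metadata

print(violations(ok_product))   # → []
print(violations(bad_product))  # → ['pii_classification', 'schema_version']
```

In a real mesh this kind of check would run in the platform (for example, at product registration time), so governance is federated in decision-making but computational in enforcement.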
Oh wait! I already have a data strategy. Do I need to change it?
We do not think a change in Data Strategy is necessary (assuming one exists!). The main goal, being a Data-Driven Company, remains the same. Perhaps we can rewrite it as being a “Data Product Driven Company”.
If Data Mesh is the way to go for a company, then it must rethink its Organizational, Data Management, Data Governance and Data Architecture strategies. And why?
The company must design a roadmap from being based on a Centralized Data Team that runs the Data Warehouse and Data Lake (monolithic platforms), with Centralized Data Governance, to the new decentralized paradigm.
While the centralized model works for organizations that have a few domains with smaller numbers of diverse consumption cases, it fails for enterprises with rich domains, a large number of sources and a diverse set of business consumers.
For example, I was in charge of a central data team at a telco company (many sources and domains) that did all the ETL for the Data Warehouse and all the ingestions into the Data Lake, so that the business could then explore the data. We quickly became a bottleneck and, “politically”, the target to shoot down! Even as the number of data engineers increased, there was always the problem of knowing the source from a business perspective and interpreting the concepts… but the business had no motivation to help. And why should they? From their perspective, the central data team has to deal with this issue!
With Data Mesh, all of this changes: responsibility for the sources, and for the pipelines that feed the domain data products, lies with the domains. This has an organizational impact, since the domains will need data-skilled people (data roles in the domains).
And what happens to the Data Lakes and Data Warehouses that exist today (the architectural impact)? Most likely they become nodes on the mesh, used by a particular domain.
Let’s be clear: this is our vision of the subject
In Keepler’s view, having a data strategy implies defining how to take advantage of the data using the FAIR principles (findability, accessibility, interoperability and reusability) in order to support Data Products, which must be usable, understandable, accessible and interoperable with other Data Products. In the end, Data Products provide value to the business and are a way of monetizing data.
Taking advantage of the data implies an ecosystem that allows its exploration and usage: a Data Platform.
We believe that the best approach is a Public Cloud Data Platform that takes advantage of the variety of services made available by the Cloud Providers. Flexibility, scalability, elasticity, modularity and pay-per-use are all advantages of Cloud Data Platforms that align with the characteristics of a distributed data architecture. In the context of this article, these advantages apply both to the platforms in the Domain Nodes and to the Central Self-Serve Data Platform that connects the Domains.