Over the last few years the world has changed by leaps and bounds, and it is becoming increasingly difficult to predict the challenges the future will bring.
At Keepler, we have analysed the market to try to get a glimpse of the challenges that lie ahead in the coming year. To do this, we have combined the knowledge of several of our in-house experts.
Data Warehouse vs Delta Lake vs Lakehouse approach
The amount of data generated every day makes it necessary to look for secure long-term storage solutions, together with tools that process and analyze the data with the lowest possible latency between the moment it is stored and the moment it is used. That is why the main cloud providers are continuously adapting their storage services: the gap between storing data (the data lake) and exploiting it (the data warehouse) is shrinking, and it is now cost-effective to analyze the data on the same platform where it is stored (such as Athena on S3 in AWS, or Synapse in Azure).
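The "query the data where it lives" pattern mentioned above can be sketched with Athena-style SQL. This is an illustrative fragment, not a production setup: the table, columns and S3 bucket name are invented for the example.

```sql
-- Illustrative Athena DDL: the table is only metadata over files in S3.
CREATE EXTERNAL TABLE sales_events (
  event_id   string,
  amount     double,
  event_date date
)
STORED AS PARQUET
LOCATION 's3://my-data-lake/sales_events/';

-- The query runs directly against the files in S3; nothing is loaded
-- into a separate warehouse first.
SELECT event_date, SUM(amount) AS revenue
FROM sales_events
GROUP BY event_date;
```

The same storage thus serves both the data lake (raw files) and analytical workloads, which is exactly the convergence described above.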
Concepts such as the lakehouse and the data lake aim to offer a company-wide solution for establishing federated, reusable policies for storage, security and data exploitation.
Similarly, technologies such as Delta Lake aim to fill the gap left by public cloud providers in controlling the data lifecycle and its evolution over time.
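The core mechanism behind that lifecycle control in Delta Lake is an append-only transaction log: the state of a table at any version can be rebuilt by replaying the log, which is what enables "time travel". Below is a minimal, purely illustrative sketch of that idea in plain Python, with no Delta Lake dependency; class and file names are invented for the example.

```python
import json


class TinyDeltaLog:
    """Toy append-only transaction log illustrating Delta Lake's versioning
    idea: the table state at any version is the result of replaying the log
    up to that commit."""

    def __init__(self):
        self._log = []  # one JSON-serialized action per commit

    def commit(self, action):
        """Append an action and return the version number of this commit."""
        self._log.append(json.dumps(action))
        return len(self._log) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` to rebuild the set of data files
        that make up the table at that point in time."""
        end = len(self._log) if version is None else version + 1
        files = set()
        for entry in self._log[:end]:
            action = json.loads(entry)
            if action["op"] == "add":
                files.add(action["file"])
            elif action["op"] == "remove":
                files.discard(action["file"])
        return files


log = TinyDeltaLog()
v0 = log.commit({"op": "add", "file": "part-0.parquet"})
v1 = log.commit({"op": "add", "file": "part-1.parquet"})
v2 = log.commit({"op": "remove", "file": "part-0.parquet"})

print(log.snapshot())    # current state: only part-1.parquet remains
print(log.snapshot(v1))  # time travel: both files were present at v1
```

Because old commits are never rewritten, every historical version of the table stays reproducible, which is the property the paragraph above refers to as control over the data's evolution over time.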
During the last four months of 2021 we have seen exponential growth in the use of the term Data Mesh. The concept was first introduced in mid-2019 by Zhamak Dehghani, who defined it as "an alternative sociotechnical approach to managing analytical data". It does not introduce any new technological component; rather, it aims to evolve the way we work with data, allowing both the tooling and the human teams to scale and specialize enough to cover the large volume of data expected to be generated throughout 2022.
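The organizational unit a data mesh scales around is the domain-owned "data product". As a rough illustration of what describing one might look like, here is a hedged sketch; the fields, names and addresses are our own invention, not part of any Data Mesh standard.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Illustrative descriptor for a domain-owned data product, the basic
    unit of ownership that a data mesh decentralizes around."""
    name: str
    domain: str                     # owning business domain, not a central team
    owner: str                      # accountable product owner
    output_port: str                # where consumers read the data
    sla_freshness_hours: int = 24   # promised maximum staleness
    tags: list = field(default_factory=list)

    def is_fresh(self, hours_since_update: int) -> bool:
        """Check the freshness promise the owning domain makes to consumers."""
        return hours_since_update <= self.sla_freshness_hours


# A sales-domain team publishes and owns its own product end to end.
orders = DataProduct(
    name="orders",
    domain="sales",
    owner="sales-data-team@example.com",
    output_port="s3://sales/orders/",
    sla_freshness_hours=6,
)
print(orders.is_fresh(4))  # True: within the freshness SLA
```

The point of the sketch is the ownership model: each domain team publishes, documents and guarantees its own data product, instead of routing everything through one central data team.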
Automation and the effects of LowCode platforms
Intelligent, automated self-services such as low-code platforms allow enterprises to generate business value faster and with less effort. Projects can then evolve towards larger and more complex business scenarios, which in turn reintroduces the need for traditional development of digital products. Low code lets us focus on building secure, resilient, complete and efficient data platforms, and on giving clients the means to dig deeper into the consumption of their own information and produce better PoCs in less time.
Data Governance and Privacy
Here we see a huge gap between customers' expectations of cloud-native data catalog and lineage services and what services like Google Data Catalog or AWS Glue can currently provide. In data governance and privacy, we expect the hyperscalers to release new services and features to compete with popular third-party tools like Collibra, much as Azure recently released Azure Purview, its overhauled data governance service.
We also expect more services to include data de-identification features that encode PII (Personally Identifiable Information) and other sensitive information contained in data assets. De-identified data can be shared and analysed securely, but de-identification introduces additional overhead, since the data must be de-identified in pipelines. We therefore expect more cloud-native features that integrate de-identification without the need to move the data first. Ultimately, the goal of data de-identification is data democratization, with its benefit of accelerated business value creation.
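One common de-identification technique is keyed pseudonymization: PII values are replaced with deterministic tokens, so joins and aggregations still work on the de-identified data while the original values stay unrecoverable without the key. A minimal sketch using only the Python standard library (the key, field names and record are illustrative):

```python
import hashlib
import hmac

# Illustrative key only; in practice this would live in a secrets manager
# and be rotated regularly.
SECRET_KEY = b"rotate-me-regularly"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Replace a PII value with a keyed, deterministic token.

    The same input always maps to the same token, so de-identified
    datasets remain joinable, but the original value cannot be
    recovered without the key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


record = {"user_email": "jane@example.com", "purchase_total": 42.5}
safe_record = {
    "user_email": pseudonymize(record["user_email"]),
    "purchase_total": record["purchase_total"],  # non-PII fields pass through
}
```

Running this kind of step inside the pipeline, close to where the data is stored, is exactly the overhead the hyperscalers are expected to absorb into native services.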
Most big organizations have uploaded terabytes of data to their distributed data lakes. Democratizing this data for different types of users, with the proper security and scalable operations and without exposing sensitive data, will require a huge effort, so we expect new services, solutions and tools to address it. Data democratization will be crucial for future business growth, since data is the most valuable resource for modern companies: enabling the whole organisation to use this data will increase business value generation significantly.
To sum up, 2022 offers a promising outlook for data architectures on public clouds. Data governance and privacy will become ever more relevant and will form the basis for big data projects. Low code will abstract the development of PoCs, speed up value delivery and ultimately enable new projects with traditional development to emerge. Above all, it is safe to say that data democratization will be the final goal for every data-oriented company in 2022: it is a crucial requirement to compete in the market and will directly determine business value creation for enterprises.
Image: Unsplash | @drmakete