A promising outlook for data architectures on public clouds

Over the last few years the world has changed by leaps and bounds, and it is becoming increasingly difficult to predict what the challenges of the future will be.

At Keepler, we have analysed the market to get a glimpse of the challenges that lie ahead in the coming year. To do so, we have pooled the knowledge of several experts in our field.

Data Warehouse vs Delta Lake vs Lakehouse approach

The amount of data generated every day makes it necessary to find secure long-term storage solutions, along with tools that process and analyze the data with the lowest possible latency between the moment it is stored and the moment it is used. That is why the main cloud providers keep adapting their storage services: the gap between storing the data (Data Lake) and exploiting it (Data Warehouse) keeps shrinking, and it is becoming cost-effective to exploit and analyze the data on the same platform where it is stored (for example, Athena on S3 in AWS, or Synapse in Azure).

Concepts such as the Lakehouse or the Data Lake seek to offer a company-wide solution that makes it possible to establish federated, reusable policies for storage, security and data exploitation.

Similarly, technologies such as Delta Lake seek to fill the gap left by public cloud providers in managing the data lifecycle and controlling how data evolves over time.
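To make the idea of controlling data evolution over time more concrete, here is a minimal sketch in plain Python of an append-only commit log from which any past version of a table can be rebuilt. It only illustrates the general principle and is not the Delta Lake API: the class VersionedTable, its methods and the sample rows are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class VersionedTable:
    """Toy table backed by an append-only commit log (illustrative only)."""

    # Each commit is a pair: (rows to upsert keyed by id, ids to delete).
    commits: list = field(default_factory=list)

    def commit(self, upserts=None, deletes=None) -> int:
        """Append a commit and return its version number (0-based)."""
        self.commits.append((dict(upserts or {}), set(deletes or ())))
        return len(self.commits) - 1

    def snapshot(self, version: Optional[int] = None) -> dict:
        """Rebuild the table as of `version` by replaying the log."""
        if version is None:
            version = len(self.commits) - 1  # latest version
        if not 0 <= version < len(self.commits):
            raise IndexError(f"unknown version: {version}")
        state = {}
        for upserts, deletes in self.commits[: version + 1]:
            state.update(upserts)
            for key in deletes:
                state.pop(key, None)
        return state


# Hypothetical sample data.
table = VersionedTable()
v0 = table.commit(upserts={1: {"name": "Ana"}})
v1 = table.commit(upserts={2: {"name": "Luis"}})
v2 = table.commit(deletes={1})

print(table.snapshot(v1))  # state before the delete: both rows present
print(table.snapshot())    # latest state: only row 2 remains
```

Because nothing is overwritten, earlier states stay reproducible for audits or rollbacks, at the cost of replaying (or periodically compacting) the log.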

During the last four months of 2021, we saw exponential growth in the use of the term Data Mesh. The concept was first introduced in mid-2019 by Zhamak Dehghani, who defined it as “an alternative sociotechnical approach to managing analytical data”. It does not introduce any technological component that we did not already know or use. Instead, it aims to evolve the way we work with data, so that both the tools and the human team can scale and specialize to cover the large volume of data expected to be generated throughout 2022.

Automation and the effects of LowCode platforms

Intelligent, automated self-service tools such as LowCode platforms allow enterprises to generate business value more quickly and with less effort. This lets projects evolve towards larger and more complex business scenarios, which in turn brings back the need for traditional development of digital products. LowCode frees us to focus on building secure, resilient, complete and efficient data platforms, and to work with clients so that they can go deeper into the consumption of their information and produce better PoCs in less time.

Data Governance and Privacy

We see a huge gap here between customers’ expectations for cloud-native data catalog and lineage services and what services like Google Data Catalog or AWS Glue can currently provide. In terms of data governance and privacy, we expect hyperscalers to release new services and features in order to compete with popular third-party tools like Collibra, similar to how Azure recently released its overhauled data governance service, Azure Purview.
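As a rough illustration of what customers expect from a lineage service, the sketch below stores lineage as a simple mapping from each dataset to its direct sources and walks it to find everything a dataset depends on. The dataset names and the function upstream are hypothetical, and no vendor API is involved.

```python
# Hypothetical lineage: each dataset maps to the datasets it is built from.
LINEAGE = {
    "sales_report": ["orders_clean", "customers_clean"],
    "orders_clean": ["orders_raw"],
    "customers_clean": ["customers_raw"],
}


def upstream(lineage: dict, dataset: str) -> set:
    """Return every dataset that `dataset` depends on, directly or not."""
    seen = set()
    stack = list(lineage.get(dataset, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Keep walking towards the raw sources.
            stack.extend(lineage.get(current, ()))
    return seen


print(sorted(upstream(LINEAGE, "sales_report")))
```

A question such as "which raw sources feed this report?" then becomes a graph traversal. The hard part in practice is capturing the mapping automatically, which is where the gap described above lies.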

We also expect more services to include data de-identification features that encode PII (Personally Identifiable Information) and other sensitive information contained in data assets. De-identified data can be shared and analysed securely. However, de-identification introduces additional overhead, because the data needs to be de-identified in pipelines. We expect more cloud-native services to integrate de-identification natively, without the need to move the data first. Ultimately, the goal of data de-identification is data democratization, with its benefit of accelerated business value creation.
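To give an idea of the pipeline step this implies, here is a minimal sketch that replaces PII fields with a keyed hash (HMAC-SHA256) from the Python standard library. The field names, the sample record and the key are assumptions made for the example. Keyed hashing is only one possible technique, and it is pseudonymization rather than full anonymization.

```python
import hashlib
import hmac

# Hypothetical list of fields considered PII in this example.
PII_FIELDS = frozenset({"email", "phone"})


def pseudonymize(value: str, key: bytes) -> str:
    """Replace a value with a keyed, deterministic SHA-256 digest."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def deidentify(record: dict, key: bytes, pii_fields=PII_FIELDS) -> dict:
    """Return a copy of `record` with PII fields pseudonymized."""
    return {
        name: pseudonymize(str(value), key)
        if name in pii_fields and value is not None
        else value
        for name, value in record.items()
    }


# Hypothetical record; in practice the key would come from a secrets manager.
record = {"email": "ana@example.com", "phone": "600000000", "country": "ES"}
print(deidentify(record, key=b"demo-key-do-not-use"))
```

Because the digest is deterministic for a given key, de-identified datasets can still be joined on the masked columns, while anyone without the key cannot recompute the mapping from known values.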

Data Democratization

Most big organizations have uploaded terabytes of data to their distributed data lakes. Democratizing this data for different types of users, with proper security and scalable operations and without exposing sensitive data, will take a huge effort, so we expect new services, solutions and tools to address it. Data democratization will be crucial for future business growth, since data is the most valuable resource for modern companies. Enabling the whole organisation to use this data will significantly increase business value generation.

Conclusions

To sum up, 2022 offers a promising outlook for data architectures on public clouds. Data governance and privacy will become more and more relevant and form the basis of big data projects. LowCode will simplify the development of PoCs, speed up value delivery and ultimately enable new projects with traditional development to emerge. However, of all these forecasts, data democratization will be the end goal for every data-oriented company in 2022. It is a crucial requirement for competing in the market and will directly determine how much business value enterprises create.

Image: Unsplash | @drmakete

Alexander Deriglasow

Cloud Engineer at Keepler Data Tech: "I am a motivated and ambitious natural science computer scientist and professional cloud architect with an interdisciplinary training background and work experience in Germany and abroad. I am passionate about traveling, especially to Japan, where I have been living for a while. One of my favorite experiences in life was traveling by bicycle without any possessions or money. Just you, the bicycle and nature. A unique feeling of possibilities and freedom."

Diego Prieto

Cloud Architect at Keepler Data Tech: "I am a Software & Cloud Architect who is passionate about new technologies and their applications.
I am not afraid of anything, I simply set myself a new challenge.
To "disconnect" from computers, I turn to my hobby of motoring, which has led me to my current project: restoring a classic car."

José Carlos Jiménez

Cloud Engineer at Keepler: “As a technology lover, I have an innate desire to always be learning, improving, researching... and, of course, teaching. Currently, I am immersed in the world of software development oriented to Big Data technologies.”

Pablo Valiente

Principal Architect at Keepler: "I am passionate about the world of software development and architecture, which drives me to continuously seek out new technologies to learn and research. My professional work focuses on Cloud Computing with Amazon Web Services technology and non-relational (NoSQL) databases such as DynamoDB or MongoDB."
