According to data published by IDC, 64.2 zettabytes of data were created in 2020 alone, a figure far exceeding the total installed storage capacity of 6.8 zettabytes. This explosion in data generation owes much to the Covid-19 pandemic, which boosted remote work and education and increased the consumption of digital leisure and social networks. 

However, of this constantly growing volume of data, only 2% was expected to be saved and retained into 2021. The vast majority is either lost or falls into what is known as “ephemeral data”: data that is tied to the moment of consumption and needs only temporary storage, or that is updated or overwritten with new data. 

In this context, it is clear that companies should prepare their systems to capture more information. Data is, and will remain, a key business asset for organizations, allowing them to enter what is known as the virtuous circle of data: capturing data yields valuable information; analyzing and exploiting that information produces business insights that improve decision making and drive more sales; and those sales in turn generate more data, which leads to new insights. 

More data, new challenges

With such a volume of information, many organizations face challenges in managing, organizing and consuming data in an orderly way, especially in cloud and hybrid environments. 

As an organization scales, new needs and initiatives emerge that often lead to data repositories scattered across different areas, built on different technologies and consumed in different ways.

Maintaining these repositories often causes problems, because they become isolated silos. Silos in an organization lead to:

  • Lack of awareness of, or access to, information across business units, which work without visibility into what other areas do and without benefiting from the value those areas generate.
  • Duplicated work across areas and departments, with the same tasks repeated over and over where more efficient synergies could be created.
  • The inability to launch use cases that consume data, because nobody knows that the data exists.
  • A lack of trust in the reliability of existing data, caused by isolated management without common, shared criteria, which leads either to the data going unused or to excessive time spent validating it.
  • Finally, and perhaps most importantly, the absence of a role that represents and ensures data quality and consistency across the organization.

But in such a scenario, all is not lost. There are mechanisms that help address these challenges, based on discovering the organization’s data and defining levers of change within it. 

Collaboration is indispensable. The key mechanisms are:

  • Working closely with, and among, the business representatives of each initiative and of each data-generating area in the organization.
  • Classifying the organization’s data domains, complemented by defining the subdomains and datasets that belong to each of them.
  • Identifying the technical domain of each of these datasets.
  • Defining a working framework for data classification, cataloging and quality control.
  • Supporting all of the above with a data governance tool that helps implement the management and governance model.
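
To make the domain, subdomain and dataset classification more concrete, here is a minimal Python sketch of how such a hierarchy might be represented and queried. It is an illustration under stated assumptions, not the data model of any particular governance tool: the class names, the `locate` helper, the `technical_domain` field (a free-text label for the platform where a dataset lives) and the sample entries are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    # Hypothetical free-text label for the technical domain, e.g. "data_lake" or "crm_db".
    technical_domain: str


@dataclass
class Subdomain:
    name: str
    datasets: list[Dataset] = field(default_factory=list)


@dataclass
class Domain:
    name: str
    subdomains: list[Subdomain] = field(default_factory=list)


def locate(catalog: list[Domain], dataset_name: str) -> tuple[str, str] | None:
    """Return ("domain/subdomain/dataset", technical_domain), or None if not catalogued."""
    for domain in catalog:
        for sub in domain.subdomains:
            for ds in sub.datasets:
                if ds.name == dataset_name:
                    return f"{domain.name}/{sub.name}/{ds.name}", ds.technical_domain
    return None


# Hypothetical sample catalog with a single business domain.
catalog = [
    Domain("sales", [
        Subdomain("orders", [Dataset("daily_orders", "data_lake")]),
        Subdomain("customers", [Dataset("customer_master", "crm_db")]),
    ]),
]

print(locate(catalog, "daily_orders"))   # ('sales/orders/daily_orders', 'data_lake')
print(locate(catalog, "unknown_table"))  # None
```

A lookup like `locate` addresses one of the silo problems described above: knowing whether a dataset exists at all, which business domain it belongs to and where it physically lives.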

Implementing these mechanisms is neither simple nor immediate: it involves change management within business and IT teams, which requires training and time. However, once the framework has been defined, certain tasks can be distributed across departments and teams, and they contribute greatly to successful data management. 

  • Distribute the task of initial data discovery among the data-generating teams. 
  • Establish a data management model (Data Owner, Data Steward) in which each area is responsible for its own data domains.
  • Divide up the cataloging of technical and business data. 
  • Clearly define the data access mechanisms and who is responsible for them.
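
The ownership and access items in the list above can be sketched as a simple registry. This hypothetical Python illustration assumes that each domain has exactly one Data Owner, who approves access, and one Data Steward, who maintains its catalog entries; the `STEWARDSHIP` registry, the helper functions, the role holders and the domains are invented for the example, and real organizations may split these responsibilities differently.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Stewardship:
    data_owner: str    # assumption: accountable for the domain and approves access requests
    data_steward: str  # assumption: maintains catalog entries and quality rules day to day


# Hypothetical assignments: each business area is responsible for its own data domains.
STEWARDSHIP = {
    "sales": Stewardship(data_owner="head_of_sales", data_steward="sales_analyst"),
    "finance": Stewardship(data_owner="cfo_office", data_steward="finance_analyst"),
}


def unowned_domains(domains: list[str]) -> list[str]:
    """Return the catalogued domains that still have no assigned Data Owner and Data Steward."""
    return [d for d in domains if d not in STEWARDSHIP]


def access_approver(domain: str) -> str:
    """Return who must approve access to a domain, under the assumption stated above."""
    if domain not in STEWARDSHIP:
        raise KeyError(f"domain {domain!r} has no assigned Data Owner")
    return STEWARDSHIP[domain].data_owner


print(unowned_domains(["sales", "finance", "hr"]))  # ['hr']
print(access_approver("finance"))                   # cfo_office
```

Checking for unowned domains makes the governance gap visible: any domain returned by `unowned_domains` has data but no one accountable for it.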

Managing and monitoring these distributed tasks requires technological tools and solutions. Some organizations build their data management and governance on complete, ready-to-use, licensed suites such as Collibra or Informatica. There are also cloud providers’ own solutions, as well as open source options such as Apache Atlas or DataHub, which allow processes and features to be customized to each organization’s specific needs.

These platforms and tools offer a common set of features: cataloging and organization of data and data domains, definition of roles and data administrators of different kinds, business glossaries, data lineage and consumption information, data exploitation features, and usage and consumption reporting, among others. 
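
Data lineage, one of the features listed above, is commonly modeled as a directed graph between datasets. The following Python sketch assumes a hypothetical lineage map (`UPSTREAM`) in which each dataset lists the datasets it is derived from directly, and walks it breadth-first to find every upstream dependency. It illustrates the concept only; it does not reproduce the lineage API of Apache Atlas, DataHub or any other tool.

```python
from __future__ import annotations

from collections import deque

# Hypothetical lineage: each dataset maps to the datasets it is directly derived from.
UPSTREAM = {
    "sales_dashboard": ["daily_revenue"],
    "daily_revenue": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "fx_rates": [],
}


def all_upstream(dataset: str, upstream: dict[str, list[str]]) -> set[str]:
    """Return every dataset that `dataset` depends on, directly or transitively (breadth-first)."""
    seen: set[str] = set()
    queue = deque(upstream.get(dataset, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also protects against cycles in the map
        seen.add(current)
        queue.extend(upstream.get(current, []))
    return seen


print(sorted(all_upstream("sales_dashboard", UPSTREAM)))
# ['daily_revenue', 'fx_rates', 'orders_clean', 'orders_raw']
```

Inverting the map (each dataset pointing to the datasets derived from it) and running the same traversal gives impact analysis instead: every downstream asset affected if a source dataset changes.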

In short, effective and efficient data management requires several essential steps: a discovery phase to identify the organization’s data; a phase to define and consolidate a framework for data quality, ownership and organization; and a technology rollout phase that supports the launch and adoption of this framework.

Organizations that can make this transition to more distributed, accessible and effective data management will find it easier to build a successful data culture.

Author

  • Cloud Architect at Keepler. "Lifelong learner and interested in cloud computing and public cloud technologies. Engineer with extensive experience in backend development and skills in machine learning techniques. Passionate about learning and solving real world problems. I enjoy collaborative teamwork, sharing knowledge and creating amazing products."