The term “open data” is appearing more and more often, but what does it actually refer to? According to the Open Knowledge Foundation, the answer is clear and simple: “data that can be freely used, shared and reused by anyone, anywhere and for any purpose”.

Open data repositories come from different sources. They were originally promoted by academic institutions, whose main objective was to advance research through free access to data. Government administrations later joined in and have so far been the main publishers of this type of data, ranging from small town councils to large administrations such as the European Union or the US government.

Here are some links to various platforms with open data:

This boom in open data has led large companies such as Google to promote its use and dissemination by creating dedicated search tools such as Dataset Search.

Below we answer several questions about open data to give an overview of what these datasets can offer:


The topics covered by open data are closely tied to the publishing entities (mostly public bodies) and tend to fall into the following thematic areas:

  • Environment.
  • Public sector.
  • Economy.
  • Demographics.
  • Public services.
  • Transport.

Beyond these fields, however, there are many other areas that can be explored at different levels of detail depending on our needs. Moreover, given the exponential growth of open data over the last five years (in Spain, it has grown by nearly 1000%), we can expect to be able to access almost any type of data in the future.


Open data is published in three types of format:

  • Structured formats (e.g. CSV, XLS).
  • Semi-structured formats (e.g. HTML, JSON).
  • Unstructured formats (e.g. PDF).
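As a minimal sketch of what the structured versus semi-structured distinction means in practice, the Python snippet below reads the same records from CSV and from JSON using only the standard library. The sample payloads, the `municipality`/`population` column names and the helper functions are hypothetical, invented for illustration. Note that CSV yields every value as text, so types have to be restored explicitly, whereas JSON already carries numbers. XLS and PDF files need third-party libraries and are left out.

```python
import csv
import io
import json

# Hypothetical sample payloads standing in for two downloaded open data files.
CSV_TEXT = """municipality,population
Town A,12000
Town B,3400
"""

JSON_TEXT = """[
  {"municipality": "Town A", "population": 12000},
  {"municipality": "Town B", "population": 3400}
]"""


def load_csv(text: str) -> list[dict]:
    """Parse CSV text into row dicts keyed by the header row (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json(text: str) -> list[dict]:
    """Parse a JSON array of objects; JSON keeps numbers as numbers."""
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def normalize(records: list[dict]) -> list[dict]:
    """Coerce records from either source to the same schema and types."""
    return [
        {"municipality": r["municipality"], "population": int(r["population"])}
        for r in records
    ]


csv_rows = load_csv(CSV_TEXT)
json_rows = load_json(JSON_TEXT)
print(repr(csv_rows[0]["population"]), repr(json_rows[0]["population"]))  # '12000' 12000
print(normalize(csv_rows) == normalize(json_rows))  # True
```

Normalizing to a single schema early keeps the rest of an analysis independent of the format each publisher happened to choose.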

There are also semi-structured geographic formats such as KML and KMZ (a zipped KML file), which allow us to display data in mapping applications such as Google Maps and add value to our data analysis.
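Because KML is XML, the standard library's `xml.etree.ElementTree` is enough for a rough sketch of extracting locations, and a KMZ file can be opened with `zipfile` since it is a zip archive containing the KML. The snippet assumes KML 2.2 documents and only extracts point placemarks; `parse_kml`, `read_kmz` and the sample placemarks are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by KML 2.2 documents.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample: one point placemark and one line placemark.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Stop A</name>
      <Point><coordinates>-3.70,40.42,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Route 1</name>
      <LineString><coordinates>-3.70,40.42,0 -3.69,40.43,0</coordinates></LineString>
    </Placemark>
  </Document>
</kml>"""


def parse_kml(kml) -> list[tuple[str, float, float]]:
    """Return (name, longitude, latitude) for every point placemark."""
    root = ET.fromstring(kml)
    points = []
    for placemark in root.iterfind(".//kml:Placemark", KML_NS):
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # lines and polygons are skipped in this sketch
        # KML orders coordinates as longitude,latitude[,altitude].
        lon, lat = (float(v) for v in coords.strip().split(",")[:2])
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        points.append((name, lon, lat))
    return points


def read_kmz(data: bytes) -> list[tuple[str, float, float]]:
    """A KMZ file is a zip archive; parse the first .kml file found inside."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        kml_name = next((n for n in zf.namelist() if n.lower().endswith(".kml")), None)
        if kml_name is None:
            raise ValueError("no .kml file inside the KMZ archive")
        return parse_kml(zf.read(kml_name))


print(parse_kml(SAMPLE_KML))  # [('Stop A', -3.7, 40.42)]
```

The resulting (name, longitude, latitude) tuples can then be joined with other data or plotted; a real pipeline would also need to handle lines, polygons and extended data fields.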


The first thing to consider when integrating this type of data is whether it has already been processed or, as is usually the case since most of it is raw data, whether we will need to pre-process it ourselves. For this reason, data quality must always be taken into account. For more information, we encourage you to read our previous posts Data quality: a back-end approach and Data quality: a front-end approach, where this is discussed in more detail.
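A quick profile of a raw file before integrating it can indicate how much pre-processing will be needed. The following is a minimal sketch, not a full data quality framework: it counts missing values per column, exact duplicate rows and non-numeric entries in columns expected to be numeric. The sample CSV, the `profile` and `is_missing` functions and the set of missing-value markers are all assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical raw extract from an open data portal (air-quality style readings).
RAW_CSV = """station,date,pm10
S1,2020-01-01,21
S1,2020-01-01,21
S2,2020-01-01,
S3,2020-01-01,n/a
S4,2020-01-01,high
"""

# Assumption: tokens that publishers commonly use for "no value".
MISSING_MARKERS = {"", "n/a", "na", "null", "-"}


def is_missing(value) -> bool:
    # DictReader fills short rows with None, so treat None as missing too.
    return value is None or value.strip().lower() in MISSING_MARKERS


def profile(text: str, numeric_columns=()) -> dict:
    """Summarise missing values, exact duplicate rows and non-numeric entries."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []

    missing = {c: sum(is_missing(r.get(c)) for r in rows) for c in columns}

    # Count rows that are exact copies of an earlier row.
    counts = Counter(tuple(r.get(c) for c in columns) for r in rows)
    duplicate_rows = sum(n - 1 for n in counts.values())

    non_numeric = {}
    for c in numeric_columns:
        bad = 0
        for r in rows:
            value = r.get(c)
            if is_missing(value):
                continue
            try:
                float(value)
            except ValueError:
                bad += 1
        non_numeric[c] = bad

    return {"rows": len(rows), "missing": missing,
            "duplicate_rows": duplicate_rows, "non_numeric": non_numeric}


print(profile(RAW_CSV, numeric_columns=["pm10"]))
```

Which markers count as "missing" depends on each publisher, so it is worth checking the documentation provided with the dataset before trusting these counts.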

To use this data correctly, we must also take the following into account:

  • Publishing entity.
  • Documentation presented by the publishing entity.
  • Update frequency of the data, and whether it suits our intended use.

Open data can be used in two ways in our projects:

  • Complementing existing information, giving us a more complete view of our data and adding value.
  • Forming the basis of new projects, such as those related to Smart Cities (see Smart Cities Open Data projects).
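To illustrate the first use case, the sketch below enriches internal sales figures with open population data joined on a shared municipality code, yielding sales per 1,000 inhabitants. The codes, values and the `enrich` function are hypothetical; in practice the join key and its format would come from the publishing entity's documentation.

```python
# Hypothetical internal data: annual sales (EUR) per municipality code.
internal_sales = {"M001": 12500.0, "M002": 3440.0, "M003": 150.0}

# Hypothetical open data: population per municipality code.
open_population = {"M001": 250000, "M002": 86000}


def enrich(sales: dict, population: dict):
    """Join internal figures with open data on a shared key.

    Returns the enriched records plus the keys that found no match,
    so gaps in coverage are visible instead of silently dropped.
    """
    enriched, unmatched = {}, []
    for code, amount in sales.items():
        pop = population.get(code)
        if not pop:  # missing or zero population: cannot compute a rate
            unmatched.append(code)
            continue
        enriched[code] = {
            "sales": amount,
            "population": pop,
            "sales_per_1000": round(amount * 1000 / pop, 2),
        }
    return enriched, unmatched


enriched, unmatched = enrich(internal_sales, open_population)
for code, record in enriched.items():
    print(code, record["sales_per_1000"])  # M001 50.0, then M002 40.0
print("unmatched:", unmatched)  # unmatched: ['M003']
```

Reporting unmatched keys rather than discarding them is a deliberate choice: internal and public datasets rarely share identifiers perfectly, and the mismatches are often the first data quality issue to surface.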

Either way, open data opens up a world of possibilities to complement our projects or start new ones, since it gives society and businesses a unique opportunity to freely access large amounts of data and the value that comes with it. It also fosters collaboration across society, which is why we encourage you to use it and to publish the results of your analyses.

Image: Unsplash | Franki Chamaki


  • Roberto Corral

    Data Analyst at Keepler. "Mathematician with a Master's in Big Data and Data Science. I like to generate solutions that support business areas and user training. I develop with Business Intelligence tools and I am certified in MicroStrategy. I have knowledge of and experience with databases, cloud environments and ETL tools. I adapt to working in teams in any environment."