The industrial and energy sectors are undergoing a full transformation driven by Industry 4.0, in which using data and extracting value from it are especially important.

Cepsa is a global energy company that is leading this digital transformation across all its sectors and businesses in order to become a data-driven company. It relies on advanced analytics to make decisions and to work with greater agility, generating added value in all its activities. Cepsa covers every stage of the hydrocarbon value chain, manufactures products from raw materials of plant origin and maintains a presence in the renewable energy sector.

With more than 85 years of experience and a team of around 10,000 professionals, Cepsa combines technical excellence with the capacity to adapt. It is present on five continents through its Exploration and Production, Refining, Chemicals, Marketing and Distribution, Natural Gas, Electricity and Trading business areas.

In this industrial ecosystem, current control systems and data historians have shown significant limitations when it comes to integrating and analysing plant information together with data from outside the plants. Moreover, these systems often have closed licensing models that penalise clients who want to integrate external information, such as laboratory data, weather information or cost and price data.

Cepsa, AWS and Keepler

Working with Keepler as solution integrator on the Amazon Web Services (AWS) cloud, Cepsa has built a Data Lake that centralises the information from hundreds of thousands of sensors installed in its manufacturing plants. It also integrates additional sources that enrich this information and allow the data to be exploited through advanced analytics processes, visualisation and Business Intelligence tools.

Helped by the standardisation of IoT protocols, this approach keeps the current control platforms in use while adding unlimited historical storage, lower costs, multivendor support and ample resources for integrating and analysing external data.

Data Lake Manufacturing

In its first phase, the Data Lake built by Cepsa ingests, processes and makes available to platform users an average of 2,000 signals per second in near real time. It also stores several years of historical data, with growth projected to reach the petabyte scale.
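
To give a sense of scale, the quoted rate of 2,000 signals per second can be turned into daily and yearly volumes. The sketch below is a back-of-the-envelope estimate only: the size of 100 bytes per stored reading is an assumption made for illustration and is not a figure from Cepsa's deployment.

```python
SIGNALS_PER_SECOND = 2_000   # average rate quoted for the first phase
BYTES_PER_READING = 100      # ASSUMPTION: illustrative record size, not a Cepsa figure
SECONDS_PER_DAY = 24 * 60 * 60


def readings_per_day(rate_per_second: int) -> int:
    """Number of readings produced in one day at a constant average rate."""
    return rate_per_second * SECONDS_PER_DAY


def raw_bytes_per_year(rate_per_second: int, bytes_per_reading: int) -> int:
    """Uncompressed bytes produced in a 365-day year under the assumed record size."""
    return readings_per_day(rate_per_second) * 365 * bytes_per_reading


daily = readings_per_day(SIGNALS_PER_SECOND)
yearly_tb = raw_bytes_per_year(SIGNALS_PER_SECOND, BYTES_PER_READING) / 1e12

print(f"{daily:,} readings per day")             # 172,800,000 readings per day
print(f"{yearly_tb:.1f} TB per year (assumed)")  # 6.3 TB per year (assumed)
```

The assumed record size dominates the result, so the estimate should be recomputed with real record sizes before being used for any planning.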

The solution is based entirely on managed services, resulting in a serverless deployment that is easy to maintain, robust, secure and scalable. The main AWS services used are:

  • AWS IoT as central MQTT messaging service broker.
  • AWS Greengrass for the integration of on-premises sensors via MQTT and OPC-UA.
  • Amazon Kinesis for data processing in near real time.
  • Amazon S3 as principal storage repository.
  • Amazon Athena to query the Data Lake through SQL.
  • AWS Lambda and AWS Fargate for the execution of application logic.
  • AWS Glue as ETL tool and Data Catalogue.
  • Amazon Elasticsearch Service as indexed data repository for time series.
  • Amazon DynamoDB for storing metadata.
  • AWS Database Migration Service for migration and replication of on-premises databases.
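
The path a reading takes through these services (MQTT message, object in S3, SQL query in Athena) can be sketched in a few lines. This is a minimal illustration and not Cepsa's actual design: the topic layout, the JSON schema, the partitioned S3 prefix, the table name `sensor_readings` and the identifiers `plant-a` and `TI-101` are all hypothetical, and no AWS call is made.

```python
import json
from datetime import datetime, timezone


def mqtt_topic(plant: str, sensor_id: str) -> str:
    """Topic a gateway might publish to on the MQTT broker (hypothetical layout)."""
    return f"plants/{plant}/sensors/{sensor_id}"


def build_payload(plant: str, sensor_id: str, value: float, ts: datetime) -> str:
    """Serialise one reading as the JSON body of an MQTT message (hypothetical schema)."""
    return json.dumps({
        "plant": plant,
        "sensor_id": sensor_id,
        "value": value,
        "ts": ts.astimezone(timezone.utc).isoformat(),
    })


def s3_prefix(plant: str, ts: datetime) -> str:
    """Partitioned S3 prefix so that SQL queries can skip unrelated days (hypothetical layout)."""
    ts = ts.astimezone(timezone.utc)
    return f"raw/plant={plant}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/"


# SQL of the kind that could be run against such a layout; the table and
# column names are assumptions made for this example.
EXAMPLE_QUERY = """
SELECT sensor_id, avg(value) AS mean_value
FROM sensor_readings
WHERE plant = 'plant-a' AND year = '2020' AND month = '01'
GROUP BY sensor_id
"""

ts = datetime(2020, 1, 15, 8, 30, tzinfo=timezone.utc)
print(mqtt_topic("plant-a", "TI-101"))
print(build_payload("plant-a", "TI-101", 73.4, ts))
print(s3_prefix("plant-a", ts))
```

Partitioning the stored objects by plant and date is one common way to limit how much data each query has to scan, which matters under a pay-per-use model.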

Low Cost Innovation

One of the main benefits of the public AWS cloud is its pay-per-use model, which has allowed Cepsa to adopt an innovative technological solution without a large initial investment and at a very low cost of experimentation. Because the solution is deployed entirely on managed services, operating costs have also been very low. In addition, the raw cost of storing information is low compared to traditional systems, whose costs can run from thousands to millions of euros.

Long Term Scalability

The components of the solution scale horizontally and automatically, so adding more sensors will not create a platform bottleneck in the future. Furthermore, this is an open system that permits the addition of any data exploitation tool deployable on AWS, which makes it easier to adopt future innovations from the cloud provider. Storage in S3 enables Cepsa to keep every value issued by all sensors, current and future, without having to apply interpolation or value approximation mechanisms.
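
To illustrate what storing every value avoids, the sketch below applies a simple deadband filter, one common kind of value approximation in which a reading is kept only when it differs enough from the last kept one. It is a generic example with made-up readings and an arbitrary threshold, and does not describe any specific product or Cepsa's previous systems.

```python
def deadband_filter(values, threshold):
    """Keep a value only if it differs from the last kept value by more than threshold."""
    kept = []
    for v in values:
        if not kept or abs(v - kept[-1]) > threshold:
            kept.append(v)
    return kept


# Made-up temperature readings, for illustration only.
readings = [20.0, 20.1, 20.2, 20.9, 21.0, 20.95, 22.0]
compressed = deadband_filter(readings, threshold=0.5)

print(f"raw store keeps {len(readings)} values")      # raw store keeps 7 values
print(f"deadband keeps {len(compressed)} values")      # deadband keeps 3 values
print(compressed)                                       # [20.0, 20.9, 22.0]
```

With the filter, the four discarded readings could only be approximated afterwards, whereas a raw store leaves all seven available for later analysis.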

High Security

The system uses AWS services such as S3 and DynamoDB, which provide high availability and fault tolerance. Through IAM roles, the solution integrates with Cepsa’s corporate identity manager, guaranteeing that access to information is protected and controlled. Likewise, data is encrypted at rest using S3 encryption capabilities. Together, these layers provide a highly secure environment across all services.
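
As an illustration of how role-based access and encryption can be expressed, the sketch below builds two policy documents as plain Python dictionaries: a read-only policy that could be attached to an IAM role, and a bucket policy that rejects uploads without a server-side encryption header. The bucket name and prefix are hypothetical, the documents are not Cepsa's actual policies, and nothing is sent to AWS.

```python
import json

BUCKET = "example-datalake-bucket"  # hypothetical bucket name


def read_only_policy(bucket: str, prefix: str) -> dict:
    """IAM policy allowing a role to list and read objects under one prefix only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


def require_encryption_policy(bucket: str) -> dict:
    """Bucket policy denying any upload that omits the server-side encryption header."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
                "Condition": {"Null": {"s3:x-amz-server-side-encryption": "true"}},
            }
        ],
    }


print(json.dumps(read_only_policy(BUCKET, "raw/plant=plant-a/"), indent=2))
print(json.dumps(require_encryption_policy(BUCKET), indent=2))
```

Scoping each role to a prefix keeps access to one plant's data separate from another's, while the deny statement makes encryption a property of the bucket rather than of each uploading client.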

Image: Cepsa. Montse Zamorano.

See further details of this use case at Cepsa | Big Data and IoT use case.


  • Keepler

    Software company specialized in the design, construction and operation of digital data products based on cloud computing platforms.