SUCCESS CASE #BigData #IoT
Manufacturing Data Lake in the Cloud

Cepsa is an established multinational energy company with years of experience and a team of more than 11,000 professionals across the five continents in which it operates, integrating all phases of the energy value chain.
Cepsa aspires to become a benchmark for sustainable mobility, biofuels and green hydrogen in Spain and Portugal, as well as a key company in the Energy Transition, putting customers at the center of its activity and helping them in their efforts towards decarbonization.
This shift goes hand in hand with a new industrial revolution, Industry 4.0, in which the use of data is particularly relevant.
Current control and event historization systems have proved significantly limited when it comes to integrating and analysing their information together with data from outside the plants. In addition, those systems have closed licensing models that penalize the client for integrating external information such as lab data, weather information, and cost and price information.
Standard IoT protocols make it possible to keep using the current plant control platforms while adding limitless historization functionality that is inexpensive and offers large capacity for integrating external data and performing sophisticated analysis on it.
With this solution, Cepsa seeks to build a Data Lake in the cloud that centralizes the information coming from hundreds of thousands of sensors installed in its manufacturing plants, integrates additional sources to enrich this information, and enables the data to be exploited with advanced analytics processes, visualization and Business Intelligence tools.
The Data Lake can ingest, process and make available to platform users an average of two thousand signals per second in a near-real-time model, and it persists the information in a multi-year history with projected growth to the petabyte scale.
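To give a sense of scale, the stated average rate can be turned into daily and yearly signal counts. This is only a back-of-envelope sketch: the two-thousand-signals-per-second figure comes from the text above, while the function names and the 365-day year are illustrative assumptions, and no record size is assumed, so it says nothing about bytes stored.

```python
# Back-of-envelope signal counts derived from the stated average ingest rate.
SIGNALS_PER_SECOND = 2_000        # average rate stated in the case study
SECONDS_PER_DAY = 24 * 60 * 60    # 86,400 seconds


def signals_per_day(rate_per_second: int = SIGNALS_PER_SECOND) -> int:
    """Number of signals ingested in one day at a constant average rate."""
    return rate_per_second * SECONDS_PER_DAY


def signals_per_year(rate_per_second: int = SIGNALS_PER_SECOND, days: int = 365) -> int:
    """Number of signals ingested in a year of `days` days (assumed 365)."""
    return signals_per_day(rate_per_second) * days


if __name__ == "__main__":
    print(f"{signals_per_day():,} signals per day")    # 172,800,000
    print(f"{signals_per_year():,} signals per year")  # 63,072,000,000
```

At that average rate, the platform would handle roughly 173 million signals per day and about 63 billion per year.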
Keepler and Cepsa have used the information stored in the Manufacturing Data Lake to implement several Data Products that deliver different business benefits. For example, using the stored information, Keepler was able to optimize the use of raw materials in the manufacture of different chemical products.
The solution is based entirely on managed services, resulting in a serverless implementation that is easy to maintain, robust, secure and scalable. The main services used are:
- AWS IoT as the central MQTT messaging broker.
- AWS Greengrass for integration with on-premises sensors via MQTT and OPC-UA.
- Amazon Kinesis to process information in near real time.
- Amazon S3 as the main storage repository.
- Amazon Athena to query the Data Lake using SQL.
- AWS Lambda and AWS Fargate to execute application logic.
- AWS Glue as the ETL tool and Data Catalog.
- Amazon Elasticsearch Service as the indexed data repository for time series.
- Amazon DynamoDB as metadata storage.
- AWS Database Migration Service for the migration and replication of on-premises databases.
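
As an illustration of how a sensor signal might travel from the MQTT broker to S3 in a form that a SQL engine can query by partition, the sketch below serializes one reading as JSON and builds a date-partitioned object key. The case study does not describe the actual message schema or storage layout, so everything here is an assumption: the `SensorReading` fields, the `raw/signals` prefix, the plant and tag names, and the `key=value` partition naming are all hypothetical. The code uses only the Python standard library and makes no AWS calls.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SensorReading:
    """One sensor signal; all field names are hypothetical."""
    plant: str            # hypothetical plant identifier
    tag: str              # hypothetical sensor tag name
    value: float          # measured value
    timestamp: datetime   # timezone-aware time of the measurement


def to_json_line(reading: SensorReading) -> str:
    """Serialize a reading as one JSON line with a UTC ISO 8601 timestamp."""
    record = asdict(reading)
    record["timestamp"] = reading.timestamp.astimezone(timezone.utc).isoformat()
    return json.dumps(record, sort_keys=True)


def partitioned_key(reading: SensorReading, prefix: str = "raw/signals") -> str:
    """Build an object key partitioned by plant and UTC date (assumed layout)."""
    ts = reading.timestamp.astimezone(timezone.utc)
    return (
        f"{prefix}/plant={reading.plant}"
        f"/year={ts:%Y}/month={ts:%m}/day={ts:%d}"
        f"/{reading.tag}-{ts:%H%M%S}.json"
    )


if __name__ == "__main__":
    reading = SensorReading(
        plant="plant-a",
        tag="TI-101",
        value=21.5,
        timestamp=datetime(2021, 3, 4, 5, 6, 7, tzinfo=timezone.utc),
    )
    print(partitioned_key(reading))
    print(to_json_line(reading))
```

Partitioning keys by plant and date is one common way to let a query engine skip objects outside the requested time range; whether the real Data Lake uses this layout is not stated in the source.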

Keepler is a boutique professional technology services company specialized in the design, construction, deployment and operation of Big Data and Machine Learning software solutions for large clients. It uses Agile and DevOps methodologies and native public cloud services to build sophisticated data-focused business applications integrated with different sources in batch mode and in real time. Keepler holds the AWS Advanced Consulting Partner level and has a technical workforce in which 90% of professionals are AWS certified. Keepler currently works for large clients in different markets, such as financial services, industry, energy, telecommunications and media.