Case Study | Cepsa: Data Lake of Manufacturing and Cloud Platform Deployment

Cepsa is an established multinational company in the energy sector, with years of experience and a team of more than 11,000 professionals across the five continents on which it operates, integrating all phases of the energy value chain.
Cepsa aspires to become a benchmark for sustainable mobility, biofuels and green hydrogen in Spain and Portugal, as well as a key company in the Energy Transition, putting customers at the center of its activity and helping them in their efforts towards decarbonization.
Energy and Industrial Sectors are in the Midst of Change.
This change goes hand in hand with a new industrial revolution called Industry 4.0, in which the use of data is particularly relevant.
Current control and event historization systems have proved to be significantly limited when it comes to integrating and analysing plant information together with data external to the plants themselves. In addition, those systems have closed licensing models that penalize the client for integrating external information such as lab data, weather information, and cost and price information.
Solution on AWS
IoT standardization protocols make it possible to keep using the plants' existing control platforms while adding historization capability that is virtually unlimited, inexpensive, and able to integrate large volumes of external data and run sophisticated analysis on it.
With this solution, Cepsa is seeking to build a Data Lake in the cloud that centralizes the information coming from hundreds of thousands of sensors installed in its manufacturing plants, integrates additional sources to enrich this information, and makes the data available to advanced analytics processes, visualization, and Business Intelligence tools.
The Data Lake is capable of ingesting, processing, and making available to platform users an average of two thousand signals per second in a Near-Real Time model, as well as persisting the information in a history spanning several years, with projected growth to the petabyte scale.
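To give a sense of scale, the ingestion rate above can be turned into daily and yearly signal counts. The following is a minimal sketch, assuming the average of two thousand signals per second holds constantly. The helper names and the `bytes_per_signal` parameter are hypothetical; the case study does not state the size of a stored signal, so no storage figure is shown.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def signals_in(days: float, rate_per_second: float = 2000) -> int:
    """Signals ingested over `days` at a constant average rate."""
    return int(rate_per_second * SECONDS_PER_DAY * days)


def raw_storage_bytes(days: float, bytes_per_signal: int,
                      rate_per_second: float = 2000) -> int:
    """Raw storage estimate; bytes_per_signal is an assumption to supply."""
    return signals_in(days, rate_per_second) * bytes_per_signal


print(f"{signals_in(1):,} signals per day")     # 172,800,000 signals per day
print(f"{signals_in(365):,} signals per year")  # 63,072,000,000 signals per year
```

Calling `raw_storage_bytes` with a measured average message size gives a rough lower bound for raw storage, before any indexes, replicas, or enrichment data are counted.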
Keepler and Cepsa have used the information stored in the Manufacturing Data Lake to implement several Data Products, each delivering its own business benefits.
For example, using the stored information, Keepler was able to optimize the raw material needed to manufacture various chemical products.
Additionally, Keepler supported Cepsa's Security, Networking, and Operations teams in deploying a landing zone federated with their main identity provider. They created automations for the account vending machine and account platforming, developed procedures and automations to integrate the security suites for SIEM and threat detection, and supported the Networking team in deploying the transit gateway with an inspection VPC architecture and firewall. Keepler provided a framework that lets developers securely create and manage their own IAM resources, using Service Control Policies and permissions boundaries. They also deployed ad hoc tools for managing and monitoring the costs of each AWS account.
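One common way to combine Service Control Policies with permissions boundaries is to deny the creation of IAM roles and users unless a specific boundary policy is attached. The following is a minimal sketch of that general pattern, not the policy Keepler deployed. The `iam:PermissionsBoundary` condition key is a real IAM condition key; the account ID, the boundary policy name, and the function name are hypothetical.

```python
import json


def require_boundary_statement(boundary_arn: str) -> dict:
    """Deny creating IAM roles/users that lack the given permissions boundary."""
    return {
        "Sid": "DenyCreateWithoutBoundary",
        "Effect": "Deny",
        "Action": ["iam:CreateRole", "iam:CreateUser"],
        "Resource": "*",
        "Condition": {
            "StringNotEquals": {"iam:PermissionsBoundary": boundary_arn}
        },
    }


# Hypothetical account ID and policy name, for illustration only.
BOUNDARY_ARN = "arn:aws:iam::123456789012:policy/developer-boundary"

policy = {
    "Version": "2012-10-17",
    "Statement": [require_boundary_statement(BOUNDARY_ARN)],
}

print(json.dumps(policy, indent=2))
```

With a statement like this in place, developers can create their own roles, but any role they create can never exceed the permissions allowed by the boundary policy.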
The solution is based entirely on managed services, resulting in a serverless implementation that is easy to maintain, robust, secure, and scalable.
The AWS services used were as follows:
AWS IoT as MQTT messaging central broker.
AWS Greengrass for the integration with on-premises sensors via MQTT and OPC-UA.
Amazon Kinesis to process information in Near Real Time.
Amazon S3 as the main storage repository.
Amazon Athena to query the Data Lake using SQL.
AWS Lambda and AWS Fargate to execute application logic.
AWS Glue as ETL tool and Data Catalog.
Amazon Elasticsearch Service as indexed data repository for time series.
Amazon DynamoDB as metadata storage.
AWS Database Migration Service for the migration and replication of on-premises databases.
AWS Control Tower for account vending machine and account platforming.
AWS IAM Identity Center to centralize identity management for the developers who access the platform.
AWS Config to monitor changes to resources.
AWS CloudTrail to create a durable audit log for AWS API actions.
AWS Certificate Manager (ACM) to manage the certificates of load balancers.
AWS KMS to manage encryption keys of the data in S3, DynamoDB and RDS.
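To illustrate how sensor messages arriving over MQTT might be laid out in S3 so that Athena can prune by partition, here is a minimal sketch. It is an assumption-based example: the message fields (`plant`, `sensor_id`, `ts`, `value`), the `raw/signals` prefix, and the Hive-style partition scheme are all hypothetical and are not described in the case study.

```python
import json
from datetime import datetime, timezone

REQUIRED_FIELDS = {"plant", "sensor_id", "ts", "value"}


def parse_message(payload: bytes) -> dict:
    """Decode a JSON sensor message and check the assumed fields exist."""
    reading = json.loads(payload)
    missing = REQUIRED_FIELDS - reading.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return reading


def s3_key_for(reading: dict, prefix: str = "raw/signals") -> str:
    """Build a Hive-style partitioned object key from the reading's UTC time."""
    ts = datetime.fromtimestamp(reading["ts"], tz=timezone.utc)
    return (
        f"{prefix}/plant={reading['plant']}/"
        f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/"
        f"{reading['sensor_id']}-{reading['ts']}.json"
    )


# Hypothetical message, as it might arrive on an MQTT topic.
message = b'{"plant": "plant-a", "sensor_id": "TI-101", "ts": 1700000000, "value": 73.4}'
reading = parse_message(message)
print(s3_key_for(reading))
# raw/signals/plant=plant-a/year=2023/month=11/day=14/TI-101-1700000000.json
```

Partitioning by plant and date is one possible design choice: a SQL query that filters on those columns only needs to scan the matching prefixes rather than the whole history.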
Benefits for the client
Pay Per Use Model
The pay-per-use model of the public cloud has enabled Cepsa to have a solution without a major upfront investment and with a low cost of experimentation.
Operational Cost Reduction
Because the solution is built entirely on managed services, operational costs are reduced.
Horizontal Scaling for Sensor Integration
All the pieces of the solution scale horizontally, so integrating more sensors does not create a bottleneck in the platform.
Flexible Integration
It is an open system that can integrate any data exploitation tool that can be deployed on AWS.
Robust Services
The system relies on services such as S3 and DynamoDB, which provide high availability and robustness by default.
Complete Sensor Data Retention
The cost of storing raw information in S3 is so low compared to traditional systems (going from a scale of millions of euros to one of thousands) that Cepsa is able to store all the values emitted by all the sensors without having to apply value interpolation or approximation mechanisms.
Secure Cloud Data Platform
A secure cloud data platform that adheres to all security policies defined by the Cybersecurity and Networking teams and enables self-service for developers and business units.
Cloud Governance
Automations and integrations with on-premises systems to streamline the daily operation of federated identity, security, networking, monitoring, and cost management.
Keepler is a full-stack analytics services company specializing in the design, construction, deployment, and operation of custom-made advanced analytics solutions on the public cloud. We bring to the market the Data Product concept: fully automated, tailored software based on public cloud services that adds advanced analytics, data engineering, massive data processing, and monitoring features. In addition, we help our customers transition to using public cloud services securely and improve data governance to make the organization more data-centric.
