Case Study | IoT platform for real-time data acquisition


Our client is a leading international company producing petrochemicals and committed to sustainable mobility and energy. Continuous advancement and the use of technology have enabled it to become an EMEA market leader with global reach.

Its agenda for 2030 is ambitious: the company wants to become the leader in sustainable mobility, biofuels and green hydrogen in Spain and Portugal, and a benchmark in the energy transition.

For this purpose, the company is investing up to EUR 8,000 million in new projects, 60% of which are sustainable projects. By 2030 (benchmarking against 2019), the company wants to reduce emissions by 55%, reduce the carbon intensity of products sold by 15-20% and increase the contribution of sustainable EBITDA. Further, by 2025, the company targets a 20% reduction in freshwater withdrawal in areas of water stress. By 2050, the company aims to be carbon neutral.

Empowering sustainable petrochemical operations through enhanced IT infrastructure and real-time data management

Our customer relies on accurate information and real-time data. The customer had connected several industrial devices and OPC UA industrial servers to a proprietary on-premises IoT platform that did not provide the necessary insights. On the contrary, the initial in-house solution proved problematic in three respects:

1. Data: The data ended up in a black box, and the customer was not able to analyse or even monitor the IoT data lifecycle end to end. The solution was outsourced and not cloud-native, and hence not flexible, agile, scalable or adaptable to the 5 Vs of data (velocity, volume, value, variety and veracity); as a result, it could not be managed or fine-tuned to signal types and required frequencies. Furthermore, the solution was not resilient due to a lack of availability. External data, such as industrial data for analysis and comparison, was impossible to ingest, reducing opportunities for innovation based on comparison and correlation.

2. Capabilities: The customer lacked the in-house capabilities to operate and maintain the solution, but also did not want to continue outsourcing everything. A key objective was to ensure that long-term capabilities were built within the team.

3. Cost: The existing solution was expensive to license, which further eroded the value it delivered.

    Solution on AWS


    Keepler was asked to create a solution that prioritized the generation of business value for the customer: identifying the key metrics and KPIs needed and locating the relevant data and information, so that the organization could self-manage the entire IoT lifecycle, from on-site data capture to advanced analytics.

    The solution was based on native AWS cloud services on a pay-as-you-go basis, which reduced the licensing cost of the connected plant devices. The solution added an additional layer of security, increased performance and resiliency, and reduced the total cost of ownership (TCO) for the client.

    Keepler designed and deployed an end-to-end, AI & ML-ready IoT data management solution across 30+ refineries worldwide, built from three primary modules that together enable complete control of the IoT data lifecycle:

    1) The ‘Edge Module’ is based on Greengrass v2 and resides on physical servers in the DMZ of each monitored site. It handles data collection and processing via OPC UA and Modbus. Keepler developed the processing modules to meet the following requirements (a minimal collection sketch follows this list):

    a) An interface between Greengrass and Kinesis that meets the failover-time requirements;

    b) Elimination of data gaps by backing up to the OPC UA client;

    c) Retrieval of historical data using OPC UA functionality, with the historical data processed serverlessly in the cloud so the process scales easily;

    d) Extraction of the OPC UA SQL database to access metadata, match it against the data extracted from the industrial servers, and keep track of the constantly changing tags used for analytics.
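    To illustrate the kind of collection logic the Edge Module performs, the following is a minimal sketch, not the production Greengrass component: it reads a value from an OPC UA server with the open-source asyncua client and forwards it to a Kinesis data stream via boto3. The endpoint, node ID and stream name are hypothetical placeholders.

        # Minimal, illustrative edge-collection sketch: read one OPC UA node and
        # forward the reading to Kinesis. Endpoint, node ID and stream name are
        # placeholders; the production component runs inside Greengrass v2.
        import asyncio
        import json
        import time

        import boto3
        from asyncua import Client  # open-source OPC UA client (pip install asyncua)

        OPC_ENDPOINT = "opc.tcp://plant-opc-server:4840"   # placeholder endpoint
        NODE_ID = "ns=2;s=Plant.Line1.Temperature"         # placeholder tag
        STREAM_NAME = "iot-signals-stream"                 # placeholder Kinesis stream

        kinesis = boto3.client("kinesis")

        async def collect_and_forward() -> None:
            async with Client(url=OPC_ENDPOINT) as opc:
                node = opc.get_node(NODE_ID)
                value = await node.read_value()
                record = {"tag": NODE_ID, "value": value, "timestamp": time.time()}
                # Partition by tag so readings of one signal stay ordered per shard.
                kinesis.put_record(
                    StreamName=STREAM_NAME,
                    Data=json.dumps(record).encode("utf-8"),
                    PartitionKey=NODE_ID,
                )

        if __name__ == "__main__":
            asyncio.run(collect_and_forward())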

    2) The ‘Data Processing Module’, with real-time and batch channels, receives information from the ‘Edge Module’ and is the data source for the ‘Knolar Analytics Tool’. Signals are stored in a temporary data buffer whose retention period can be parameterized. The pipeline automatically detects data gaps and fills them through an automatic history-retrieval process, preventing incomplete data from being sent to the analytical system.
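    As an illustration of the gap-detection idea, the sketch below scans a sequence of signal timestamps for spacings larger than the expected sampling interval and returns the missing ranges that a history-retrieval job would backfill; the interval and tolerance values are assumptions.

        # Illustrative gap detection over signal timestamps: any spacing larger than
        # the expected interval (times a tolerance factor) is flagged as a gap to be
        # backfilled by the history-retrieval process. Interval/tolerance are assumed.
        from datetime import datetime, timedelta
        from typing import List, Tuple

        def find_gaps(
            timestamps: List[datetime],
            expected_interval: timedelta = timedelta(seconds=5),
            tolerance: float = 1.5,
        ) -> List[Tuple[datetime, datetime]]:
            gaps = []
            for prev, curr in zip(timestamps, timestamps[1:]):
                if (curr - prev) > expected_interval * tolerance:
                    gaps.append((prev, curr))  # range missing from the buffer
            return gaps

        # Example: a 5-second signal with a missing stretch between 10:00:10 and 10:00:40.
        ts = [datetime(2023, 1, 1, 10, 0, s) for s in (0, 5, 10, 40, 45)]
        print(find_gaps(ts))  # -> one gap (10:00:10, 10:00:40): request history retrieval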

    3) The ‘Knolar Analytics Module’ is an application designed and developed by Keepler using a SaaS Factory approach to serve multiple users with both time series data and time series analytics. The interface connects multiple data sources to ingest, store and consume the data.

    At the consumer level, the requirements were as follows (a query sketch follows this list):

    a) Descriptive analytics: Both direct data consumption via Excel (for business users) from the database and data lake, and a standard ODBC interface connected to a BI tool, had to be covered.

    b) Advanced analytics: The platform became the centerpiece of an ecosystem of predictive and prescriptive analytics data products. Using the SaaS approach to achieve maximum flexibility and scalability for data ingestion and consumption, a hot, versatile data warehouse supporting time series, structured and unstructured data use cases was built.
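    As a sketch of the descriptive consumption path, the snippet below pulls a signal's recent readings over a generic ODBC connection (here via pyodbc and a hypothetical DSN) into a pandas DataFrame; a BI tool or Excel would use the same ODBC interface. The DSN, table and column names are illustrative assumptions.

        # Illustrative descriptive-analytics query over the standard ODBC interface.
        # DSN, table and column names are hypothetical.
        from datetime import datetime, timedelta

        import pandas as pd
        import pyodbc

        conn = pyodbc.connect("DSN=knolar_dwh")  # hypothetical ODBC data source name

        query = """
            SELECT tag, ts, value
            FROM signal_readings        -- hypothetical time series table
            WHERE tag = ? AND ts >= ?
            ORDER BY ts
        """

        since = datetime.utcnow() - timedelta(hours=24)
        df = pd.read_sql(query, conn, params=["Plant.Line1.Temperature", since])
        print(df.describe())  # simple descriptive statistics over the last 24 hours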

    The AWS services used were as follows:

    • AWS IoT Greengrass v2 to manage computing processes on the edge.
    • AWS IoT SiteWise Edge to monitor the edge devices.
    • AWS IoT Core to ingest MQTT signals.
    • Amazon Kinesis Data Streams to direct data to consumers and to ingest data into a buffer data lake for temporary storage.
    • AWS Lambda to perform processing and ETL tasks such as data partitioning and enrichment (a minimal enrichment sketch follows this list).
    • Amazon DynamoDB to store metadata.
    • Amazon Kinesis Data Firehose to establish a streaming circuit to the analytics data lake, allowing data buffering without penalizing storage in Amazon S3.
    • Amazon SNS to notify the data processing stage once message partitioning completed, in order to decouple the two processes.
    • Amazon DynamoDB as a key-value database for IoT event alarm processing.
    • Amazon DynamoDB Accelerator (DAX) as a low-latency cache to serve the information needed for data enrichment.
    • Amazon Managed Streaming for Apache Kafka (MSK) as a managed, highly available data streaming service that stores information for a configurable period and lets different consumers subscribe to its topics to receive the data. Specifically, Apache Druid is connected to one of its topics to receive the information with low latency.
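    To make the streaming path concrete, here is a hypothetical sketch of the kind of Lambda enrichment step described above: it decodes Kinesis records, enriches each signal with metadata looked up in DynamoDB, and returns the enriched batch. The table and field names are assumptions; in a latency-sensitive setup the lookup would go through the DAX client instead.

        # Hypothetical sketch of the Lambda enrichment step on the real-time channel:
        # decode Kinesis records, enrich each signal with metadata from DynamoDB and
        # hand the enriched batch to the next stage. Table/field names are assumed.
        import base64
        import json

        import boto3

        dynamodb = boto3.resource("dynamodb")
        metadata_table = dynamodb.Table("signal-metadata")  # hypothetical table name

        def handler(event, context):
            enriched = []
            for record in event["Records"]:
                payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
                # Look up static metadata (unit, plant, description) for this tag.
                item = metadata_table.get_item(Key={"tag": payload["tag"]}).get("Item", {})
                payload["unit"] = item.get("unit")
                payload["plant"] = item.get("plant")
                enriched.append(payload)
            # In the real pipeline the enriched records continue to Firehose / SNS;
            # here they are simply returned.
            return {"enriched_count": len(enriched), "records": enriched}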

    The batch channel was provisioned with a two-fold goal. On the one hand, it allowed years of historical sensor data to be loaded at low cost without resorting to the real-time architecture; on the other, it allowed the ingestion of structured data in a format previously defined in the web portal. The following services were used for this purpose (a worker sketch follows this list):

    • AWS Transfer for SFTP as a fully managed service for transferring files via SFTP to Amazon S3.
    • Amazon S3 as the backbone service for building the data lake in its different layers (landing, staging, business) and for storing metadata files and object data; the landing layer stored the uploaded files for further processing.
    • Amazon SQS as a queue manager for batch file processing tasks.
    • AWS Fargate as a managed compute cluster to run the processing jobs.
    • Amazon ECR as an image registry for the Fargate containers.
    • Amazon SNS to notify users via email when a metadata ingestion upload succeeded.
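    As an illustration of the batch channel, the following hypothetical worker (the kind of job that would run as a Fargate task) polls an SQS queue for S3 upload notifications, fetches the file from the landing layer, and publishes an SNS notification once processing succeeds. The queue URL, topic ARN and message format are assumptions.

        # Hypothetical batch worker for the SFTP channel: poll SQS for S3 upload
        # events, fetch the file from the landing layer, process it, and notify
        # users via SNS on success. Queue URL and topic ARN are placeholders.
        import json

        import boto3

        sqs = boto3.client("sqs")
        s3 = boto3.client("s3")
        sns = boto3.client("sns")

        QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/batch-files"  # placeholder
        TOPIC_ARN = "arn:aws:sns:eu-west-1:123456789012:ingestion-notifications"    # placeholder

        def process_once() -> None:
            resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
            for msg in resp.get("Messages", []):
                body = json.loads(msg["Body"])
                # Assume the message body is a standard S3 event notification.
                for rec in body.get("Records", []):
                    bucket = rec["s3"]["bucket"]["name"]
                    key = rec["s3"]["object"]["key"]
                    data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
                    # ... parse / validate the file and load it into the staging layer ...
                    sns.publish(
                        TopicArn=TOPIC_ARN,
                        Subject="Ingestion completed",
                        Message=f"Processed {key} ({len(data)} bytes) from {bucket}",
                    )
                sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])

        if __name__ == "__main__":
            process_once()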

    Benefits for the client

    Qualitative results

    • Data can be analyzed at intervals from 5 seconds to 1 minute. This enables total time savings of 100% for the recovery of historical data, with full transparency and automation; previously, the process was manual, error-prone and therefore costly because of the outdated OPC UA servers.
    • Incident detection time was reduced by 80%, as the IoT data lifecycle is now included in the client’s monitoring system.
    • Using AI & ML, applications for energy and raw-material savings and for pollution reduction were implemented.

    Quantitative results

    • 2 days on average (per month and plant) dedicated to troubleshooting IoT data issues, reduced to zero.
    • 2 days on average (per month and plant) to resolve gap-filling problems and reprocess data.
    • 3 business days on average (per month and plant) lost due to lack of data.
    • 10 days on average to start up a new site.

    Average cost savings

    • Cost savings of more than 65% across infrastructure, licensing and maintenance of the solution.
    • Average cost savings per plant: EUR 13,000/month, with a potential of 30 plants.

    Keepler is a full-stack analytics services company specialized in the design, construction, deployment and operation of advanced, custom-made analytics solutions on the public cloud. We bring to market the Data Product concept: fully automated, tailored software based on public cloud services that adds advanced analytics, data engineering, massive data processing and monitoring features. In addition, we help our customers transition securely to public cloud services and improve data governance to make the organization more data-centric.

