IoT platform for real-time data acquisition

Our client is a leading international company committed to sustainable mobility and energy, and a producer of petrochemicals. Continuous advancement and the use of technology have enabled it to become an EMEA market leader with global reach.

The agenda for 2030 is ambitious: to become the leader in sustainable mobility, biofuels and green hydrogen in Spain and Portugal, and a benchmark in the energy transition.

To this end, the company is investing up to EUR 8,000 million in new projects, 60% of which are sustainable projects. By 2030 (benchmarking against 2019), the company wants to reduce emissions by 55%, cut the carbon intensity of products sold by 15-20% and increase the contribution of sustainable EBITDA. Further, by 2025, the company targets a 20% reduction in freshwater withdrawal in areas of water stress. By 2050, the company will be carbon neutral.

Empowering sustainable petrochemical operations through enhanced IT infrastructure and real-time data management
The company currently operates 30+ petrochemical production sites and refineries, for which it must manage the IT infrastructure, real-time reporting and thus the operation of these plants. Data enables our customer to optimise production by reducing raw-material consumption, saving man-hours and becoming more sustainable.

Our customer relies on accurate information and real-time data. The customer had connected several industrial devices and OPC UA industrial servers to a proprietary on-premise IoT platform that did not provide the necessary insights. Instead, the initial in-house solution proved problematic in three respects:

1. Data: The data ended up in a black box; the customer could not analyse, or even monitor, the IoT data life cycle end to end. The solution was outsourced and not cloud-native, and hence neither flexible, agile nor scalable; it could not adapt to the 5 Vs of data (velocity, volume, value, variety and veracity) and could not be managed or fine-tuned to signal types and required frequencies. It also lacked availability and was therefore not resilient. External data, such as industrial data for analysis and comparison, was impossible to ingest, reducing opportunities for innovation based on comparison and correlation.

2. Capabilities: The customer lacked the in-house capabilities to operate and maintain the solution, but also did not want to keep outsourcing everything. A key objective was to build long-run capabilities within the team.

3. Cost: The existing solution was expensive to license, which further reduced the value realised from it.

The Solution and Main AWS Services Used

Keepler was asked to create a solution that prioritized the generation of business value for our customer: identifying the key metrics and KPIs needed and locating the relevant data and information, so that the organization could self-manage the entire IoT lifecycle, from on-site data capture to advanced analytics.

The solution was based on native AWS cloud services on a pay-as-you-go basis, which reduced the licensing cost of the connected plant devices. The solution added an additional layer of security, increased performance and resiliency, and reduced total cost of ownership (TCO) for our client.

Keepler designed and deployed an end-to-end, AI/ML-ready IoT data management solution across 30+ refineries worldwide, built from three primary modules that enable complete control of the IoT data lifecycle:

1) The ‘Edge Module’ is based on AWS IoT Greengrass v2 and runs on physical servers in the DMZ of each monitored site. It handles data collection and processing via OPC UA and Modbus. Keepler developed the processing modules to meet the following requirements:

a) An interface between Greengrass and Kinesis that meets the failover-time requirements;

b) Elimination of data gaps through local backup at the OPC UA client;

c) Retrieval of historical data using OPC UA history functionality, with serverless processing of that data in the cloud so the process scales easily;

d) Extraction of the SQL database from OPC UA to access metadata, match it against the data extracted from the industrial servers, and track the constantly changing tags used for analytics.
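The store-and-forward behaviour behind requirement (b) can be sketched as follows. This is a simplified illustration using only standard Python; the class name `LocalBackupBuffer` and the callback-based uplink are assumptions for the sketch, not Keepler's actual Greengrass component:

```python
import json
import time
from collections import deque

class LocalBackupBuffer:
    """Store-and-forward sketch: holds OPC UA samples locally while the
    uplink (e.g. Kinesis via Greengrass) is unreachable, then flushes
    them in order once connectivity returns, so no data gaps occur."""

    def __init__(self, maxlen=100_000):
        # Bounded buffer so the edge host cannot exhaust memory.
        self._buf = deque(maxlen=maxlen)

    def add(self, tag, value, ts=None):
        self._buf.append({"tag": tag, "value": value, "ts": ts or time.time()})

    def flush(self, send):
        """Attempt to send every buffered sample; keep the ones that fail."""
        sent = 0
        while self._buf:
            sample = self._buf[0]
            if not send(json.dumps(sample)):
                break  # uplink still down; stop and retry later
            self._buf.popleft()
            sent += 1
        return sent

# Example: uplink down, then restored
buf = LocalBackupBuffer()
buf.add("FIC-101.PV", 42.0, ts=1700000000.0)
buf.add("FIC-101.PV", 42.5, ts=1700000005.0)

down = lambda payload: False   # simulated failed uplink
up = lambda payload: True      # simulated healthy uplink

assert buf.flush(down) == 0    # nothing lost while the link is down
assert buf.flush(up) == 2      # everything delivered, in order, afterwards
```

The key design point is that samples are only dropped from the local buffer after a confirmed send, which is what eliminates gaps across connectivity outages.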

2) The ‘Data Processing Module’, with real-time and batch channels, receives information from the ‘Edge Module’ and is the data source for the ‘Knolar Analytics Tool’. Signals are stored in a temporary data buffer with parameterizable retention. The process automatically detects data gaps and addresses them via an automatic history-retrieval process, preventing incomplete data from being sent to the analytical system.
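The gap-detection step can be sketched roughly as below. This is a minimal illustration assuming a fixed expected sampling interval; the real module's retention parameters and its OPC UA history-retrieval API are not described in the source:

```python
def detect_gaps(timestamps, expected_interval, tolerance=1.5):
    """Return (start, end) pairs where the spacing between consecutive
    samples exceeds the expected interval, i.e. the ranges for which
    history must be re-fetched before data reaches analytics."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > expected_interval * tolerance:
            gaps.append((prev, cur))
    return gaps

# A 5-second signal with one outage between t=10 and t=40
ts = [0, 5, 10, 40, 45, 50]
print(detect_gaps(ts, expected_interval=5))  # [(10, 40)]
```

Each detected `(start, end)` pair would then be turned into a history-retrieval request against the OPC UA server, so only complete series are forwarded to the analytical system.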

3) The ‘Knolar Analytics Module’ is an application designed and developed by Keepler using the SaaS Factory approach to serve multiple users with both time series data and time series analytics. The interface connects multiple data sources to ingest, store and consume the data.

At the consumer level, the requirements were:

a) Descriptive analytics: Business users had to be able to consume data via Excel, directly from the database and the data lake, as well as via a standard ODBC interface connected to a BI tool.

b) Advanced analytics: The platform became the centerpiece of an ecosystem of predictive and prescriptive analytics data products. Using the SaaS approach for maximum flexibility and scalability in data ingestion and consumption, a hot, versatile data warehouse was built to support time series, structured and unstructured data use cases.

Main AWS Services used:

  • AWS IoT Greengrass v2 to manage computing processes on the edge.
  • AWS IoT SiteWise Edge to monitor the edge devices.
  • AWS IoT Core to ingest MQTT signals.
  • Amazon Kinesis Data Streams to direct data to the consumer and to ingest data into a buffer data lake for temporary storage.
  • AWS Lambda to perform the various processing steps, including the ETL processes for data partitioning and enrichment.
  • Amazon DynamoDB to store metadata.
  • Amazon Kinesis Data Firehose to establish a streaming circuit to the analytics data lake, allowing data buffering without penalizing storage in Amazon S3.
  • Amazon SNS to notify the downstream data processing once message partitioning completed, in order to decouple the two processes.
  • Amazon DynamoDB as a key-value database for IoT event alarm processing.
  • Amazon DynamoDB Accelerator (DAX) as a low-latency cache to serve the information needed to perform data enrichment.
  • Amazon Managed Streaming for Apache Kafka (MSK) as a managed, highly available data streaming service that stores the information for a configurable period of time and lets different consumers subscribe to its topics to receive the data. Specifically, Apache Druid is connected to one of its topics to receive the information with low latency.
  • The batch channel was provisioned with a two-pronged target: on the one hand, it allowed loading years of historical sensor data at low cost without resorting to the real-time architecture; on the other, it allowed the ingestion of structured data in a format previously defined in the web portal. The following services were used for this purpose:
  • AWS Transfer for SFTP as a fully managed service for transferring files via SFTP to Amazon S3.
  • Amazon S3 as the backbone service for building the data lake in its different layers: landing, staging, business. The landing layer stored the uploaded files for further processing.
  • Amazon SQS as a queue manager for batch file processing tasks.
  • Amazon S3 for storage of metadata files and object data.
  • AWS Fargate as a managed compute cluster to run the processing jobs.
  • Amazon ECR as an image registry for Fargate containers.
  • Amazon SNS to notify users via email of a successful metadata ingestion upload.
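The Lambda partitioning-and-enrichment step listed above can be sketched as follows. This is a hedged illustration: the record fields, the handler name and the in-memory `METADATA` dict are assumptions for the sketch (the actual solution looks metadata up in DynamoDB, accelerated by DAX):

```python
import base64
import json

# Stand-in for the DynamoDB/DAX metadata lookup (tag -> plant/unit metadata).
METADATA = {"FIC-101.PV": {"plant": "refinery-03", "unit": "crude"}}

def handler(event, context=None):
    """Kinesis-triggered Lambda sketch: decode each record, enrich it
    with tag metadata, and derive a partition key (plant/date) used to
    organize the record in the data lake."""
    out = []
    for rec in event["Records"]:
        # Kinesis delivers record payloads base64-encoded.
        sample = json.loads(base64.b64decode(rec["kinesis"]["data"]))
        meta = METADATA.get(sample["tag"], {})
        enriched = {**sample, **meta}
        enriched["partition"] = f"{meta.get('plant', 'unknown')}/{sample['ts'][:10]}"
        out.append(enriched)
    return out

# Minimal simulated Kinesis event
payload = {"tag": "FIC-101.PV", "value": 42.0, "ts": "2023-11-14T09:00:00Z"}
event = {"Records": [{"kinesis": {"data":
    base64.b64encode(json.dumps(payload).encode()).decode()}}]}
print(handler(event)[0]["partition"])  # refinery-03/2023-11-14
```

Partitioning by plant and date in this way keeps the S3 data lake layout queryable per site without scanning unrelated objects.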


Qualitative results

  • Data can be analyzed at 5-second to 1-minute intervals. The recovery of historical data is now fully transparent and automated, saving 100% of the time previously spent on it. Previously, the process was manual, error-prone and therefore costly, owing to the outdated OPC UA servers.

  • Incident detection time was reduced by 80%, as the IoT data lifecycle is now included in the client’s monitoring system.

  • Using AI/ML, applications for energy and raw-material savings and pollution reduction were implemented.

Quantitative results

  • 2 days on average (month/plant) dedicated to troubleshooting around IoT data reduced to zero.

  • 2 days on average (month/plant) to resolve gap-filling problems and reprocess data.

  • 3 business days on average (month/plant) lost due to lack of data.

  • 10 days on average to start up a new site.

Average cost savings

  • Cost savings of more than 65% across infrastructure, licensing and maintenance of the solution.

  • Average cost savings per plant: 13,000 EUR/month, with a potential of 30 plants.

Keepler is a boutique professional technology services company specialized in the design, build, deployment and operation of Big Data and Machine Learning software solutions for large clients. The company uses Agile and DevOps methodologies and native public cloud services to build sophisticated, data-focused business applications integrated with different sources in batch and real-time modes. Keepler holds AWS Advanced Consulting Partner status, and 90% of its technical workforce is AWS-certified. Keepler currently works for large clients in markets such as financial services, industry, energy, telecommunications and media.

Let’s talk!

If you want to know more, or would like us to develop a proposal for your specific use case, contact us and we’ll talk.