Multi-tenant data platform for IoT event historization and real-time analytics

Knolar is a business unit created by Cepsa (Compañía Española de Petróleos SA) in 2021 that responds to the new needs of Industry 4.0 in terms of monitoring facilities and making this data available to analysts and data scientists.

Knolar is already in use at Cepsa, where it provides statistical data in real time in a fast, visual way, supported by artificial intelligence. It is also offered as a solution for exploiting industrial data in sectors such as automotive, food, transport, chemicals, utilities and smart cities.

The smart data platform for the new Industry 4.0 era

Knolar meets the data needs of your entire team. Because it democratizes access to data, every level of your business can exploit the tool’s potential.

Knolar connects and integrates, in a simple, secure, agile, scalable and low-cost way, the data coming from the sensors of your industrial facilities, building-automation (domotic) zones or any type of IIoT/IoT device. This gives you full control over what is happening in real time in your plants, industrial facilities or any other operational technology (OT) asset.

In addition, Knolar lets you converge this data with the data coming from your IT systems, such as CRMs, ERPs, applications or data warehouses, providing a global view of your company.

The Solution and Main AWS Services Used

Knolar removes the complexity of IIoT-OT/IT source systems by enabling real-time, one-stop access to sophisticated, complex and unexplored data sources in order to generate real value for your business. The design had to meet the following requirements:

  1. The user experience had to hide what was under the hood: users would not interact directly with AWS services but through a web portal offering all the functionality for data ingestion, storage and consumption.
  2. The platform had to be multi-tenant: each customer account would have its own set of users, each with a predefined profile within the platform, and its data would be isolated from that of all other accounts.
  3. Data ingestion had to support different latency levels, from real-time ingestion, which makes hot data available for consumption with sub-second latency, to batch file ingestion. These ingestions had to be contextualized, so that a person with no programming knowledge could use them.
  4. Data had to be enriched with additional metadata.
  5. At the consumption level, there were different requirements to support:
    • Descriptive analytics: business users needed to consume data in Excel directly from the database or data lake, and through a standard ODBC interface for connecting a BI tool.
    • Advanced analytics: the platform was to be the centerpiece on which an ecosystem of predictive (predictive maintenance, demand forecasting…) or prescriptive (next-best action, forensic analysis of asset symptomatology…) analytics data products could be built.
  6. The platform had to follow a SaaS model, so the architecture had to be standard for all customers without losing flexibility, and had to scale with the demand for data ingestion and consumption.
  7. As the platform evolved, canary releases had to be possible for certain environments, along with a gradual rollout of functionality to different clients.
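
The gradual opening of functionality to different clients (requirement 7) can be illustrated with a percentage-based rollout. The sketch below is only an assumption about how such a gate might work; the source does not describe Knolar's actual mechanism, and the names `rollout_bucket`, `is_enabled` and the allowlist parameter are hypothetical.

```python
import hashlib
from typing import FrozenSet


def rollout_bucket(tenant_id: str, feature: str) -> int:
    """Map a (tenant, feature) pair to a stable bucket in the range 0-99."""
    key = f"{feature}:{tenant_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % 100


def is_enabled(
    tenant_id: str,
    feature: str,
    percentage: int,
    allowlist: FrozenSet[str] = frozenset(),
) -> bool:
    """Decide whether a feature is open to a tenant.

    Tenants in the allowlist (hypothetical canary tenants) always get the
    feature; everyone else gets it once the rollout percentage exceeds
    their stable bucket.
    """
    if tenant_id in allowlist:
        return True
    return rollout_bucket(tenant_id, feature) < percentage


if __name__ == "__main__":
    for tenant in ("tenant-a", "tenant-b", "tenant-c"):
        print(tenant, is_enabled(tenant, "new-dashboard", percentage=25))
```

Because the bucket is derived from a hash rather than a random draw, a tenant that receives a feature at a given percentage keeps it as the percentage is raised, which makes the opening gradual and repeatable.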

To meet the multi-tenancy requirement and isolate each client’s data environments and services, the platform was built as a SaaS. To do this, Keepler relied on the AWS SaaS Factory best-practices framework and on services such as AWS Control Tower, and deployed the service using a combined siloed and pooled model.
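
In SaaS terminology, a siloed tenant gets dedicated resources while pooled tenants share resources that are partitioned by tenant identifier. The source does not say which Knolar components follow which model, so the following is a minimal sketch under assumed names: the bucket names, the `tenant=` prefix convention and the helper functions are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Isolation(Enum):
    SILO = "silo"  # dedicated resources for one tenant
    POOL = "pool"  # shared resources, partitioned by tenant id


@dataclass(frozen=True)
class Tenant:
    tenant_id: str
    isolation: Isolation


SHARED_BUCKET = "example-pooled-data"  # hypothetical shared bucket name


def tenant_root(tenant: Tenant) -> str:
    """Return the storage root that belongs exclusively to this tenant."""
    if tenant.isolation is Isolation.SILO:
        # Siloed tenants get their own (hypothetical) bucket.
        return f"s3://example-tenant-{tenant.tenant_id}/"
    # Pooled tenants share one bucket, separated by a tenant prefix.
    return f"s3://{SHARED_BUCKET}/tenant={tenant.tenant_id}/"


def storage_location(tenant: Tenant, dataset: str) -> str:
    """Build the location of a dataset inside the tenant's own root."""
    return f"{tenant_root(tenant)}{dataset}/"


def owns(tenant: Tenant, path: str) -> bool:
    """Check that a path lies inside the tenant's root (isolation guard)."""
    return path.startswith(tenant_root(tenant))


if __name__ == "__main__":
    acme = Tenant("acme", Isolation.SILO)
    beta = Tenant("beta", Isolation.POOL)
    print(storage_location(acme, "events"))
    print(storage_location(beta, "events"))
```

Keeping the trailing slash in the tenant root matters: without it, a prefix check for tenant `beta` would also match paths that belong to a tenant named `beta2`.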

The result is centralized governance of the customer accounts deployed under the landing zone, with automatic provisioning of new accounts at the click of a button. In addition, audit logs are centralized so they can be analyzed at account and user level, which makes cross-cutting security audits possible.

Once the platform scaffolding was provisioned, the development of the client tenant architecture consisted of the following layers:

  • Storage and operation

    The SaaS platform’s customer base is very heterogeneous, although mainly from industry and energy. The diversity of use cases that can be built on the platform makes it necessary to provision an architecture that includes different data stores, following a lakehouse philosophy.

    Keepler chose to build such a data ecosystem with the data lake as its centerpiece. The data lake stores the full, deeper data history, while a subset of the data, usually the most recent, is brought into a separate repository for the hottest data.

  • Data ingestion and processing

    Different ingestion channels were provided, depending on the origin and type of the data and on the latency required for its consumption. The main characteristic of the ingestion processes is that they were defined declaratively from the web portal, which allowed the data to be contextualized. Once an ingest was defined with the wizard, it was provisioned automatically and was ready to use within minutes. The real-time channel was created for continuous ingestion of unstructured data with sub-second latency for consumption. Finally, metadata ingestion from the web portal was supported for enrichment.

  • DevOps portal for infrastructure provisioning

    Finally, to meet the requirements of a flexible and scalable platform, a DevOps portal was created to deploy the infrastructure templates and components (add-ons) needed to support the different data exploitation cases: queries on the data lake, reporting with BI tools and a contextual API. The CI/CD tool GitHub Actions was used to orchestrate the deployment flows, and AWS CloudFormation was used for the infrastructure-as-code templates.

  • Graphical user interface to interact with the platform

    The web portal is hosted in the product area of the platform. It provides the functionality users need to interact with the platform without directly accessing the underlying AWS resources and services: user administration, creation, modification and deletion of ingests, activation of data consumption modules, etc.
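
The declarative ingestion definitions described under "Data ingestion and processing" can be pictured as small, validated specifications that also carry the contextual metadata used for enrichment. This is a hedged sketch only: the source does not publish Knolar's actual schema, so the `IngestDefinition` fields, the channel names and the `enrich` helper are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

VALID_CHANNELS = {"real_time", "batch"}  # assumed channel names


@dataclass(frozen=True)
class IngestDefinition:
    """A declarative description of one ingest (hypothetical schema)."""

    name: str
    channel: str  # "real_time" or "batch"
    source: str   # e.g. a sensor topic or a file drop location
    context: Dict[str, str] = field(default_factory=dict)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the spec is valid."""
        errors = []
        if not self.name:
            errors.append("name is required")
        if self.channel not in VALID_CHANNELS:
            errors.append(f"unknown channel: {self.channel!r}")
        if not self.source:
            errors.append("source is required")
        return errors


def enrich(event: dict, definition: IngestDefinition) -> dict:
    """Return a copy of the event with the ingest's context attached."""
    return {
        **event,
        "ingest": definition.name,
        "context": dict(definition.context),
    }


if __name__ == "__main__":
    spec = IngestDefinition(
        name="boiler-temp",
        channel="real_time",
        source="plant-1/boiler/temperature",
        context={"unit": "celsius", "asset": "boiler-7"},
    )
    print(spec.validate())
    print(enrich({"ts": 1, "value": 81.5}, spec))
```

Describing an ingest as data rather than code is what lets a wizard collect the fields from a user with no programming knowledge and hand them to an automated provisioning step.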

Benefits

  • The AWS pay-per-use model allowed the client to define a pricing model tailored to each type of customer, based on expected volumes.

  • The solution was implemented entirely through managed services with a very low operating cost.

  • The architecture is flexible at the ingestion level, accepting any type of data source. At the exploitation level, the platform integrates with the main BI tools.

  • The architecture is scalable and can grow vertically and horizontally as new business requirements arise. The DevOps portal makes it possible to deploy new tenant infrastructure in minutes.

  • The cost of storing raw data in S3 is so low compared to traditional systems (going from a scale of millions of euros to thousands) that Knolar can store every value emitted by every sensor without having to apply interpolation or approximation mechanisms.

  • Total Cost of Ownership (TCO) was reduced by 40% compared to other similar ingest and storage solutions within the company.

Keepler is a boutique professional technology services company specialized in the design, construction, deployment and operation of Big Data and Machine Learning software solutions for large clients. It uses Agile and DevOps methodologies and native public cloud services to build sophisticated, data-centric business applications integrated with different sources in batch mode and in real time. It holds AWS Advanced Consulting Partner status, and 90% of its technical workforce is AWS certified. Keepler currently works for large clients in markets such as financial services, industry, energy, telecommunications and media.
