Case Study | Cepsa Química: GenAI-based assistant for HSEQ Policies in Product Stewardship Enquiries

Cepsa is a leading international company committed to sustainable mobility and energy, with solid technical experience built over more than 90 years of activity. The company also has a world-leading chemicals business with increasingly sustainable operations.

Cepsa Química is a world leader in its sector and is driving the shift towards sustainable chemistry, with a clear commitment to the fight against climate change and the transition to a circular, non-fossil economy.

The company is a world leader in the production of LAB, the main raw material used in biodegradable detergents. It is also the world's second-largest producer of cumene, an intermediate product used in the production of phenol and acetone, which are the main raw materials for the manufacture of engineering plastics.

Supporting the customer's data-driven strategy, Keepler is a strategic partner of Cepsa in crucial cross-cutting areas such as the Cloud & DevOps Center of Excellence. Beyond this, Keepler collaborates with Cepsa's Digital Transformation Hub (called DITEX) through talented, skilled individuals who form cross-functional teams, assembled to co-create solutions with business areas using the company's data: so-called Data Products.

Currently, one of the main objectives of DITEX is to democratize the use of Artificial Intelligence within company business areas, so that this technology becomes another lever for generating value.

Transforming Regulatory Compliance Processes with GenAI

The HSEQ (Health Safety Environment & Quality) area of Cepsa Química is in charge of human health, safety and environmental aspects applied to the products produced by the company, as well as the raw materials involved.

Its areas of action are product safety, regulatory compliance, sustainability and customer service on issues related to products in the area of Safety and Compliance. This area is responsible for product stewardship.

In the area of regulatory compliance, the department has to consult a vast volume of information to check whether a regulation applies to a given product or to know which regulations apply to that product. The content may change over time, introducing new clauses and/or repealing others. At present, this task consumes a significant amount of the team’s time, so the aim is to reduce the search time involved in regulatory queries on existing information.

After an initial analysis of the pain points and the associated documentation, it was deemed feasible to approach the project using Generative AI techniques, so that the department can ask questions about both unstructured information (regulatory documents and product sheets) and structured information (the product catalog), and the system can answer them.

    Solution on AWS

    A data product team from Keepler co-created a fully-functional solution with the customer. The data product team worked with HSEQ users to understand the type of queries most commonly used and where the information could be found.

    After that, the team cleaned and processed the regulatory data in order to extract the information in the form of chunks. An indexing strategy was executed to appropriately index document embeddings for efficient searches and timely responses. A Retrieval-Augmented Generation (RAG) approach was adopted to enhance the capabilities of the LLM with the required context from the regulatory data.
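As an illustrative sketch of this step (chunk size, overlap, and function names are assumptions for the example, not the project's actual code), chunking and embedding with the Titan text embeddings model via the standard Bedrock `invoke_model` API might look like this, assuming a `boto3` `bedrock-runtime` client:

```python
import json


def chunk_text(text, max_chars=1000, overlap=100):
    """Split a document into overlapping chunks before embedding."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Overlap keeps sentences that straddle a boundary retrievable.
        start = end - overlap
    return chunks


def embed_chunk(bedrock_runtime, chunk, model_id="amazon.titan-embed-text-v1"):
    """Embed one chunk with the Titan text embeddings model on Amazon Bedrock."""
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        body=json.dumps({"inputText": chunk}),
    )
    return json.loads(response["body"].read())["embedding"]
```

The overlap between consecutive chunks is a common RAG design choice: it reduces the chance that an answer is split across two chunks and missed at retrieval time.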

    The team crafted an LLM chain pipeline to accommodate the business logic around the questions asked by users and the answers to be provided by the system. Finally, prompt engineering techniques were applied to deliver a fully functional system with improved performance and better responses.
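One common prompt-engineering pattern for this kind of system is a template that constrains the model to the retrieved context, which reduces hallucinated answers to regulatory questions. The template wording and function below are hypothetical, not the project's actual prompts:

```python
# Hypothetical prompt template: grounds the model in retrieved context.
REGULATORY_PROMPT = """You are an assistant for HSEQ regulatory queries.
Answer strictly from the context below. If the answer is not in the
context, say that you do not know instead of guessing.

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question, context_chunks):
    """Assemble the final prompt from the user question and retrieved chunks."""
    context = "\n---\n".join(context_chunks)
    return REGULATORY_PROMPT.format(context=context, question=question)
```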

    This solution, complemented by a user-friendly conversational interface, streamlined regulatory compliance processes, markedly reducing query times and enhancing operational efficiency and risk mitigation.

    The solution was deployed using cloud native AWS services. Below are the functional blocks that make up the solution and the AWS services and technologies involved during the development of the use case:

    The AWS services used were as follows:

    Document processing: Regulatory documents and the product catalog were stored in Amazon S3. The document-processing effort was shortened using Amazon Textract. Subsequent cleanup and post-processing were performed with AWS Glue jobs.
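A minimal sketch of this block, assuming a `boto3` Textract client and illustrative bucket/key names: the asynchronous `StartDocumentTextDetection` API is the usual entry point for PDFs stored in S3, and the resulting blocks can be filtered down to plain-text lines for cleanup:

```python
def start_pdf_extraction(textract, bucket, key):
    """Start asynchronous text detection for a document stored in Amazon S3."""
    job = textract.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return job["JobId"]


def extract_lines(textract_response):
    """Keep only LINE blocks from a Textract response payload."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block["BlockType"] == "LINE"
    ]
```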

    Preparation and storage of embeddings: Embeddings were extracted from the documents using Amazon Bedrock with the Titan text embeddings model. Through AWS Glue jobs, the generated embeddings were stored in an Amazon Aurora PostgreSQL database with the pgvector extension, which allows machine learning (ML) model embeddings to be stored in the database and queried with efficient similarity searches.
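A retrieval query against pgvector can be sketched as below; the table and column names are hypothetical, and the `<=>` operator is pgvector's cosine-distance operator. A standard PostgreSQL cursor (e.g. from `psycopg2`) is assumed:

```python
# Hypothetical schema: document_chunks(content text, embedding vector).
# "<=>" is pgvector's cosine-distance operator; smaller means more similar.
SIMILARITY_SQL = """
SELECT content, embedding <=> %s::vector AS distance
FROM document_chunks
ORDER BY distance
LIMIT %s;
"""


def top_k_chunks(cursor, query_embedding, k=5):
    """Return the k chunks closest to the query embedding."""
    # pgvector accepts a bracketed text literal like "[0.1, 0.2, ...]".
    cursor.execute(SIMILARITY_SQL, (str(query_embedding), k))
    return cursor.fetchall()
```

The retrieved chunks are what the RAG pipeline passes to the LLM as context alongside the user's question.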

    Q&A conversational interface: A front end for user queries was developed, consisting of a static website stored in Amazon S3 and served through Amazon CloudFront, with authentication handled by Amazon Cognito. Queries from the front end to the system are made via an API exposed through Amazon API Gateway, which also returns the response.

    LLM Chain Service: The LLM Chain Service, which is called via API from the front end, orchestrates the interaction with the different LLMs and runs on an AWS Fargate cluster. Depending on the type of query performed by the user, the chain of calls to the LLMs may vary, making requests to Amazon Bedrock. Claude 2 was used as the main LLM.
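A single step of such a chain can be sketched with the public Bedrock `invoke_model` API and the Claude 2 request format (a `prompt` with `Human:`/`Assistant:` turns and `max_tokens_to_sample`). The function name and parameters are illustrative, not the service's actual code:

```python
import json


def ask_claude(bedrock_runtime, prompt, model_id="anthropic.claude-v2"):
    """One LLM call in the chain: query Claude 2 through Amazon Bedrock."""
    body = json.dumps({
        # Claude 2 on Bedrock expects Human/Assistant turn markers.
        "prompt": f"\n\nHuman: {prompt}\n\nAssistant:",
        "max_tokens_to_sample": 500,
        # Low temperature favors deterministic answers for compliance queries.
        "temperature": 0.0,
    })
    response = bedrock_runtime.invoke_model(modelId=model_id, body=body)
    return json.loads(response["body"].read())["completion"]
```

In the full chain, the orchestrator would first classify the query, retrieve context from pgvector, assemble the prompt, and only then make this call.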

    Benefits for the client

    Significant Reduction in Query Time

    The average response time is reduced from 30 minutes to seconds for complex questions.

    Cost Savings

    The reduction in query time results in an estimated average saving of 25% of the time spent on information queries.

    Enhanced Response Quality

    The answers provided by the AI often add useful context. Furthermore, for inexperienced users it can prevent interpretation errors, especially in complex questions.

    Increased Regulatory Compliance

    The solution allowed for more queries with the same staff, boosting compliance rates and reducing the risk of fines.

    Improved Operational Efficiency

    The system accelerated the regulatory query process, directly enhancing operational efficiency.

    Risk Mitigation

    By increasing regulatory compliance, the solution effectively reduced the potential for fines due to non-compliance.

    Keepler is a full-stack analytics services company specialized in the design, construction, deployment and operation of custom-made advanced analytics solutions on the public cloud. We bring to the market the Data Product concept: fully automated, tailored software built on public cloud services that adds advanced analytics, data engineering, massive data processing, and monitoring features. In addition, we help our customers transition securely to public cloud services and improve data governance to make their organizations more data-centric.
