Industry 4.0, characterised by the use of intelligent systems and industrial internet-based solutions, has been adopted by the transport sector and by the railways in particular. New technologies have improved the quality of services and business models by drawing on the analytical capabilities of big data and its potential to transform current platforms into a network of collaborative communities that transport goods and passengers. The current trend towards automation and data exchange has led to the use of new and emerging technologies to reach greater levels of efficiency and effectiveness.

CAF, AWS and Keepler

CAF, a multinational group with more than 100 years’ experience providing integral transport systems in the rail industry, has begun the process of becoming a more digital and data-driven company and so must make changes to its IT processes.

With this in mind, CAF launched a “Digital Train” initiative several years ago, which resulted in the creation of the LeadMind platform.

LeadMind supports a new generation of connected trains and provides more competitive services for railway operators and maintainers through data gathering, storage, processing and advanced analysis. This supports real-time decision-making and a move towards prediction-based and condition-based maintenance.

LeadMind characteristics:

1. Provides a modular, open and scalable product that may be personalised according to client needs.
2. Offers information in a user-friendly format and a powerful tool for facilitating decision-making processes.
3. Increases efficiency of operations and maintenance (reduces CCL, improves fleet CCL and root cause analysis, reduces repeat errors).
4. Eliminates black boxes by merging all data from the railway ecosystem.
5. Complies with modern standards of cyber security.

Having opted for LeadMind as an open, modular, flexible, multi-provider and customisable platform, CAF implements and deploys its analytics functionality on Amazon Web Services (AWS), customised for the needs of a specific project: a complete upgrade of the monitoring systems for a train fleet.

The objective of this technological challenge is to reduce data processing time and to increase the speed and efficiency of analysis. The aim is to move the IT architecture to the AWS public cloud, with Keepler Data Tech as technology integrator, in order to improve two key areas of data exploitation: first, to offer better data description and categorisation to Business Intelligence analysts; and second, to enable data scientists to create more effective predictive maintenance models.


The design for LeadMind on AWS proposed by Keepler is an end-to-end solution that receives data from trains and processes it so that it can be correctly stored in a Data Lake. It is designed to scale to an unlimited number of vehicles in the future, with daily processing and data intake every five minutes on the Big Data platform.
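As an illustration of how a five-minute intake could map onto Data Lake storage, the following is a minimal sketch that rounds a timestamp down to its five-minute window and builds a partitioned object key. The key layout, the partition names (fleet, train, dt, hour), the file extension and the function names are all assumptions made for this example; the source does not describe the actual layout used by CAF or Keepler.

```python
from datetime import datetime, timezone

WINDOW_MINUTES = 5  # intake interval stated in the text


def window_start(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its five-minute window."""
    minute = ts.minute - ts.minute % WINDOW_MINUTES
    return ts.replace(minute=minute, second=0, microsecond=0)


def data_lake_key(fleet: str, train_id: str, ts: datetime) -> str:
    """Build a Hive-style partitioned object key.

    The layout and the .parquet extension are hypothetical,
    chosen only to illustrate partitioning by train and time.
    """
    w = window_start(ts)
    return (
        f"raw/fleet={fleet}/train={train_id}/"
        f"dt={w:%Y-%m-%d}/hour={w:%H}/"
        f"{w:%Y%m%dT%H%M}.parquet"
    )


if __name__ == "__main__":
    ts = datetime(2020, 3, 14, 9, 27, 41, tzinfo=timezone.utc)
    print(data_lake_key("fleet-a", "T042", ts))
```

Partitioning by train and date in this way is one common option for keeping exploratory queries cheap, since a query engine can skip partitions that do not match the filter.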

Once stored in the Data Lake, the data are exploited in three ways:

  • Low complexity query execution for exploratory data analysis purposes.
  • Highly efficient storage of hot (frequently accessed) data for visualisation in TIBCO Spotfire.
  • Use of a set of Jupyter Notebooks for accessing data in the Data Lake as well as in AWS Redshift, enabling CAF to develop and test new predictive maintenance models for trains.
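A low-complexity exploratory query of the kind listed above might look like the sketch below. It only builds the request parameters that boto3's Athena `start_query_execution` call accepts (`QueryString`, `QueryExecutionContext`, `ResultConfiguration`) and does not contact AWS. The database, table and column names (`train`, `dt`) and the output location are hypothetical and are not taken from the source.

```python
import re

# Conservative patterns so that interpolated values cannot alter the SQL.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
_VALUE = re.compile(r"[A-Za-z0-9_\-]+")


def build_athena_request(database, table, train_id, day,
                         output_location, limit=100):
    """Return keyword arguments for an Athena query on one train and day.

    Table and column names are assumptions made for illustration.
    """
    for name in (database, table):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    for value in (train_id, day):
        if not _VALUE.fullmatch(value):
            raise ValueError(f"unsafe value: {value!r}")

    query = (
        f"SELECT * FROM {table} "
        f"WHERE train = '{train_id}' AND dt = '{day}' "
        f"LIMIT {int(limit)}"
    )
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


if __name__ == "__main__":
    request = build_athena_request(
        "leadmind_lake", "telemetry", "T042", "2020-03-14",
        "s3://example-athena-results/", limit=50,
    )
    print(request["QueryString"])
```

With boto3 installed and credentials configured, the returned dictionary could be passed as `client.start_query_execution(**request)` on an Athena client; that step is omitted here so the sketch stays runnable offline.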

Alarm data are processed and stored in the Data Lake in near-real time, with a maximum lag of five minutes, and alerts are sent via SMS/email to a set of subscribers. The platform itself also notifies specific subscribers by email if the ETL procedure fails partially or completely when processing a source file.
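The two rules in this paragraph, the five-minute lag bound and the partial or complete ETL failure notification, can be sketched as plain functions. The record counts, status labels and message format below are assumptions made for illustration; the source does not specify how failures are classified or what the notifications contain.

```python
from datetime import datetime, timedelta, timezone

MAX_ALARM_LAG = timedelta(minutes=5)  # maximum lag stated in the text


def alarm_lag_ok(raised_at, stored_at, max_lag=MAX_ALARM_LAG):
    """True if the alarm reached the Data Lake within the allowed lag."""
    return stored_at - raised_at <= max_lag


def etl_status(total_records, failed_records):
    """Classify one source file's ETL run (labels are hypothetical)."""
    if failed_records == 0:
        return "ok"
    if failed_records >= total_records:
        return "complete_failure"
    return "partial_failure"


def failure_notification(source_file, total_records, failed_records):
    """Build an email-style notification, or None if the run succeeded."""
    status = etl_status(total_records, failed_records)
    if status == "ok":
        return None
    return {
        "Subject": f"ETL {status.replace('_', ' ')}: {source_file}",
        "Message": (
            f"{failed_records} of {total_records} records failed "
            f"while processing {source_file}."
        ),
    }


if __name__ == "__main__":
    raised = datetime(2020, 3, 14, 9, 0, tzinfo=timezone.utc)
    stored = raised + timedelta(minutes=3)
    print(alarm_lag_ok(raised, stored))
    print(failure_notification("train_T042.csv", 1000, 12))
```

The `Subject` and `Message` keys mirror the parameter names of the SNS `publish` call, so a dictionary like this could be forwarded to a topic, but the wiring to SNS or SES is left out of the sketch.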

Combining the following AWS services results in a more agile data management platform that can also scale to meet future needs:

  • AWS S3 as a main storage repository.
  • AWS Athena for querying the Data Lake via SQL.
  • AWS Glue as ETL tool and data catalogue.
  • AWS EC2 for BI services with TIBCO Spotfire.
  • AWS Glacier as backup for older files.
  • AWS SageMaker to launch IPython (Jupyter) notebooks, used by CAF data scientists to develop new models.
  • AWS Redshift loaded automatically with a subset of data from source data, to optimise Business Intelligence processes.
  • Amazon DynamoDB as metadata storage.
  • AWS RDS (with MySQL) for storing master data that allows transformation of fields.
  • AWS Batch for FTP synchronisation.
  • AWS Lambda for running detection logic in ETL and near-real time alarms.
  • AWS SNS and AWS SES for processing errors and notifications in near real-time.
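To make the role of Lambda in this list more concrete, the sketch below shows a handler that reads an S3 event notification and decides whether each new object belongs to the alarm path or the regular batch path. The source does not say how the Lambda functions are triggered, so the S3 trigger, the `alarms/` prefix and the routing labels are assumptions; only the shape of the S3 event payload (`Records`, `s3.bucket.name`, `s3.object.key`) follows the documented format.

```python
from urllib.parse import unquote_plus

ALARM_PREFIX = "alarms/"  # hypothetical prefix for alarm files


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def route(key):
    """Pick a processing path for an object key (labels are hypothetical)."""
    return "alarm" if key.startswith(ALARM_PREFIX) else "batch"


def handler(event, context):
    """Lambda entry point: list each new object with its chosen path."""
    return [
        {"bucket": bucket, "key": key, "path": route(key)}
        for bucket, key in extract_s3_objects(event)
    ]


if __name__ == "__main__":
    sample_event = {
        "Records": [
            {"s3": {"bucket": {"name": "example-lake"},
                    "object": {"key": "alarms/train%3DT042/file+1.csv"}}},
        ]
    }
    print(handler(sample_event, None))
```

In a real deployment the "alarm" path would go on to run the detection logic and publish notifications, while the "batch" path would hand the file to the ETL jobs; both are left out here.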


The cloud's pay-per-use model and implementation via managed services allow CAF to deploy a solution at considerably lower investment cost. Storage and processing costs are also lower, and processing is faster: for example, processing the entire data log takes more than 90% less time than with the previous on-premises solutions.


All components of the solution scale horizontally, so adding more sensors or enlarging the train fleet does not create a bottleneck, and scaling is agile and automatic. Furthermore, this is an open system that allows integration of any data exploitation tool deployable on AWS.


Full details of this use case are available in: CAF | Big Data use case


  • Keepler

    Software company specialized in the design, construction and operation of digital data products based on cloud computing platforms.