Designing and Implementing an ML Pipeline to Predict ETA for Bus Routes



Machine learning is a rapidly growing field that is changing industries worldwide. One of its most promising applications is transportation, where it can improve the efficiency, safety, and reliability of public transportation systems. This case study describes how an Indian bus transport technology company partnered with Applied Cloud Computing (ACC) to improve its services by using machine learning models to predict the Estimated Time of Arrival (ETA) for each bus route.


About our client:  

Our client is India’s #1 bus transport technology company. It provides live bus tracking services and contactless payment solutions that make everyday bus travel safer and more reliable. Transport is a fundamental need, like water or air, yet only around 15% of Indians can afford their own car, private taxi, or two-wheeler; the remaining 85% depend on public transport. Buses form the core of any city’s transport system. They also offer the largest room for improvement and the largest opportunity for impact, with 2 out of 3 public transport users depending on buses for their travel. Wider roads are not an option: people need to use public transport to reduce traffic density and pollution. The client’s core purpose is to make travel better for everyone, in the belief that our cities, our health, and our lives will be better when we improve the way we travel.


Problem Statement: 

The client is a public transport provider that operates buses on multiple routes across a city. It wanted to predict the ETA for each bus route accurately in order to minimize delays, run its operations more efficiently, and improve customer satisfaction. To do this, the client wanted to implement machine learning models that use historic data to predict the ETA of buses between stops.
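
To make "using historic data to predict ETA between stops" concrete, here is a minimal sketch of how past arrival records could be turned into per-segment travel times and a simple hour-of-day baseline. The record layout, the stop names, and the function names are all hypothetical; the case study does not describe the client's actual data schema or features.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical historic records: (trip_id, stop_sequence, stop_id, arrival_time).
RECORDS = [
    ("trip-1", 1, "A", "2023-01-02 08:00:00"),
    ("trip-1", 2, "B", "2023-01-02 08:06:00"),
    ("trip-1", 3, "C", "2023-01-02 08:15:00"),
    ("trip-2", 1, "A", "2023-01-02 08:30:00"),
    ("trip-2", 2, "B", "2023-01-02 08:38:00"),
    ("trip-2", 3, "C", "2023-01-02 08:45:00"),
]


def segment_travel_times(records):
    """Return {(from_stop, to_stop, hour): [seconds, ...]} for consecutive stops."""
    by_trip = defaultdict(list)
    for trip_id, seq, stop_id, ts in records:
        arrived = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        by_trip[trip_id].append((seq, stop_id, arrived))

    samples = defaultdict(list)
    for stops in by_trip.values():
        stops.sort()  # order each trip by stop sequence
        for (_, a, t_a), (_, b, t_b) in zip(stops, stops[1:]):
            # Key by departure hour so rush-hour trips are averaged separately.
            samples[(a, b, t_a.hour)].append((t_b - t_a).total_seconds())
    return samples


def baseline_eta(samples):
    """Average the observed travel times for each segment and hour."""
    return {key: mean(values) for key, values in samples.items()}


BASELINE = baseline_eta(segment_travel_times(RECORDS))
```

A baseline like this is useful mainly as a reference point: a trained model is only worth deploying if it beats the historic average on held-out trips.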


Why ACC? 

The client needed expertise to build and implement a machine learning system capable of accurate predictions. As ML Modernization is a niche segment, the client team preferred to consult an established cloud company rather than hire personnel for the task.

ACC offers its ML Modernization service as a comprehensive solution that helps businesses build, manage, and deploy machine learning models. ACC is an AWS Advanced Consulting Partner with capabilities in consulting, digital, cloud, and operations, and has 200+ engineers working across three business units.


Solution | Architecture & Services: 

ACC’s solution was to design and implement an ML pipeline that takes data from sources such as Amazon S3 and deploys the best-performing model to predict the ETA for bus routes. The pipeline is built and deployed with AWS services such as AWS Glue and Amazon SageMaker.

ACC configured the data and set up AWS Glue to discover, categorize, clean, and enrich it. The extracted data was transformed into a format that the ML models can use. Amazon SageMaker notebooks were used to perform exploratory data analysis (EDA) and feature engineering on the processed data, and Amazon SageMaker was then used to build and train ML models on the cleaned and enriched data. Several models were trained, and the best was chosen based on the given performance criteria. Hyperparameter tuning was used to find the best parameters.
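
The select-and-tune step can be illustrated with a small, self-contained sketch. The case study does not name the candidate models, the performance criterion, or the tuning method, so everything here is an assumption: two toy predictors (a historic mean and exponential smoothing), mean absolute error (MAE) as the metric, and a plain grid search over the smoothing factor `alpha`. It is plain Python for illustration, not the SageMaker API.

```python
from statistics import mean


def mae(actual, predicted):
    """Mean absolute error in the same unit as the inputs (seconds here)."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def mean_model(train):
    """Candidate 1: always predict the average of the training data."""
    avg = mean(train)
    return lambda history: avg


def smoothing_model(alpha):
    """Candidate 2: exponential smoothing; alpha is the hyperparameter."""
    def predict(history):
        estimate = history[0]
        for value in history[1:]:
            estimate = alpha * value + (1 - alpha) * estimate
        return estimate
    return predict


def evaluate(predict, train, validation):
    """One-step-ahead MAE: predict each validation value from all earlier ones."""
    history = list(train)
    predictions = []
    for value in validation:
        predictions.append(predict(history))
        history.append(value)
    return mae(validation, predictions)


# Hypothetical travel times (seconds) for one segment, oldest first.
train = [300, 310, 305, 320, 330]
validation = [340, 350, 360]

candidates = {"mean": mean_model(train)}
for alpha in (0.2, 0.5, 0.8):  # grid of hyperparameter values to try
    candidates[f"smoothing(alpha={alpha})"] = smoothing_model(alpha)

scores = {name: evaluate(model, train, validation) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)
```

On this toy series travel times are trending upward, so the candidate that reacts fastest to recent observations scores the lowest error. The point of the sketch is the workflow: hold out data, score every candidate with the same metric, and keep the winner.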

Once the best-performing model was selected and tuned, it was deployed on Amazon SageMaker endpoints to make predictions on new data in real time. A pipeline was also set up to automatically take new data, preprocess it, and run it through the ML model to generate predictions, so that ETA predictions stay current as new data arrives.
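
As a rough illustration of real-time prediction against a deployed endpoint, the sketch below wraps the `invoke_endpoint` call of the SageMaker runtime client. The endpoint name, the feature order, and the CSV request and response format are assumptions, since the case study does not describe the deployed model's interface. A stub client stands in for AWS so the example runs offline.

```python
import io


def build_payload(features):
    """Serialize one feature row as a CSV line (assumed input format)."""
    return ",".join(str(value) for value in features)


def predict_eta_seconds(runtime_client, endpoint_name, features):
    """Send one feature row to the endpoint and parse a single numeric reply."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="text/csv",
        Body=build_payload(features),
    )
    # The response body is a stream of bytes; here it is assumed to hold one number.
    return float(response["Body"].read().decode("utf-8").strip())


class StubRuntimeClient:
    """Offline stand-in for the real client; always answers 420.0 seconds."""

    def __init__(self):
        self.calls = []

    def invoke_endpoint(self, EndpointName, ContentType, Body):
        self.calls.append((EndpointName, ContentType, Body))
        return {"Body": io.BytesIO(b"420.0\n")}


stub = StubRuntimeClient()
# Hypothetical features: route id, stop sequence, hour of day, segment length in km.
eta = predict_eta_seconds(stub, "bus-eta-endpoint", [12, 3, 8, 1.4])
```

With AWS credentials configured, the stub would be replaced by `boto3.client("sagemaker-runtime")`. Passing the client in as a parameter keeps the prediction logic testable without a live endpoint.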



ACC’s expertise in designing and implementing ML pipelines enabled the client to improve its services by accurately predicting the ETA for each bus route. This leads to increased customer satisfaction and better operational efficiency for the company.


Future Plan:

ACC will continue to monitor the ML pipeline to ensure that it works as expected. This includes tracking the model’s performance over time, identifying any issues, and making updates as needed so that the pipeline remains accurate and up to date.
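
One simple way to track model performance over time is to compare each prediction with the arrival time that was eventually observed and watch a rolling error. The sketch below assumes a rolling mean absolute error with a fixed alert threshold; the window size, the threshold, and the class name are hypothetical and are not taken from ACC's actual monitoring setup.

```python
from collections import deque


class EtaMonitor:
    """Track the rolling mean absolute error of ETA predictions, in seconds."""

    def __init__(self, window=3, threshold_seconds=60.0):
        self.errors = deque(maxlen=window)  # keeps only the most recent errors
        self.threshold_seconds = threshold_seconds

    def record(self, predicted_seconds, actual_seconds):
        self.errors.append(abs(predicted_seconds - actual_seconds))

    def rolling_mae(self):
        return sum(self.errors) / len(self.errors) if self.errors else 0.0

    def needs_attention(self):
        """Flag only once the window is full and its error exceeds the threshold."""
        window_full = len(self.errors) == self.errors.maxlen
        return window_full and self.rolling_mae() > self.threshold_seconds


monitor = EtaMonitor(window=3, threshold_seconds=60.0)
for predicted, actual in [(400, 420), (400, 430), (400, 440)]:
    monitor.record(predicted, actual)
before = monitor.needs_attention()  # rolling MAE is still within the threshold

monitor.record(400, 520)  # one badly late bus pushes the window average up
after = monitor.needs_attention()
```

A flag like this would typically prompt a closer look or a retraining run rather than an automatic model swap.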


AWS Services used: 

  1. AWS Glue: a serverless data integration service that makes it simpler to find, prepare, move, and combine data from many sources for analytics, machine learning (ML), and application development.
  2. Amazon SageMaker: a service that enables developers to create, train, and deploy machine learning models in the cloud, as well as on embedded systems and edge devices.
  3. Amazon S3: Amazon Simple Storage Service, an object storage service offered by Amazon Web Services (AWS) and accessed through a web service interface.