Predict fuel consumption of airplanes: Winning machine learning model


A few weeks ago I participated in my first ever machine learning competition. It was a contest initiated by Honeywell and hosted on CrowdAnalytix. I used a data-driven approach to predict aircraft fuel consumption during different phases of flight, based on flight data recorder (FDR) archives. I’m happy to say I took 1st place! Below is an overview of my approach.

The objective was two-fold:

  • Predict fuel flow for every second during different flight phases. The more fuel an airplane carries, the heavier it is, and a heavier airplane burns more fuel. So I assume the predictive model can then be used to predict fuel consumption on future flights, which can help reduce the amount of fuel loaded into the tanks before the actual flight.
  • Use the model to extract actionable insights and provide these in a report.

The uncompressed data-set was about 14.4 GB of flight data recordings from 1005 different flights. Measurements were taken every second, and every measurement contained 224 parameters. Flight data contained measurements during preflight, taxi, take-off, climb, cruise, descent and roll-out. There was also some data belonging to unknown flight phases.


Understanding the data

I started out by doing a quick crash course on aircraft flight parameters, using lots of Wikipedia and an article explaining the flight controls of a Boeing 737.

Once I had a basic understanding, I grouped the 224 parameters into categories. This helped me build a better understanding and intuition:

  • Time: Year, month, day, hour, minute, second
  • Location: Latitude and longitude
  • Environment: Static air temperature, total air temperature, wind speed & direction, …
  • Flight controls: Air-brake position, elevator position, spoiler, rudder, aileron, …
  • Flight controls set by pilot: Control column position, rudder pedal position, …
  • Position: Pressure altitude, altitude rate, angle of attack, corrected angle of attack, indicated angle of attack, baro correct altitude, drift angle, ground speed, lateral acceleration, longitudinal acceleration, …

I then did some exploratory analysis using visualization. This showed that several parameters were highly correlated (0.99 or more) during certain flight phases, but not during others.
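A minimal sketch of such a per-phase correlation check, assuming all flights have been appended into one pandas DataFrame with a phase column named `PH` (a hypothetical name; the real FDR acronyms came from the competition's data dictionary). It uses pandas' default Pearson correlation, which is an assumption about the method, and the `alt_a`/`alt_b` columns in the usage are made-up stand-ins:

```python
import itertools

import pandas as pd


def highly_correlated_pairs(df, phase_col="PH", threshold=0.99):
    """For each flight phase, list numeric column pairs whose absolute
    Pearson correlation within that phase is at least `threshold`."""
    result = {}
    for phase, group in df.groupby(phase_col):
        numeric = group.drop(columns=[phase_col]).select_dtypes("number")
        corr = numeric.corr()  # Pearson by default
        pairs = []
        for a, b in itertools.combinations(corr.columns, 2):
            r = corr.loc[a, b]
            # NaN appears when a column is constant within the phase.
            if pd.notna(r) and abs(r) >= threshold:
                pairs.append((a, b, float(r)))
        result[phase] = pairs
    return result


# Usage with hypothetical columns: redundant in phase 0, not in phase 1.
demo = pd.DataFrame({
    "PH": [0] * 5 + [1] * 5,
    "alt_a": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "alt_b": [2, 4, 6, 8, 10, 5, 1, 4, 2, 3],
})
print(highly_correlated_pairs(demo))
```

Grouping by phase first matters here: computed over the whole data-set, a pair that is redundant in one phase could look only moderately correlated and be missed.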

The visual inspection also proved very useful for getting a feel for which parameters to use when filtering for certain speeds or heights, as the data-set contained several parameters for each (e.g. for altitude there was pressure altitude, baro correct altitude, radio altitude, …).

Basically I did all of the above to understand airplane flight controls and environment measurements.  Now it was time to get coding… 🙂


Time to predict

I first split the training data into several parts, one for each flight phase, and ran through the steps below for each phase.

Because some variables were very highly correlated (see above), I first removed unnecessary and noisy variables that could potentially confuse the predictive model.
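One common way to do this (not necessarily the exact procedure used here) is to drop one column from each pair whose absolute correlation reaches a threshold within a phase's data. The 0.99 threshold mirrors the figure above; `target` is whatever fuel-flow column you are predicting, and the function name is hypothetical:

```python
import numpy as np
import pandas as pd


def drop_redundant_columns(df, target, threshold=0.99):
    """Drop one column from every pair of numeric features whose absolute
    Pearson correlation is >= threshold. The target column is never dropped.
    Returns the reduced frame and the list of dropped column names."""
    features = df.drop(columns=[target]).select_dtypes("number")
    corr = features.corr().abs()
    # Keep only the strict upper triangle so each pair is inspected once;
    # the later column of a redundant pair is the one that gets dropped.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    upper = corr.where(mask)
    to_drop = [col for col in upper.columns if (upper[col] >= threshold).any()]
    return df.drop(columns=to_drop), to_drop
```

Apply it to each phase's frame separately, since the correlations differ between phases.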

Some of the training data also contained observations where fire was detected. In those cases I simply removed both the feature AND those observations, as long as that didn’t mean losing too much data.
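A small sketch of that rule, assuming a hypothetical 0/1 fire-detection column and a hypothetical 1% cut-off for "not too much data" (the actual column name and cut-off aren't given above):

```python
import pandas as pd


def drop_fire_observations(df, fire_col, max_fraction=0.01):
    """Remove rows where the (hypothetical) fire flag `fire_col` is set, and
    then the flag column itself, but only if those rows make up at most
    `max_fraction` of the data. Returns (frame, dropped)."""
    fire = df[fire_col].fillna(0).astype(bool)
    if fire.mean() > max_fraction:
        # Too much data would be lost; leave the decision to the analyst.
        return df, False
    return df.loc[~fire].drop(columns=[fire_col]), True
```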

I did quite a lot of manual feature engineering, and I have no doubt this was the strength of my approach. In total I added about 40 new features, most of them lags.
As an example, I noticed that there was barely any correlation between throttle target and fuel consumption. That doesn’t make any sense, does it? The engine throttle SHOULD impact fuel flow. Then I noticed that when I lagged the throttle target by X seconds, the correlation suddenly became quite obvious! This makes sense: if the pilot changes the throttle target, it takes some time for the engines to adapt.
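A sketch of how such a delay could be found, assuming one row per second, rows sorted by time within each flight, and a hypothetical `flight_id` column. The actual lag X is not disclosed, so the function simply scans candidate lags and reports the one with the strongest correlation:

```python
import pandas as pd


def best_lag(df, feature, target, flight_col="flight_id", max_lag=60):
    """Return (lag, r): the lag in rows (seconds at 1 Hz) in 0..max_lag for
    which the lagged `feature` has the strongest absolute Pearson
    correlation with `target`. Shifting is done within each flight so
    values never leak from one flight into the next."""
    best = (0, 0.0)
    grouped = df.groupby(flight_col)[feature]
    for lag in range(max_lag + 1):
        lagged = grouped.shift(lag)
        # Series.corr aligns on the index and ignores NaN pairs.
        r = lagged.corr(df[target])
        if pd.notna(r) and abs(r) > abs(best[1]):
            best = (lag, float(r))
    return best
```

With real data you would call something like `best_lag(climb_df, "throttle_target", "fuel_flow")`, where all three names are placeholders for your own frame and columns.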
This insight proved valuable for several of the pilot’s cockpit settings. When he/she changes a setting, it usually takes some time before the airplane responds. Making sure that the machine learning model could ‘learn’ this delay greatly increased its predictive power.
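To let a model learn these delays, the lagged values can be added as extra columns. A minimal sketch under the same assumptions (1 Hz rows, sorted by time, hypothetical `flight_id` column); which settings to lag and by how many seconds is up to you:

```python
import pandas as pd


def add_lag_features(df, columns, lags, flight_col="flight_id"):
    """Return a copy of `df` with `<col>_lag<k>` columns holding the value
    of `col` k rows (k seconds at 1 Hz) earlier within the same flight.
    The first k rows of each flight get NaN, which XGBoost treats as
    missing by default."""
    out = df.copy()
    grouped = out.groupby(flight_col)
    for col in columns:
        for k in lags:
            out[f"{col}_lag{k}"] = grouped[col].shift(k)
    return out
```

Passing several durations for the same setting (e.g. `lags=[5, 10, 20]`, arbitrary values) leaves it to the model to pick the useful ones.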

I chose XGBoost as the prediction model and simply kept its default training settings, as unfortunately I ran out of time to tune the hyperparameters. No doubt some tuning would make it predict even better. Time was really an issue for me: I wasn’t able to train my latest and best model on all data (notably, my final submission for the cruise phase was based on an older model).
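The one-model-per-phase setup can be sketched as below. To keep the sketch independent of any particular library, `make_model` is any zero-argument callable returning an object with scikit-learn style `fit`/`predict`; with XGBoost's Python package and default settings that would be `lambda: XGBRegressor()`. The phase column `PH` and the function name are assumptions:

```python
import numpy as np
import pandas as pd


def fit_predict_per_phase(train, test, target, make_model, phase_col="PH"):
    """Train one model per flight phase and predict that phase's test rows.
    Returns a Series of predictions aligned to `test.index`."""
    preds = pd.Series(np.nan, index=test.index, dtype=float)
    for phase, test_part in test.groupby(phase_col):
        train_part = train[train[phase_col] == phase]
        # Every column except the target and the phase label is a feature.
        features = [c for c in train_part.columns if c not in (target, phase_col)]
        model = make_model()  # e.g. lambda: XGBRegressor() (default params)
        model.fit(train_part[features], train_part[target])
        preds.loc[test_part.index] = model.predict(test_part[features])
    return preds
```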


Final thoughts

Personally I really missed an ‘actual engine throttle’ feature. I think it could have contributed a lot to an even better predictive model. Then again, predicting might have been far too easy with this feature, so maybe Honeywell deliberately removed this parameter from the data-set to make participants actively search for other, less visible relationships.

In the end I still think I under-utilized the aircraft-specific knowledge I obtained, and there was still a lot of data that I left unused. It was only close to the deadline that I realized that different airports are at different altitudes and that this affects air pressure, which can have a considerable impact on the fuel used during take-off and climb. I could have normalized the fuel consumption for different airports to make it easier for the model to predict. But hey, we were trying to do in a week or two what some aircraft engineers probably do full-time! 🙂
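This normalization was not part of the submitted solution. As a hypothetical sketch, assuming each row carried an `airport` identifier (not a field of the data-set; one might derive it from latitude and longitude at the start of take-off), the model would be trained on fuel flow minus that airport's average, and the average added back to its predictions:

```python
import pandas as pd


def airport_offsets(train, target, airport_col="airport"):
    """Mean target per (hypothetical) airport, learned on training data only."""
    return train.groupby(airport_col)[target].mean()


def to_residual(df, target, offsets, airport_col="airport"):
    """Target minus its airport's mean; the model would learn this residual."""
    return df[target] - df[airport_col].map(offsets)


def from_residual(pred_residual, df, offsets, airport_col="airport"):
    """Add the airport mean back to predicted residuals. Airports not seen in
    training fall back to the average of the per-airport means."""
    base = df[airport_col].map(offsets).fillna(offsets.mean())
    return pred_residual + base
```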


Update 2017/03/16:
Over the past few months I’ve received several questions about the above by email. First of all, I’m totally okay with this. That said, if you have a question, could you please ask it in the comments below? That way all other readers get to see the answer as well. I would appreciate it :). I have also extended the article above with things I had answered privately.

26 thoughts on “Predict fuel consumption of airplanes: Winning machine learning model”

  1. Luc Antheunissens

    Congratulations Stijn. Interesting how you managed to fill some crucial gaps with pragmatic and creative thinking.
    I’m convinced predictive data analysis will go through an astonishing (r)evolution in the years to come.
    Wish you all success in this endeavour!
    Message to all organisations potentially interested in this future data science: don’t hesitate to contact this young entrepreneur! He will surely take your data insights to higher ground.

  2. Rahul Vishwalarma

    Hello Sir,
    My name is Rahul Vishwakarma. I am a postgraduate student in India. I am currently working on the same project, but I am using a GPR model to predict the fuel flow rate. I am having trouble understanding the dataset. Could you please give me some suggestions?
    Thank you

    • Stijn Tilborghs (post author)

      Hi Rahul. You’ll have to be a little more specific than that. What exactly is it that you don’t understand about the dataset?

      I agree the dataset is quite a tough one to crack. You’ll have to spend a few hours Googling the names of the different parameters and teaching yourself the various pilot instruments and how height and speed can be measured on an airplane (there are different methods, and several of them are in the dataset).

  3. Rahul Vishwalarma

    Thank you so much, sir, for your valuable suggestion.
    Could you suggest some articles and papers that would help me complete my project? I am new to machine learning, which is why I am facing more problems. If you have any manual for the FDR dataset, please send it to me.

  4. Ben

    Hi Stijn, I’m an undergraduate student trying to complete a project to design a predictive controller that can control an aircraft while taxiing and optimise fuel consumption during this phase of flight.

    I came across the Honeywell competition you won when searching for FDR data to derive my aircraft taxiing model. I am having some trouble understanding what each of the acronyms refer to in the data. Within the description on the CrowdAnalytix webpage it mentions a ‘data dictionary’, which I was hoping may describe each of these acronyms but I can’t find it on the CrowdAnalytix website. I’ve emailed the competition organisers to ask whether they would be able to provide it but I am yet to hear back. Do you still have access to this data dictionary, or a list of what each of the acronyms are referring to?


    • Stijn Tilborghs (post author)

      Hi Ben

      Yeah, you’ll really need that data dictionary to make sense of the acronyms (although not all of them were explained and I had to do some Googling myself to make sense of those).
      The bad news is that – since I accepted the prize money – I’m bound to T&C’s so I can’t share that with you. 🙁
      Maybe try reaching out to some of the people who participated, but finished outside the top 10. The CAX website allows you to send direct messages.


  5. Ben

    Hi Stijn,

    Thanks for the quick response.
    Okay I understand and I’ll do that.

    Thanks for the help!

  6. bashar hayani

    Hi Stijn
    I’m a first-year master’s student at the Faculty of Informatics Engineering in Aleppo, Syria.

    My quarterly project is to predict the fuel flow rate of airplanes.
    I saw that your dataset is very helpful for my project.
    You said you added about 40 new features to the dataset.
    What are the formulas for these 40 new features?
    How can I calculate them?

    • Stijn Tilborghs (post author)

      Hi Bashar
      Those calculations are at the core of what made this solution the winning one, and Honeywell spent over 10,000 USD to learn that knowledge. So I hope you understand I can’t just share that with you :). I already dropped a few hints in the blog post, though. Try calculating lags of some features, and also try different lag durations.

  7. Ishita Srivastava

    Thank you for such an informative blog. May I know what accuracy you achieved using this approach? Also, what ML techniques did you use for feature selection?

  8. Ishita Srivastava

    Hi Stijn
    Thanks a lot for your reply. May I know which correlation technique you used? Was it Pearson’s correlation?

  9. Ishita Srivastava

    Hi Stijn,
    Thank you for your quick response. I have another question and would be glad if you could answer it! For which phase of the flight are the RMSE scores on the leaderboard? I am assuming that you made separate models for each flight phase. Or is it a combined score? In that case, what technique did you use?

    Another question I had in mind (this is more of an ML question): while combining the data for each phase from the different training files, did you combine it both by time (in seconds) and by phase (e.g. the 1st second of phase 3 from all files, then the 2nd second of phase 3 from all files), or just by phase (e.g. phase 3 data from file 1, then phase 3 data from file 2, and so on)? As I understand it, the first approach might be useful when analyzing the data, but when applying an ML algorithm it depends on the maths behind the algorithm. I would be grateful if you could shed some light on that part too.


    • Stijn Tilborghs (post author)

      Hi Ishita
      Yeah, so what I did was generate separate models and predictions for each flight phase, which gives you one file of predictions per flight phase. Once you have all of those, you merge them into one file: just append the files to each other (and sort the result according to CAX’s preferred sorting). That is what was then submitted to CAX, so the leaderboard scores are calculated across all flight phases.
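      A sketch of that merge step and of a pooled score, assuming each phase's predictions are a pandas DataFrame; `sort_cols` is a stand-in for CAX's required ordering, which isn't specified here. Because the leaderboard score covers all phases at once, the combined RMSE pools the squared errors rather than averaging the per-phase RMSEs:

```python
import numpy as np
import pandas as pd


def merge_phase_predictions(frames, sort_cols):
    """Append the per-phase prediction frames and sort them. `sort_cols`
    stands in for whatever ordering the competition required."""
    merged = pd.concat(frames, ignore_index=True)
    return merged.sort_values(sort_cols).reset_index(drop=True)


def combined_rmse(per_phase):
    """RMSE over all phases pooled together. `per_phase` maps a phase to
    (y_true, y_pred). This equals the RMSE of the appended predictions and
    is generally not the mean of the per-phase RMSEs: phases with more
    rows weigh more."""
    sq_err = np.concatenate([
        (np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
        for y_true, y_pred in per_phase.values()
    ])
    return float(np.sqrt(sq_err.mean()))
```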

      When parsing the original CAX training files I just extracted phase 1 from file 1, then phase 1 from file 2, and so on for all files. Then I did the same for the other flight phases. You just want to generate one file per phase, because that’s what you’re going to feed to an ML algorithm. I did not care about the time data at all when doing that.
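      A minimal sketch of that parsing step, assuming one CSV per flight and a phase column named `PH` (hypothetical). It also adds a `flight_id` column taken from the file name, an assumption that keeps per-flight lag features possible after stacking:

```python
import glob
import os

import pandas as pd


def load_by_phase(pattern, phase_col="PH"):
    """Read every CSV matching `pattern` and return {phase: DataFrame}, where
    each frame stacks that phase's rows from file 1, then file 2, and so on.
    Rows whose phase value is missing are skipped by groupby."""
    parts = {}
    for path in sorted(glob.glob(pattern)):
        flight = pd.read_csv(path)
        flight["flight_id"] = os.path.basename(path)
        for phase, rows in flight.groupby(phase_col):
            parts.setdefault(phase, []).append(rows)
    return {phase: pd.concat(frames, ignore_index=True)
            for phase, frames in parts.items()}
```

Each resulting frame can then be written to its own file or passed straight to a model.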

  10. Ishita Srivastava

    Hi Stijn
    Thanks a lot for sharing your knowledge and experience, I really appreciate it.
    Could you please tell me the RMSE scores you got for the individual flight phases?

    • Stijn Tilborghs (post author)

      Hi Ishita

      Final Submission to the leaderboard:
      PH0 237.79
      PH1 78.62
      PH2 192.24
      PH3 176.58
      PH4 87.48
      PH5 73.44
      PH6 130.10
      PH7 122.70
      Combined RMSE on local holdout: 115.84
      RMSE on public leaderboard: 114.73
      RMSE on private leaderboard: 110.91

      My best models:
      PH0 237.79
      PH1 78.44
      PH2 194.87
      PH3 176.58
      PH4 87.48
      PH5 64.50
      PH6 127.45
      PH7 106.81
      Combined RMSE on local holdout: 112.75
      RMSE on public leaderboard: ?
      RMSE on private leaderboard: ?

      This is all without any model tuning (I used the default XGBoost parameters).

      Hope it helps you.

  11. Ishita Srivastava

    Thanks a lot Stijn! That would be a big help.
    The initial steps you mentioned, like visualization and finding correlations to understand the data and find which features work for a given flight phase: did you take these steps on the entire training data, i.e. the 1000 training CSVs appended to each other, or did you pick some CSVs at random and analyse those?
    Also, how did you handle the large CSVs? I am using Python pandas, but it gives memory errors. Another option I saw was GraphLab Create. May I know which one you used?

    • Stijn Tilborghs (post author)

      Visualization was always per flight phase. So first append all 1000 CSVs and then just filter for the flight phase you’re interested in. This way you’ll be working on all data in the entire data-set for flight phase X.

      I used Pandas and I don’t remember having problems with the large CSV’s. My computer at that time had 16GB of RAM, but during model training I regularly saw memory usage of over 40 GB. So you need to make sure you have a large swap-file.

      Pandas can also throw memory errors when loading a CSV when it can’t figure out the correct data type. Hmmm that reminds me that I deleted one specific flight from the dataset. I think Pandas was choking on it. You may want to have a look at that.
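      One way to sidestep those type-inference problems (a suggestion, not what I necessarily did back then) is to coerce every column to numbers and downcast to float32, which takes half the memory of float64. This sketch assumes every column in the flight CSVs is a numeric measurement; unparseable values become NaN instead of turning the whole column into a memory-hungry object column:

```python
import pandas as pd


def read_flight_csv(path):
    """Read one flight CSV, coercing every column to numeric and storing it
    as float32. Values that cannot be parsed become NaN."""
    # low_memory=False makes pandas infer each column's type from the whole
    # file instead of chunk by chunk, avoiding mixed-type columns.
    df = pd.read_csv(path, low_memory=False)
    for col in df.columns:
        df[col] = pd.to_numeric(df[col], errors="coerce").astype("float32")
    return df
```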

  12. Ishita Srivastava

    Thanks a lot Stijn, really grateful to you for all your help!

    I have 4 GB of RAM. I increased the swap space and managed to read the CSV file, but when I use a Random Forest to look at the feature importance scores for the climb phase, it just gets stuck with no response at all. I tried the same with XGBoost, but there I get a ‘DMatrix’ object has no attribute ‘handle’ error.
    So I guessed that, as you said, there might be a flight file that is causing the error.
    As a solution, I applied pd.to_numeric() to all columns of the climb-phase dataframe. After that, all columns showed either float64 or int64 as their dtype. But it still didn’t help; I got the same error as before.

    I tried checking for the flight data which might be causing the error by checking the dtypes pandas prints for all 1005 training data files, but there was no file for which pandas choked. May I know how you checked for the flight data which caused pandas to choke?

    • Stijn Tilborghs (post author)

      I quickly checked my code and couldn’t find the flight that I skipped when loading the data. Hmmm maybe that was what I did for the other airplane competition (the visualization one).

      Anyway I’m pretty sure this is not the solution to the problem you’re describing. It must be something else.


  13. Ishita Srivastava

    Thanks a lot for your time and efforts, Stijn!
