Predict fuel consumption of airplanes: Winning model approach

A few weeks ago I participated in my first ever machine learning competition. It was a contest initiated by Honeywell and hosted on CrowdAnalytix. The task was to take a data-driven approach to predicting aircraft fuel consumption during the different phases of flight (taxi, take-off, climb, cruise, descent, roll-out), based on flight data recorder (FDR) archives. I’m happy to say I took 1st place! Below is an overview of my approach.

The objective was two-fold:

  • Predict fuel flow for every second during the different flight phases. The more fuel an airplane carries, the heavier it is, and a heavier airplane burns more fuel. So I’m assuming the predictive model can then be used to predict fuel consumption for future flights, which can help reduce the amount of fuel loaded into the tanks before a flight.
  • Use the model to extract actionable insights and provide these in a report.

I first split the training data into several parts, one for each flight phase. I ran through the steps below for each part:

  • First I removed unnecessary and noisy features that could hurt the model’s accuracy. A lot of features had no variance, or almost none, so these were easy to remove. There was also data in the training set where fire was detected. In those cases I removed the feature AND the observations (as long as that didn’t throw away too much data).
  • I did quite a lot of manual feature engineering, and I have no doubt this is the key to the model’s strength. In total I added about 40 new features (rough guess). Most of the new features were lags. Here’s just one example: I noticed that there was barely any correlation between throttle target and fuel consumption. That doesn’t make any sense, does it? The throttle of the engines SHOULD impact fuel flow. Then I noticed that when I lagged the throttle target by X seconds, the correlation suddenly became quite obvious! This makes sense: if the pilot changes the throttle target, it takes some time for the engines to adapt to it.
  • I used XGBoost to train the model. I simply used the standard settings for training, because unfortunately I ran out of time to tune the hyperparameters. No doubt some tuning could make it predict even better. Time was really an issue for me: I wasn’t able to train my latest and best model on all data (notably my final submission for the cruising phase was based on an older model).
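
To make the lag idea from the list above concrete, here is a minimal Python sketch. It is not the competition code: the data is synthetic, the 5-second delay is invented for the demo, and `pearson`, `lagged_correlation` and `best_lag` are hypothetical helper names. It only shows how scanning over candidate lags can reveal a relationship that is invisible at lag 0.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        return 0.0  # constant series: correlation is undefined, treat as 0
    return cov / (dev_x * dev_y)


def lagged_correlation(feature, target, lag):
    """Correlate feature[t - lag] with target[t] for a lag in seconds (rows)."""
    n = len(feature)
    return pearson(feature[: n - lag], target[lag:])


def best_lag(feature, target, max_lag):
    """Return the lag with the strongest absolute correlation, plus all scores."""
    scores = {lag: lagged_correlation(feature, target, lag)
              for lag in range(max_lag + 1)}
    return max(scores, key=lambda k: abs(scores[k])), scores


# Synthetic one-row-per-second data (assumption: engines respond 5 s late).
random.seed(0)
true_lag = 5
throttle_target = [random.random() for _ in range(500)]
fuel_flow = [0.0] * true_lag + throttle_target[:-true_lag]
fuel_flow = [f + random.gauss(0, 0.05) for f in fuel_flow]  # sensor noise

lag, scores = best_lag(throttle_target, fuel_flow, max_lag=10)
print("best lag:", lag)
print("correlation at lag 0:", round(scores[0], 3))
print("correlation at best lag:", round(scores[lag], 3))
```

On real flight data the throttle signal changes slowly, so the correlation will usually rise and fall gradually around the best lag rather than jump like it does here. Once a useful lag is found, the shifted column can be added to the training set as a new feature.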

Personally I was really missing an ‘actual engine throttle’ feature. I think this extra feature could make a big contribution to an even better predictive model. However, perhaps predicting would be way too easy with this feature, so I’m thinking maybe Honeywell decided to remove this parameter from their dataset in order to let participants actively search for other, less visible relationships.

Cheers,
Stijn

13 thoughts on “Predict fuel consumption of airplanes: Winning model approach”

  1. Luc Antheunissens

    Congratulations, Stijn. Interesting how you managed to fill some crucial gaps with pragmatic and creative thinking.
    I’m convinced predictive data analysis will go through an astonishing (r)evolution in the years to come.
    Wish you all success in this endeavour!
    Message to all organisations potentially interested in this future data science: don’t hesitate to contact this young entrepreneur! He sure will bring you to higher ground with your data insights.

  2. Rahul Vishwalarma

    Hello Sir,
    My name is Rahul Vishwakarma. I am a postgraduate student in India, currently working on the same project, but I used a GPR model to predict the fuel flow rate. I am having trouble understanding the dataset. Please advise.
    Thank you

    • Stijn Tilborghs (post author)

      Hi Rahul. You’ll have to be a little more specific than that. What exactly is it that you don’t understand about the dataset?

      I agree the dataset is quite a tough one to crack. You’ll have to spend a few hours googling the names of the different parameters and teach yourself about the various pilot instruments and how height and speed can be measured on an airplane (there are different methods, and several of them are in the dataset).

  3. Rahul Vishwalarma

    Thank you so much, sir, for your valuable suggestion.
    Could you suggest some articles and papers that would help me complete my project? I am new to machine learning, which is why I am running into more problems. If you have a manual for the FDR dataset, please send it to me.

  4. Ben

    Hi Stijn, I’m an undergraduate student working on a project to design a predictive controller that can control an aircraft while taxiing and also optimise fuel consumption during this phase of flight.

    I came across the Honeywell competition you won when searching for FDR data to derive my aircraft taxiing model. I am having some trouble understanding what each of the acronyms in the data refers to. The description on the CrowdAnalytix webpage mentions a ‘data dictionary’, which I was hoping would describe each of these acronyms, but I can’t find it on the CrowdAnalytix website. I’ve emailed the competition organisers to ask whether they could provide it, but I am yet to hear back. Do you still have access to this data dictionary, or to a list of what each of the acronyms refers to?

    Thanks,
    Ben

    • Stijn Tilborghs (post author)

      Hi Ben

      Yeah, you’ll really need that data dictionary to make sense of the acronyms (although not all of them were explained, and I had to do some googling myself to make sense of those).
      The bad news is that – since I accepted the prize money – I’m bound to T&C’s so I can’t share that with you. 🙁
      Maybe try reaching out to some of the people who participated, but finished outside the top 10. The CAX website allows you to send direct messages.

      Stijn

  5. Ben

    Hi Stijn,

    Thanks for the quick response.
    Okay I understand and I’ll do that.

    Thanks for the help!

  6. bashar hayani

    Hi Stijn,
    I’m a first-year master’s student at the Faculty of Informatics Engineering in Aleppo, Syria.

    My project this quarter is to predict the fuel flow rate of airplanes.
    Your dataset looks very helpful for my project.
    You said you added about 40 new features to the dataset.
    What formulas did you use for these 40 new features, and how can I calculate them?
