Executive Summary
Hurricanes pose a seasonal threat to millions of people, which makes hurricane forecasting an important subdiscipline of natural disaster prevention. Records of Atlantic hurricanes date back to 1851 and are collected in the continuously growing HURDAT database. This database forms the basis for a variety of statistical models that aim to predict hurricane properties such as maximum wind speed or the time of landfall. One of the major questions of interest is the prediction of the next position of a cyclone's eye. The analyses presented here tackle this task with a set of machine learning (ml) models that forecast the location of a hurricane 6 hours ahead of its most recent recorded position.
Based on an exploratory analysis of the raw data, a total of 482 hurricanes recorded between 1980 and 2015 are selected as the input for the subsequent feature engineering step. This step mainly focuses on segmenting each hurricane's track into subtracks, each containing the last 4 available records. As records are made at synoptic times every 6 h, these subtracks represent the 24 h short-term history of a hurricane. This history is used to calculate travelled distances and bearings, which, together with other features, serve as inputs for the suite of ml models.
Random forest regressors (rf), support vector machines (svm) and multilayer perceptrons (mlp) are trained and evaluated against a linear regression baseline. Hyperparameter tuning is performed for rf, svm and mlp to obtain the most accurate results. Finally, the predictions of all four models are averaged to form an ensemble model. The main finding is that the forecasting task can already be solved to a large degree by the simplest linear regression model with only one explanatory variable.
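The feature engineering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the helper names (`make_subtracks`, `subtrack_features`, `haversine_km`, `initial_bearing_deg`) are hypothetical, a track is assumed to be a list of (latitude, longitude) fixes spaced 6 h apart, each sample is assumed to pair 4 consecutive fixes with the following fix as the 6 h-ahead target, and distances assume a spherical Earth with a mean radius of 6371 km.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing the argument of asin above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def initial_bearing_deg(lat1, lon1, lat2, lon2):
    """Initial bearing from point 1 to point 2, degrees clockwise from north, in [0, 360)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    x = math.sin(dlam) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    return math.degrees(math.atan2(x, y)) % 360.0


def make_subtracks(track, history=4):
    """Split one track into (history fixes, next fix) samples.

    Hypothetical layout: `history` consecutive fixes are the model input and
    the fix that follows them (6 h later) is the forecast target.
    """
    return [(track[i:i + history], track[i + history]) for i in range(len(track) - history)]


def subtrack_features(history):
    """Travelled distance (km) and bearing (deg) for each consecutive pair of fixes."""
    feats = []
    for (lat1, lon1), (lat2, lon2) in zip(history, history[1:]):
        feats.append(haversine_km(lat1, lon1, lat2, lon2))
        feats.append(initial_bearing_deg(lat1, lon1, lat2, lon2))
    return feats


if __name__ == "__main__":
    # Made-up track of six 6-hourly fixes, for illustration only
    track = [(25.0, -75.0), (25.5, -75.8), (26.0, -76.6),
             (26.6, -77.3), (27.1, -78.0), (27.7, -78.6)]
    samples = make_subtracks(track)
    print(len(samples))  # 2
    print(subtrack_features(samples[0][0]))  # 3 distances and 3 bearings
```

Because bearings wrap around at 360°, encoding each one as a (sine, cosine) pair is a common alternative to feeding raw degrees into a model.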
Using enlarged feature sets and more complex ml models improves the forecast accuracy by only a few percent. The most accurate result, an average track forecasting error of about 47.7 km, was achieved with the ensemble model.
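The ensemble step and the track error metric can be sketched as below. The sketch assumes that every model outputs a forecast position as (latitude, longitude), that the ensemble is an unweighted mean of the models' forecasts, and that the track forecasting error is the great-circle distance between forecast and observed position averaged over all samples. The function names `ensemble_average` and `mean_track_error_km` are hypothetical.

```python
import math
from statistics import mean

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def ensemble_average(predictions):
    """Unweighted, point-by-point mean of several models' (lat, lon) forecasts.

    `predictions` maps a model name to a list of (lat, lon) tuples, one per sample.
    Plain coordinate averaging assumes tracks stay away from the ±180° meridian.
    """
    per_model = list(predictions.values())
    n = len(per_model[0])
    if any(len(p) != n for p in per_model):
        raise ValueError("all models must predict the same number of samples")
    return [
        (mean(p[i][0] for p in per_model), mean(p[i][1] for p in per_model))
        for i in range(n)
    ]


def mean_track_error_km(predicted, observed):
    """Mean great-circle distance (km) between forecast and observed positions."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    return mean(haversine_km(*p, *o) for p, o in zip(predicted, observed))
```

A call such as `ensemble_average({"lr": lr_preds, "rf": rf_preds, "svm": svm_preds, "mlp": mlp_preds})`, followed by `mean_track_error_km` against the observed positions, would yield a single error figure in km comparable to the one reported above.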
Interactive Map Widget
To carry out the exploratory parts of the data analysis, an interactive map widget was designed using ipywidgets and the plotly library. Beyond its specific purpose of analysing the data in terms of its properties and suitability for the forecasting task, this general-purpose widget can also be used to investigate spatio-temporal patterns in the data set more broadly. Below is a demonstration of some of the widget's visualisation possibilities, focusing on Hurricane Katrina in 2005.
Full analysis
All analyses were carried out in a Jupyter notebook, the rendered version of which is presented below. To experiment with the source code itself, you can access the corresponding repository via the button below.