Forecasting hurricane movements - A machine learning approach
Felix Kroeber - Introduction to data science & machine learning (winter term 2021/22)

Objective & general approach


Given the revised Atlantic hurricane database (HURDAT2), the goal of the following analyses is to predict the next location of any hurricane 6 hours ahead of the most recent available record. The location is specified as a pair of (Latitude, Longitude) coordinates. The forecast is based on a set of hurricane features, some of which depend on the recorded history of the hurricane under consideration. Specifically, this implies that a prediction is only feasible if there is at least a 24 h record of the hurricane, as some essential features relate to its behaviour during this period. Three machine learning approaches (Support Vector Machine, Random Forest, Multi-Layer Perceptron) are applied and evaluated against a simple linear regression as a baseline model.
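To make the prediction setup concrete, the following minimal sketch frames the target as the position of the same hurricane one 6 h step ahead. The tiny DataFrame is a hypothetical example (values taken from the first records shown further below); the actual feature construction is described later in the Feature Engineering part.

# sketch: target = (Latitude, Longitude) of the same hurricane 6 hours later
import pandas as pd
df = pd.DataFrame({
    "ID": ["AL011851"] * 3,
    "Datetime": pd.to_datetime(["1851-06-25 00:00", "1851-06-25 06:00", "1851-06-25 12:00"]),
    "Latitude": [28.0, 28.0, 28.0],
    "Longitude": [-94.8, -95.4, -96.0],
})
df = df.sort_values(["ID", "Datetime"])
df["Target Latitude"] = df.groupby("ID")["Latitude"].shift(-1)
df["Target Longitude"] = df.groupby("ID")["Longitude"].shift(-1)
df = df.dropna(subset=["Target Latitude", "Target Longitude"])  # the last fix of each track has no target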

Exploratory data analysis & data cleaning

In [2]:
# general setup
import geopandas as gpd
import matplotlib.pyplot as plt
import multiprocess
import numpy as np
import os
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import seaborn as sns
import warnings

from datetime import datetime, timedelta
from dtreeviz.trees import dtreeviz
from geographiclib.geodesic import Geodesic
from IPython.core.interactiveshell import InteractiveShell
from IPython.display import Image, HTML
from ipywidgets import widgets, Layout
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn import metrics
from sklearn import tree
from sklearn.svm import SVR
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import RandomizedSearchCV

%matplotlib inline
InteractiveShell.ast_node_interactivity = "all"
plt.rcParams['figure.figsize'] = [12, 7.5]
pd.set_option('display.max_rows', 50)
pd.set_option('display.max_columns', 50)
pd.set_option('display.precision', 3)
pd.options.mode.chained_assignment = None
The dataset consists of 49105 entries for 22 recorded variables. Each entry describes the state of an individual hurricane at a specific time by specifying its status, location, maximum wind speed, minimum pressure and 12 variables relating to the wind radii and thus the extent of the hurricane. The data set goes back to 1851; the most recent entries are from hurricanes in 2015. Missing values seem to exist mainly for older hurricanes, for which accurate measurements of many variables were not yet possible.
In [7]:
# read hurricane data set
workdir = "C:\\Users\\felix\\OneDrive\\Studium_Master\\electives\\Intro_to_ML\\end_of_term"
hurricanes = pd.read_csv(os.path.join(workdir, "atlantic.csv"))
hurricanes.head()
hurricanes.tail()
Out[7]:
ID Name Date Time Event Status Latitude Longitude Maximum Wind Minimum Pressure Low Wind NE Low Wind SE Low Wind SW Low Wind NW Moderate Wind NE Moderate Wind SE Moderate Wind SW Moderate Wind NW High Wind NE High Wind SE High Wind SW High Wind NW
0 AL011851 UNNAMED 18510625 0 HU 28.0N 94.8W 80 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999
1 AL011851 UNNAMED 18510625 600 HU 28.0N 95.4W 80 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999
2 AL011851 UNNAMED 18510625 1200 HU 28.0N 96.0W 80 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999
3 AL011851 UNNAMED 18510625 1800 HU 28.1N 96.5W 80 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999
4 AL011851 UNNAMED 18510625 2100 L HU 28.2N 96.8W 80 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999 -999
Out[7]:
ID Name Date Time Event Status Latitude Longitude Maximum Wind Minimum Pressure Low Wind NE Low Wind SE Low Wind SW Low Wind NW Moderate Wind NE Moderate Wind SE Moderate Wind SW Moderate Wind NW High Wind NE High Wind SE High Wind SW High Wind NW
49100 AL122015 KATE 20151112 1200 EX 41.3N 50.4W 55 981 220 220 180 120 120 120 60 0 0 0 0 0
49101 AL122015 KATE 20151112 1800 EX 41.9N 49.9W 55 983 220 220 180 120 120 120 60 0 0 0 0 0
49102 AL122015 KATE 20151113 0 EX 41.5N 49.2W 50 985 540 520 200 220 120 120 60 0 0 0 0 0
49103 AL122015 KATE 20151113 600 EX 40.8N 47.5W 45 985 620 460 180 220 0 0 0 0 0 0 0 0
49104 AL122015 KATE 20151113 1200 EX 40.7N 45.4W 45 987 710 400 150 220 0 0 0 0 0 0 0 0
The time of each record is specified by means of two variables, the first giving the date and the second the time of day. In order to facilitate working with the time information, these variables, which are read in as integers, need to be transformed. The same applies to the variables latitude and longitude, which are encoded as strings.
In [8]:
hurricanes.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49105 entries, 0 to 49104
Data columns (total 22 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   ID                49105 non-null  object
 1   Name              49105 non-null  object
 2   Date              49105 non-null  int64 
 3   Time              49105 non-null  int64 
 4   Event             49105 non-null  object
 5   Status            49105 non-null  object
 6   Latitude          49105 non-null  object
 7   Longitude         49105 non-null  object
 8   Maximum Wind      49105 non-null  int64 
 9   Minimum Pressure  49105 non-null  int64 
 10  Low Wind NE       49105 non-null  int64 
 11  Low Wind SE       49105 non-null  int64 
 12  Low Wind SW       49105 non-null  int64 
 13  Low Wind NW       49105 non-null  int64 
 14  Moderate Wind NE  49105 non-null  int64 
 15  Moderate Wind SE  49105 non-null  int64 
 16  Moderate Wind SW  49105 non-null  int64 
 17  Moderate Wind NW  49105 non-null  int64 
 18  High Wind NE      49105 non-null  int64 
 19  High Wind SE      49105 non-null  int64 
 20  High Wind SW      49105 non-null  int64 
 21  High Wind NW      49105 non-null  int64 
dtypes: int64(16), object(6)
memory usage: 8.2+ MB
In an initial step of the data preparation, the coordinates are converted into numerical form, with signs distinguishing between east/west and north/south. The time information is transformed into the generic datetime format. The reverse transformation and decomposition of the composite datetime information into components such as year, month, etc. is done for two reasons: Firstly, it is necessary for the subsequent visualisation, as plotting the hurricanes with the plotly and mapbox libraries does not allow datetime objects. Secondly, it already represents a simple form of feature engineering, since two of these variables are also included in the ML models, which will be discussed in more detail later. Other basic editing steps include encoding missing values (-99, -999 and empty strings) as np.nan and removing entries that may be listed twice.
In [9]:
# transform lat/long into numeric values
def coord_converter(coord_str):
    if any(x in coord_str for x in ["N", "E"]):
        return float(coord_str[:-1])
    if any(x in coord_str for x in ["S", "W"]):
        return float("-" + coord_str[:-1])

hurricanes["Longitude"] = hurricanes["Longitude"].apply(coord_converter)
hurricanes["Latitude"] = hurricanes["Latitude"].apply(coord_converter)

# transform date and time into datetime
hurricanes['Date'] = [str(date) for date in hurricanes['Date']]
hurricanes['Time'] = [str(date) for date in hurricanes['Time']]
hurricanes['Time'] = hurricanes['Time'].str.pad(width=4, fillchar='0')
hurricanes.insert(2, 
                  "Datetime", 
                  [datetime.strptime(dt[0]+dt[1], "%Y%m%d%H%M") for dt in zip(hurricanes['Date'], hurricanes['Time'])])
hurricanes.drop(columns=['Date', 'Time'], inplace=True)

# re-transform into int objects
hurricanes.insert(3, "Year", [x.year for x in hurricanes.Datetime])
hurricanes.insert(4, "Month", [x.month for x in hurricanes.Datetime])
hurricanes.insert(5, "Day", [x.day for x in hurricanes.Datetime])
hurricanes.insert(6, "Hour", [x.hour for x in hurricanes.Datetime])

# remove white spaces in front of names
hurricanes["Name"] = hurricanes["Name"].str.lstrip()

# encode NaNs
hurricanes.replace(["", -99, -999], np.nan, inplace=True)
hurricanes.replace(r'^\s*$', np.nan, regex=True, inplace=True)

# remove duplicates
hurricanes.drop_duplicates(inplace=True)
Dropping duplicates did not remove any rows, so the data set does not contain redundancies in the sense of duplicated entries (there are still 49105 entries). As a closer analysis of the values of the individual variables shows, the data describes a total of 1814 hurricanes. The following general statements regarding characteristics can be made for these hurricane records (a short code sketch reproducing some of these figures follows the list):
  • On average, there are about 29 records per hurricane. The longest track record of a single hurricane dates back to 1899 and contains 133 entries.

  • Nine different statuses are used to characterise the condition of a hurricane. The meaning of the abbreviations can be taken from the data documentation (the most frequent abbreviation TS, for example, denotes a tropical cyclone of tropical storm intensity (34-63 knots)). Essentially, the status is a categorical variable that combines information already given by the latitude and the wind intensity variable.

  • There are also nine different values for the event variable, which encodes events such as landfall (L). Note that most events contain information which can only be determined post hoc (e.g. an intensity peak or a minimum in central pressure). Since only special events are described, values for the event variable exist in only slightly fewer than 1000 cases.

  • The main part of the data set describes hurricanes in the 20th century. About half of all entries refer to the period 1851-1950, while the other half were recorded in the following 65 years.

  • All locations where hurricanes have been recorded are in the northern hemisphere, with a large proportion in the tropical-subtropical region. With regard to longitude, there are obviously some erroneous entries: the minimum value of -359.1 degrees is not a valid longitude, and the maximum of 63 degrees east is highly improbable given its distance from the Atlantic Ocean.

  • While there are entries for the maximum wind speed for almost all data points, the minimum pressure in the hurricane system was only recorded for slightly more than a third of the entries. Both variables are inconspicuous with regard to their values; the reported minimum and maximum values appear plausible.

  • All variables describing the wind radii have values for only about 5900 entries. Including these variables as features in the following analyses would therefore mean excluding a large part of the data.
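As announced above, a short code sketch (assuming the hurricanes frame from the cells above) shows how the per-hurricane figures in the list can be reproduced; the remaining summary statistics are given by the describe() output in the next cell.

# sketch: number of distinct hurricanes and records per hurricane
n_hurricanes = hurricanes["ID"].nunique()          # 1814 distinct storm IDs
records_per_id = hurricanes["ID"].value_counts()   # number of entries per hurricane
print(n_hurricanes, records_per_id.mean(), records_per_id.max())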

In [10]:
# print basic summary stats for all vars
hurricanes.describe(include=object)
hurricanes.describe(datetime_is_numeric=True)
Out[10]:
ID Name Event Status
count 49105 49105 964 49105
unique 1814 288 9 9
top AL031899 UNNAMED L TS
freq 133 26563 903 17804
Out[10]:
Datetime Year Month Day Hour Latitude Longitude Maximum Wind Minimum Pressure Low Wind NE Low Wind SE Low Wind SW Low Wind NW Moderate Wind NE Moderate Wind SE Moderate Wind SW Moderate Wind NW High Wind NE High Wind SE High Wind SW High Wind NW
count 49105 49105.000 49105.000 49105.000 49105.000 49105.000 49079.000 48767.000 18436.000 5921.000 5921.000 5921.000 5921.000 5921.000 5921.000 5921.000 5921.000 5921.000 5921.000 5921.000 5921.000
mean 1950-05-26 13:08:39.046125696 1949.713 8.748 15.706 9.101 27.045 -65.665 53.052 992.244 81.865 76.518 48.647 59.156 24.642 23.030 15.427 18.403 8.110 7.358 5.131 6.269
min 1851-06-25 00:00:00 1851.000 1.000 1.000 0.000 7.200 -359.100 10.000 882.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
25% 1911-11-01 00:00:00 1911.000 8.000 8.000 6.000 19.100 -81.000 35.000 984.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
50% 1956-09-27 18:00:00 1956.000 9.000 16.000 12.000 26.400 -67.900 50.000 999.000 60.000 60.000 0.000 40.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000
75% 1989-08-10 06:00:00 1989.000 10.000 23.000 18.000 33.100 -52.500 70.000 1006.000 130.000 120.000 75.000 90.000 40.000 35.000 20.000 30.000 0.000 0.000 0.000 0.000
max 2015-11-13 12:00:00 2015.000 12.000 31.000 23.000 81.000 63.000 165.000 1024.000 710.000 600.000 640.000 530.000 360.000 300.000 330.000 360.000 180.000 250.000 150.000 180.000
std NaN 44.619 1.340 8.753 6.710 10.078 19.678 24.748 19.114 88.098 87.563 75.209 77.569 41.592 42.018 32.105 35.411 19.792 18.730 14.033 16.877

The findings so far suggest that the following data processing steps are sensible. All of these steps effectively reduce the initial data set to data points for which values for the variables of interest are available.

  • Firstly, the analyses are limited to recent hurricanes with a certain minimum number of data points per hurricane. Requiring a minimum number of existing data points per hurricane makes sense, since continuity and trends of movement-specific variables such as direction and speed can then be assumed.
  • Secondly, it is required that at least maximum wind and minimum pressure are given, so that a minimum set of variables can be included in the models.
  • Thirdly, the wind radii and the event variable are ignored. The event variable should not be part of a forecast for logical reasons, as it requires knowledge of the further course of the hurricane. Excluding the wind radii seems reasonable given the large proportion of NaN values for these variables, and because no immediate added value of these features for the prediction of the next hurricane location is apparent. Further variables that are not included in the models, for content-related rather than technical reasons, will be discussed later (-> Feature Engineering). It can already be anticipated that e.g. the status variable will be excluded as a result of its described correlation with other variables.
In [11]:
# keep only records from 1950 onwards due to larger inaccuracies in older entries
hurricanes = hurricanes[hurricanes["Datetime"] > datetime.strptime("19500101", "%Y%m%d")]

# only keep hurricanes for which at least 5 data points are available
long_track_hurricanes = hurricanes['ID'].value_counts()[hurricanes['ID'].value_counts() >= 5]
hurricanes = hurricanes[hurricanes['ID'].isin(long_track_hurricanes.index.tolist())]

# only retain entries for which pressure is available
hurricanes = hurricanes[~hurricanes['Minimum Pressure'].isna()]

# remove wind radii vars with a lot of NaN values
hurricanes = hurricanes.loc[:,:"Minimum Pressure"]

# remove events as they are not useful
hurricanes.drop(columns="Event", inplace=True)

Since the prediction of the next hurricane location is basically a spatio-temporal problem, it makes sense to examine these two dimensions explicitly in the following. First, the time intervals between the individual records of a single hurricane are examined in more detail. This is relevant because the forecast should be based on the most recent records of movement-relevant variables. If one wants to predict the position of a hurricane based on its position changes between the last recording times, an error-free extrapolation of these position changes is only possible if the time intervals are equal. If the time intervals are not equal, there are various possibilities to make them comparable and to avoid errors in the models. Possible options would be, for example:

  • Exclusion of data points with unequal intervals, thus reducing the overall usable data amount
  • Temporal interpolation to produce equal intervals, although this would already represent a substantial, assumption-based change to the raw data
  • Feature engineering of ultimately time-independent variables, e.g. speed and direction of hurricanes (see the sketch after this list)
  • Explicit inclusion of time intervals as variables in the models so that their influence and correlation with the other variables can be adequately represented
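As an illustration of the feature-engineering option mentioned in the list above, the already imported geographiclib package can turn two consecutive fixes into a time-independent speed and heading. A minimal sketch with two hypothetical positions 6 h apart (coordinates taken from the first two records shown earlier):

# sketch: speed (km/h) and heading (degrees clockwise from north) between two consecutive 6 h fixes
from geographiclib.geodesic import Geodesic
lat1, lon1 = 28.0, -94.8   # position at time t
lat2, lon2 = 28.0, -95.4   # position at time t + 6 h
g = Geodesic.WGS84.Inverse(lat1, lon1, lat2, lon2)
speed_kmh = g["s12"] / 1000 / 6   # geodesic distance (m) -> km, divided by the 6 h interval
heading_deg = g["azi1"]           # initial bearing of the great-circle path
print(round(speed_kmh, 1), round(heading_deg, 1))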
In [12]:
# Plotting time intervals between records
hurricanes_grouped = hurricanes.groupby("ID")
res = []

for name, group in hurricanes_grouped:
    delta_t = (group["Datetime"] - group["Datetime"].shift()).iloc[1:]
    res.append(delta_t)

res = [item for sublist in res for item in sublist]
res = [x/pd.Timedelta(minutes=60) for x in res]

plt.hist(res, bins=30, range=(0,30))
plt.ylabel('Frequency')
plt.xlabel('Time interval (h)');
As the frequency plot of the time intervals between two recordings shows, the data set has regular 6 h intervals between the recording times in the vast majority of cases. These result from the recordings at the synoptic times 00, 06, 12 and 18 o'clock. The few cases with smaller intervals correspond to the entries that record specific events. Therefore, in addition to the event variable already excluded, the associated irregularly timed entries are now also removed.
In [13]:
# remove all entries not corresponding to the synoptic times 00,06,12,18
syn_times_idx = [dt.strftime("%H%M") in ["0000", "0600", "1200", "1800"] for dt in hurricanes["Datetime"]]
hurricanes = hurricanes[syn_times_idx]
Longer time intervals are multiples of 6 h and indicate individual gaps in the recording of the variables. It can be assumed that these gaps exist especially for observations at the beginning of the period under consideration (from 1950 onwards), as e.g. instrument failures are more likely there. This hypothesis is examined in more detail below by plotting, for 5-year periods, the proportion of time intervals greater than 6 hours among all time intervals.
In [14]:
# analyse temporal patterns in interval times
hurricanes_grouped = hurricanes.groupby("ID")
hurricanes["Time interval"] = hurricanes_grouped["Datetime"].transform(lambda x: x - x.shift())
_hurricanes = hurricanes.dropna(subset=["Time interval"])

# aggregate records into 5-year bins: number of data points & share of intervals > 6 hours
hurricanes_grouped = _hurricanes.groupby(5 * np.floor(_hurricanes["Year"] / 5))
hurricanes_vis = pd.DataFrame({
    "n_data_points" : hurricanes_grouped["ID"].count(),
    "p_long_intervals" : hurricanes_grouped["Time interval"].agg(lambda x: sum(x != pd.Timedelta(hours=6))/len(x)) 
})

fig, ax1 = plt.subplots()
# plotting number of data points on left axis
ax1.bar(hurricanes_vis.index, "n_data_points", data = hurricanes_vis, width = 1.5)
ax1.set_ylabel("number of data points")
ax1.set_xticks(ticks = [*[x-2.5 for x in hurricanes_vis.index], hurricanes_vis.index[-1]+2.5],
               labels = [int(x) for x in [*hurricanes_vis.index,  hurricanes_vis.index[-1]+5]],
               rotation = 45)
ax1.set_ylim(0)
# plotting percentage of 6 hour intervals on right axis
ax2 = ax1.twinx()
ax2.plot(hurricanes_vis.index, hurricanes_vis.p_long_intervals, color="darkred")
ax2.set_ylabel("\n percentage of intervals > 6 hours")
ax2.set_ylim(0, 1);
The figure shows that almost all data gaps are limited to the period 1950 to 1980; since then, there have been continuous records at 6 h intervals. Since the number of data points before 1980, which is also shown in the figure, is small, subsequent analyses can be limited to the period from 1980 onwards without major loss of data. After removing the few hurricane tracks that still show irregular time intervals after 1980, the resulting data set contains exclusively equal time intervals.
In [15]:
# keep only records from 1980 onwards & remove later ones with gaps larger than 6 hours
hurricanes = hurricanes[hurricanes["Datetime"] > datetime.strptime("19800101", "%Y%m%d")]
rm_ids = hurricanes[[x > pd.Timedelta(hours=6) for x in hurricanes["Time interval"]]]["ID"]
hurricanes = hurricanes[~hurricanes["ID"].isin(rm_ids)]
print(f"{len(rm_ids)} hurricane tracks removed due to time gaps larger than 6 hours")
2 hurricane tracks removed due to time gaps larger than 6 hours
For the representation of the spatial component, interactive map representations are used in the following. These allow for fast, flexible and multivariate visualisations of patterns in the data. Note that the interactivity of the widgets is lost in the rendered HTML format of this notebook. Some key findings from the spatial visualisation are therefore presented below as static figures and briefly described.
In [16]:
# interactive plotting of trajectories

warnings.filterwarnings("ignore")

# create user interface inputs
years = widgets.IntRangeSlider(
    value=[1900, 2100],
    min=min([x.year for x in hurricanes["Datetime"]]),
    max=max([x.year for x in hurricanes["Datetime"]]),
    step=1,
    description='Years (inclusive):',
    style = {'description_width': 'initial'},
    layout=Layout(width='50%')
)

months_dt = [datetime.strptime(str(i), "%m") for i in range(1,13)]
options = [(i.strftime('%b'), i) for i in months_dt]
months = widgets.SelectionRangeSlider(
    options=options,
    index=(0,11),
    description='Months (inclusive):',
    style = {'description_width': 'initial'},
    layout=Layout(width='40%')
)

hurricane_ids = list(hurricanes.ID.unique())

id_select = widgets.SelectMultiple(
    options=hurricane_ids,
    value=hurricane_ids,
    description='Id select:',
    rows=10,
    style = {'description_width': 'initial'}
)

popup_vars = widgets.SelectMultiple(
    options=list(hurricanes.select_dtypes(exclude=['datetime', 'timedelta']).columns),
    value=["Maximum Wind", "Minimum Pressure"],
    description='Popup variables:',
    rows=10,
    style = {'description_width': 'initial'}
)

numeric_vars = list(hurricanes.select_dtypes(include=['number']).columns)
numeric_vars.insert(0,None)
categorial_vars = list(hurricanes.select_dtypes(include=['object']).columns)

size_var = widgets.Dropdown(
    options=numeric_vars,
    description='Marker size:',
    style = {'description_width': 'initial'}
)

color_var_numerical = widgets.Dropdown(
    options=numeric_vars,
    description='Marker color 1st map:',
    value="Maximum Wind",
    style = {'description_width': 'initial'}
)

color_var_categorial = widgets.Dropdown(
    options=categorial_vars,
    description='Marker color 2nd map:',
    value="Name",
    style = {'description_width': 'initial'}
)

basemap = widgets.RadioButtons(
    options=['osm_base', 'osm_topographic'],
    value='osm_base'
)

out_info = widgets.Output()

# create function to define two interlinked maps with their initial settings
# two maps only differ in their vis settings w.r.t. marker color
# two maps need to be defined due to different layer stack composition (numeric vs. categorial vis var)
def hurrican_map(color_var):
    fig = go.FigureWidget(data = px.scatter_mapbox(
        data_frame = hurricanes,
        lat="Latitude",
        lon="Longitude",
        color=color_var,
        hover_data={"Latitude": False, 
                    "Longitude": False, 
                    "Maximum Wind": True,
                    "Minimum Pressure": True},
        width=975,
        height=700,
        center=go.layout.mapbox.Center(lat=47.5, lon=-60),
        zoom=1.75))
    fig = fig.update_layout(mapbox_style="open-street-map")
    return fig

map_I = hurrican_map("Maximum Wind")
map_II = hurrican_map("Name")

# define update (backend) procedures
class map_update:
    # initialisation procedure - set df & map to apply updates
    def __init__(self, map):
        self.df = hurricanes
        self.map = map

    def df_update(self):
        # get selection of years
        start_year = str(years.value[0]) + "0101"
        end_year = str(years.value[1]+1) + "0101"
        # get selection of months
        start_month = months.value[0].month
        end_month = months.value[1].month+1
        # apply temporal subsetting      
        self.df = self.df[self.df["Datetime"].between(datetime.strptime(start_year, "%Y%d%m"), 
                                                      datetime.strptime(end_year, "%Y%d%m"))]
        self.df = self.df[[x.month in range(start_month, end_month) for x in self.df["Datetime"]]]
        # apply id subsetting 
        self.df = self.df[[x in id_select.value for x in self.df["ID"]]]

    def map_vis_update(self, color_var):
        # clear previous outputs
        out_info.clear_output()
        # get columns to display in popups
        hover_vars = {}
        for col in list(self.df.select_dtypes(exclude=['datetime', 'timedelta']).columns):
            if col in popup_vars.value:
                hover_vars[col] = True
            else:
                hover_vars[col] = False
        # get size var (None means no size encoding)
        marker_size_var = size_var.value
        # configure fig based on selection
        if len(self.df) != 0:
            self.map_new = go.FigureWidget(data = px.scatter_mapbox(
                data_frame = self.df,
                lat="Latitude",
                lon="Longitude",
                size = marker_size_var,
                color = color_var.value,
                hover_data=hover_vars))
            self.map_new = self.map_new.update_layout(
                {'coloraxis': {'colorbar': {'title': {'text': color_var.value}}}}
                )
        else:
            with out_info:
                print("Selection obtained no data points. No map update was done. Reload of widget necessary!")
            return
        # update basemap
        if basemap.value == "osm_base":
            self.map_new = self.map_new.update_layout(
            mapbox_style="open-street-map",
            mapbox_layers=[
                {
                    "below": 'traces',
                    "sourcetype": "raster",
                    "sourceattribution": "© OpenStreetMap contributors",
                    "source": [
                        "https://tile.openstreetmap.org/{z}/{x}/{y}.png"
                    ]
                }
            ])
        else:
            self.map_new = self.map_new.update_layout(
            mapbox_style="white-bg",
            mapbox_layers=[
                {
                    "below": 'traces',
                    "sourcetype": "raster",
                    "sourceattribution": "© OpenStreetMap contributors",
                    "source": [
                        "https://tile.opentopomap.org/{z}/{x}/{y}.png"
                    ]
                }
            ])
        # update old map using new properties
        self.map.update(data = self.map_new.data,
                        layout = {'coloraxis': self.map_new.layout.coloraxis,
                                  'mapbox': {'layers': self.map_new.layout.mapbox.layers,
                                             'style': self.map_new.layout.mapbox.style}})

def response_map_I(change):
    map_I_updater = map_update(map_I)
    map_I_updater.df_update()
    map_I_updater.map_vis_update(color_var_numerical)

def response_map_II(change):
    map_II_updater = map_update(map_II)
    map_II_updater.df_update()
    map_II_updater.map_vis_update(color_var_categorial)
                                    
# observe changes
for response in [response_map_I, response_map_II]:
    years.observe(response, names="value")
    months.observe(response, names="value")
    id_select.observe(response, names="value")
    popup_vars.observe(response, names="value")
    size_var.observe(response, names="value")
    basemap.observe(response, names="value")

color_var_numerical.observe(response_map_I, names="value")
color_var_categorial.observe(response_map_II, names="value")

# configure layout & display figure
accordion = widgets.Accordion([widgets.VBox([years, months]),
                               widgets.VBox([id_select]),
                               widgets.HBox([widgets.VBox([popup_vars], layout=Layout(padding='0px 25px 0px 0px')),
                                             widgets.VBox([size_var, color_var_numerical, color_var_categorial])]),
                               basemap])
accordion.set_title(0, 'Temporal Subsetting')
accordion.set_title(1, 'ID Subsetting')
accordion.set_title(2, 'Visualisation Settings')
accordion.set_title(3, 'Basemap Settings')

widgets.VBox([accordion, out_info, map_I, map_II])

# reset warnings to default
warnings.filterwarnings("default")
First of all, it should be noted that the spatial representation confirms the elimination of some of the irregularities and errors in the data described at the beginning. For example, there are no longer any implausible locations, so the erroneous longitude entries were evidently removed by filtering out the older hurricanes.
Out[17]:
In general, the usual migration pattern of hurricanes in the Atlantic Ocean is clearly visible. Starting from a point of origin near the equator, most hurricanes initially move north-west, following wind conditions that are uniform over a large area. Around the latitude of the northern tropic, they then recurve and move mostly towards the north-east, where they gradually weaken due to a lack of further energy from increasingly colder air masses (NOAA 2021a). A seasonal representation of the hurricane tracks shows that this simplified pattern applies to the majority of hurricanes in summer to late summer. The few hurricanes outside the classic hurricane season, on the other hand, usually show more complex and less uniform movement patterns.
Out[3]:
Out[3]:
The weakening of a hurricane is particularly pronounced in the case of a landfall, as the lack of latent energy from maritime air masses then becomes clearly noticeable. Accordingly, there is a clear intensity gradient (in terms of wind speed) for hurricanes at the land-water boundary. More important with regard to the question of the next position of a hurricane, however, is the concentration of hurricanes in the coastal area, evident in the following map in the Gulf of Mexico and off the east coast of Florida. Relatively few hurricanes reach the mainland, which may be due to a change in the direction of migration or to slowing down and dissipation during or shortly before a potential landfall. While the connection between hurricane intensity and continentality is clearly evident, the dependency between directional changes and proximity to the coast can only be assumed for the time being, but gives rise to further investigations in the context of feature engineering (a rough sketch of a possible distance-to-land measure follows below the map).
Out[19]:
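To indicate how the assumed dependency on coastal proximity could later be quantified, a rough sketch of a distance-to-land measure with the already imported geopandas follows. The naturalearth_lowres sample data (bundled with the geopandas versions used here) and the planar distance in degrees are simplifying assumptions for illustration only, not part of the actual analysis.

# rough sketch: distance of a single hypothetical fix to the nearest landmass
import geopandas as gpd
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))      # coarse country polygons as land proxy
land = world.geometry.unary_union                                        # merge all polygons into one geometry
fix = gpd.GeoSeries(gpd.points_from_xy([-80.0], [25.0]), crs=world.crs)  # hypothetical fix off the Florida coast
dist_deg = fix.distance(land)                                            # planar distance in degrees (crude proxy)
print(float(dist_deg.iloc[0]))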
The use of a topographic basemap finally makes it possible to analyse the extent to which hurricane movements are controlled by the relief on land. Pronounced topography seems to contribute to the general weakening of hurricanes, so that relatively few hurricanes reach these areas. However, for those hurricanes that do cross barriers such as the Rocky Mountains, the Appalachians or the mountainous areas of the Dominican Republic, no clear relief-induced changes in movement can be observed. Neither is the distance travelled within 6 h noticeably reduced in such mountainous areas, nor is there a clear reversal of direction just before the mountains are reached. Accordingly, topography is not included as a relevant feature.
Out[20]: