
Geopython 2022

Impressions from the conference on Python in the field of spatial analysis. Selected individual contributions are briefly summarised and grouped into broader categories. Where links to related themes and methods came to mind, corresponding material has been added.

Introduction and contextualisation

Geopython is an annual conference focusing on the Python programming language for spatial applications. Relative to its attendance, it covers a wide range of contributions, from talks on software engineering to strongly application-focused, remote-sensing-oriented ones. Overall, this provides a holistic view of how innovative research-related or application-oriented projects are implemented as geoprocessing workflows in the Python environment.

In the following reflection and summary, I limit myself to a selection of contributions that I have grouped under two headings. The first group mainly includes contributions that focus on improvements and developments of various Python libraries for easier and more performant handling of geodata. I try to turn these into a concise picture of the existing Python spatial ecosystem and its evolution. The second group comprises three contributions for which I focus more on the underlying statistical-methodological component. For this purpose, background information on the presented applications and/or the functionalities of the software packages is provided.

The grouping into the above-mentioned topics is subjective and does not correspond to the official structure of the conference. It is also not a comprehensive overview of the contributions at this year’s Geopython. If the presentation and attempted classification of individual contributions is distorting and/or misleading, please get in touch. Any correcting comments are appreciated.

Finally, I would like to take this opportunity to thank the contributors and the organising team – I enjoyed participating very much and I am already looking forward to the event next year!

I. Geospatial software ecosystem

Related individual contributions


II. Statistical methods

Related individual contributions

  • Schmitz, S.: UMAP for dimensionality reduction of high-dimensional remote sensing data
  • Gonzalez, C. A.: Empirical downscaling and super-resolution of earth science data
  • Moliński, S.: Area-to-Area and Area-to-Point kriging for finer resolution of areal aggregates

Uniform Manifold Approximation and Projection (UMAP) is a novel non-linear dimensionality reduction technique. Within the field of dimensionality reduction, UMAP belongs to the class of k-nearest-neighbour-based graph learning algorithms. This group contrasts with matrix factorisation techniques such as PCA, which perform a global linear mapping of high-dimensional data to a low-dimensional representation. Graph-based algorithms work in two steps: first, a weighted graph of the high-dimensional data is constructed; then, this graph is translated into a low-dimensional space while preserving its desirable characteristics. The low-dimensional representation is adjusted iteratively by optimising an objective function that quantifies how similar the low-dimensional graph is to the high-dimensional one in terms of the defined criteria.

Principle of graph-based manifold mapping
(Sainburg et al. 2020)
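The first step, building the weighted graph, can be sketched in a few lines of NumPy. This is a minimal, brute-force illustration that assumes Euclidean distances and follows the description in McInnes et al. 2018. Distances to the k nearest neighbours are shifted by the distance to the nearest neighbour (ρ_i) and scaled by a per-point bandwidth σ_i, which is found by binary search so that the weights sum to log2(k). The directed graph is then symmetrised with a fuzzy union. The function name `fuzzy_knn_graph` is hypothetical. The sketch omits the optimisation of the low-dimensional layout and is not the umap-learn implementation, which relies on approximate nearest-neighbour search.

```python
import numpy as np


def fuzzy_knn_graph(X: np.ndarray, k: int = 5, n_iter: int = 64) -> np.ndarray:
    """Symmetric fuzzy k-NN graph in the spirit of UMAP (hypothetical helper).

    Brute-force O(n^2) distances, so only suitable for small arrays.
    """
    n = X.shape[0]
    d = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    np.fill_diagonal(d, np.inf)                  # a point is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]          # indices of the k nearest neighbours
    knn_d = np.take_along_axis(d, nbrs, axis=1)  # their distances, ascending
    rho = knn_d[:, 0]                            # distance to the nearest neighbour
    target = np.log2(k)

    W = np.zeros((n, n))
    for i in range(n):
        shifted = np.maximum(knn_d[i] - rho[i], 0.0)
        lo, hi, sigma = 0.0, np.inf, 1.0
        # Binary search for the bandwidth sigma_i so that the weights sum to log2(k)
        for _ in range(n_iter):
            if np.exp(-shifted / sigma).sum() > target:
                hi = sigma
            else:
                lo = sigma
            sigma = sigma * 2.0 if hi == np.inf else (lo + hi) / 2.0
        W[i, nbrs[i]] = np.exp(-shifted / sigma)  # directed edge weights in [0, 1]

    # Fuzzy union symmetrises the graph: w_ij + w_ji - w_ij * w_ji
    return W + W.T - W * W.T


rng = np.random.default_rng(0)
G = fuzzy_knn_graph(rng.normal(size=(30, 4)), k=5)
```

In practice, one would use the umap-learn package (`umap.UMAP(n_neighbors=15, n_components=2).fit_transform(X)`), which builds a comparable graph and then optimises the low-dimensional embedding.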

UMAP shares this working principle with a range of other graph-based dimensionality reduction approaches such as t-SNE and Laplacian eigenmaps. However, UMAP appears to be superior to t-SNE at capturing the global and topological structure of a dataset. Furthermore, the algorithm scales better to larger datasets and runs faster. For in-depth explanations of UMAP and comparisons with PCA, the original publication (McInnes et al. 2018) is highly recommended. A more graphically oriented explanation of the working mechanism of UMAP can be found here. Also, the following website provides interactive resources for gaining an intuitive understanding.

In remote sensing, the multi-/hyperspectral and/or multitemporal character of the data explains the general interest in dimensionality reduction techniques. At Geopython, dimensionality reduction techniques were presented in the context of a study on the exploratory analysis of multi-frequency, fully polarimetric, interferometric SAR data. UMAP was compared to other approaches in terms of how well it separates different land cover classes in low-dimensional representations. The study concluded that UMAP outperforms PCA and Laplacian eigenmaps in this regard and is competitive with t-SNE. An excerpt from the results of the presented study is given below.

Low-dimensional representations from linear and non-linear mappings
(Schmitz et al. 2021)
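One simple way to make the "capability to separate land cover classes" measurable is to score each embedding, for example with the leave-one-out accuracy of a 1-nearest-neighbour classifier. The sketch below is a minimal illustration under stated assumptions. It uses synthetic data (three made-up classes in ten "bands") and PCA via NumPy's SVD as the global linear baseline. The score is a hypothetical choice for illustration, not necessarily the criterion used in Schmitz et al. 2021, and the function names are hypothetical as well.

```python
import numpy as np


def pca_embed(X: np.ndarray, n_components: int = 2) -> np.ndarray:
    """Global linear mapping: project centred data onto the leading principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: principal axes
    return Xc @ Vt[:n_components].T


def loo_1nn_accuracy(Y: np.ndarray, labels: np.ndarray) -> float:
    """Hypothetical separability score: leave-one-out 1-NN accuracy in the embedding."""
    d = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d, np.inf)  # a sample may not vote for itself
    return float((labels[d.argmin(axis=1)] == labels).mean())


rng = np.random.default_rng(42)
centres = rng.normal(scale=5.0, size=(3, 10))   # three synthetic classes, 10 "bands"
labels = np.repeat(np.arange(3), 50)
X = centres[labels] + rng.normal(size=(150, 10))
Y = pca_embed(X, n_components=2)
print(f"LOO 1-NN accuracy in the PCA plane: {loo_1nn_accuracy(Y, labels):.2f}")
```

A non-linear embedding, for instance the output of umap-learn, could be scored with the same function, so that linear and non-linear mappings are compared on equal terms.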

The ability to downscale imagery with minimal perceptual loss and distortion is a topical issue in remote sensing. The need mainly stems from the high cost of high-resolution imagery. Being able to create faithful super-resolved images from freely available low-resolution images may improve the results of a range of subsequent applications such as land cover classification. From an earth scientist's point of view, downscaling also matters because a higher resolution of the input data may increase the computational cost of model runs by orders of magnitude, a phenomenon that is well known in numerical weather and climate prediction. A coarse-resolution calculation followed by a reliable super-resolution of the product is therefore desirable. Recently, a variety of deep-learning-based methods for this task of downscaling (a.k.a. super-resolution) have been proposed. For an overview, see Wang et al. 2022, Bashir et al. 2021, Yang et al. 2019 and Anwar et al. 2019.

*** t.b.d.: How does super-resolution work from a statistical point of view? -> see overviews ***

At Geopython, a package called "DL4DS" was introduced, which implements state-of-the-art deep learning algorithms for the empirical downscaling of gridded Earth science data. Its architecture builds on top of TensorFlow and Keras, and it supports distributed GPU training. A variety of convolutional backbone models, such as residual and dense ones, are implemented to enable spatial as well as spatio-temporal downscaling with the help of a high-resolution reference dataset and auxiliary variables.

Downscaling/Super-resolution via DL4DS
(Gonzalez 2022)
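Empirical downscaling models are typically trained on pairs of coarse inputs and high-resolution targets. The following minimal NumPy sketch assumes that the coarse input is obtained by block-averaging a high-resolution reference field, which is one common way to build such pairs and not necessarily how DL4DS does it. It also shows a naive upsampling baseline and the peak signal-to-noise ratio (PSNR), a standard score that a learned model would have to beat. `coarsen`, `upsample_nearest` and `psnr` are hypothetical helpers, not part of DL4DS, and the field is synthetic.

```python
import numpy as np


def coarsen(hr: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical helper: block-average a 2-D field by an integer factor."""
    h, w = hr.shape
    h, w = h - h % factor, w - w % factor  # crop so the blocks tile exactly
    return hr[:h, :w].reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


def upsample_nearest(lr: np.ndarray, factor: int) -> np.ndarray:
    """Hypothetical naive baseline: repeat each coarse cell factor x factor times."""
    return np.repeat(np.repeat(lr, factor, axis=0), factor, axis=1)


def psnr(reference: np.ndarray, estimate: np.ndarray) -> float:
    """Peak signal-to-noise ratio in dB, using the reference's value range as peak."""
    mse = np.mean((reference - estimate) ** 2)
    peak = reference.max() - reference.min()
    return float(10.0 * np.log10(peak ** 2 / mse))


y, x = np.mgrid[0:64, 0:64] / 64.0
hr = np.sin(6 * x) * np.cos(4 * y)   # synthetic high-resolution reference field
lr = coarsen(hr, 4)                   # 16 x 16 coarse input of a training pair
baseline = upsample_nearest(lr, 4)    # 64 x 64 naive reconstruction
print(lr.shape, baseline.shape, round(psnr(hr, baseline), 1))
```

A residual network, as in the backbones mentioned above, would learn the difference between such an interpolated input and the high-resolution target rather than the full field.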

Spatial interpolation is a standard processing step in geographical workflows. One established interpolation technique from the field of geostatistics is kriging, in which unknown values are estimated as a linear combination of neighbouring observations. Because it relies on such a weighted sum, kriging is similar to deterministic interpolation techniques such as Inverse Distance Weighting (IDW). Unlike IDW, however, kriging does not assign weights to the surrounding measured points based only on their distance to the prediction location. Instead, the spatial arrangement of the measured points (e.g. clustering) is taken into account as well.
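For comparison, IDW can be written in a few lines. This minimal NumPy sketch uses the common power parameter of 2 (an assumption; other powers are possible) and a hypothetical function name. It shows that each weight depends solely on the distance between sample and target, so clustered samples are not down-weighted for being redundant.

```python
import numpy as np


def idw(xy_known, z_known, xy_target, power=2.0):
    """Inverse Distance Weighting: weights depend only on the distance to the target."""
    d = np.linalg.norm(xy_target[:, None, :] - xy_known[None, :, :], axis=-1)
    out = np.empty(len(xy_target))
    for i, di in enumerate(d):
        exact = di == 0.0
        if exact.any():                  # target coincides with a sample location
            out[i] = z_known[exact][0]
            continue
        w = di ** -power                 # closer samples get larger weights
        out[i] = np.sum(w * z_known) / np.sum(w)  # normalised weighted sum
    return out


# Two clustered samples on the right, one isolated sample on the left
known = np.array([[0.0, 0.0], [2.0, 0.0], [2.1, 0.0]])
values = np.array([0.0, 10.0, 10.0])
print(idw(known, values, np.array([[1.0, 0.0]])))
```

In this example, the two clustered samples together receive almost twice the weight of the single sample on the left, even though they carry largely redundant information. Kriging, by contrast, accounts for this redundancy through the variogram.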

Kriging is a multi-stage process that starts with quantifying spatial autocorrelation. A related key concept is the semi-variance, a measure of dissimilarity between data points as a function of their separation distance. In a first step, kriging builds on semi-variances by calculating the experimental variogram across all point pairs. In a second step, a model is fitted to this variogram. The weights of the neighbours in the actual interpolation are then derived from this model by solving a system of linear equations. For details on this process, see Smith et al. 2021 or Goovaerts 2019.
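The two steps can be illustrated with a minimal NumPy sketch. The experimental semi-variance per distance class is γ(h) = 1/(2N(h)) Σ (z_i − z_j)², taken over the N(h) point pairs whose separation falls into that class. As an assumption for illustration, a spherical model without nugget is then fitted by a simple grid search over the range, with the sill solved in closed form by pair-count-weighted least squares. Dedicated packages offer more robust fitting. All function names are hypothetical.

```python
import numpy as np


def experimental_semivariogram(xy, z, bin_edges):
    """Average half squared difference of all point pairs, grouped by distance class."""
    i, j = np.triu_indices(len(z), k=1)          # each unordered pair once
    h = np.linalg.norm(xy[i] - xy[j], axis=1)    # pair separation distances
    sq = 0.5 * (z[i] - z[j]) ** 2                # semi-variance of each pair
    lags, gammas, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (h >= lo) & (h < hi)
        if in_bin.any():
            lags.append(h[in_bin].mean())
            gammas.append(sq[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(lags), np.array(gammas), np.array(counts)


def spherical(h, sill, a):
    """Spherical model without nugget; reaches the sill at the range a."""
    r = np.minimum(h / a, 1.0)
    return sill * (1.5 * r - 0.5 * r ** 3)


def fit_spherical(lags, gammas, counts, n_candidates=200):
    """Grid search over the range; closed-form weighted least-squares sill."""
    best = None
    for a in np.linspace(lags.min(), 2.0 * lags.max(), n_candidates):
        f = spherical(lags, 1.0, a)
        sill = np.sum(counts * f * gammas) / np.sum(counts * f * f)
        err = np.sum(counts * (gammas - sill * f) ** 2)
        if best is None or err < best[0]:
            best = (err, sill, a)
    return best[1], best[2]
```

The fitted `spherical(h, sill, a)` then serves as the semivariogram in the kriging system.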

There are different types of kriging, depending on the surface model used to constrain the calculation of the kriging weights. In its most general form, kriging assumes a surface model composed of two components (see Smith et al. 2021):

  • a structural component $\mu \left ( s \right )$ representing a deterministic overall trend
  • a regionalised statistical component $\varepsilon \left ( s \right )$ driven by spatial autocorrelation and including random noise variations

$$Z\left(s\right) = \mu \left ( s \right )+\varepsilon \left ( s \right )$$

with $s$ describing the spatial position $\left(x, y\right)$.

The regionalised statistical component is treated the same way in all kriging approaches: the semi-variance is modelled, and the fitted function is used as input for the interpolation process described above. With regard to the structural component, however, simple, ordinary and universal kriging make different assumptions.

  • Simple kriging assumes a fixed $\mu$, i.e. the data have a known, constant mean value throughout the study area, independent of $s$. This is the strongest stationarity assumption of the three variants (second-order stationarity with a known mean).
  • Ordinary kriging assumes a locally constant but unknown $\mu$, i.e. the mean may vary across the spatial field but is treated as constant within the neighbourhood considered for each prediction.
  • Universal kriging models the structural component $\mu$ as a polynomial trend of the coordinates (e.g. linear or quadratic), which is estimated as part of the kriging process.
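How simple and ordinary kriging differ in their treatment of $\mu$ shows up directly in their linear systems. This minimal NumPy sketch assumes an already fitted spherical semivariogram without nugget (made-up parameters) and a global search neighbourhood. Simple kriging works with covariances C(h) = sill − γ(h) and a known mean. Ordinary kriging adds a Lagrange multiplier that forces the weights to sum to one, so the unknown local mean never has to be estimated explicitly. Universal kriging would add further constraint rows for the trend terms (e.g. x and y) to the same system and is omitted here. The function names are hypothetical and not the API of pyinterpolate or PyKrige.

```python
import numpy as np


def spherical(h, sill=1.0, a=1.0):
    """Spherical semivariogram without nugget (hypothetical fitted parameters)."""
    r = np.minimum(h / a, 1.0)
    return sill * (1.5 * r - 0.5 * r ** 3)


def _dist(p, q):
    """Euclidean distances between two sets of 2-D points."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)


def ordinary_kriging(xy, z, x0, gamma):
    """Unknown, locally constant mean: weights constrained to sum to one."""
    n = len(z)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(_dist(xy, xy))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(_dist(xy, x0[None, :]))[:, 0]
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]              # kriging weights, Lagrange multiplier
    return lam @ z, lam @ b[:n] + mu, lam  # estimate, kriging variance, weights


def simple_kriging(xy, z, x0, gamma, sill, mean):
    """Known constant mean: solve with covariances C(h) = sill - gamma(h)."""
    C = sill - gamma(_dist(xy, xy))
    c0 = sill - gamma(_dist(xy, x0[None, :]))[:, 0]
    lam = np.linalg.solve(C, c0)           # no sum-to-one constraint
    return mean + lam @ (z - mean), sill - lam @ c0, lam


xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([1.0, 2.0, 3.0, 4.0])
g = lambda h: spherical(h, sill=1.0, a=2.0)
print(ordinary_kriging(xy, z, np.array([0.5, 0.5]), g))
```

Both estimators reproduce the measured value, with zero kriging variance, when the target coincides with a sample location. In simple kriging, the weights need not sum to one; the remaining weight is effectively given to the known mean.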

Kriging variants can be further differentiated by the spatial supports involved in the interpolation. The most common setting is to predict the value at a specific point from known point measurements in its neighbourhood. In contrast, block kriging aims at predicting areal instead of point values. Since a block estimate is equivalent to averaging kriging estimates at points within the block, the results are smoother and have a lower kriging variance. Just as the destination geometry can be modified, kriging variants also exist for different source geometries, i.e. the geometry of the input data. Area-to-point kriging, for example, predicts point values from areal observations, which amounts to downscaling or disaggregation. Generalisations of kriging that incorporate multiple secondary data into the estimation process further increase the variety of existing approaches. Using secondary variables (e.g. high-resolution auxiliary variables) to improve the interpolation of the primary variable of interest is referred to as co-kriging.
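The kriging estimator is linear, and the system matrix does not depend on the target. Averaging point predictions over a block therefore gives the same result as solving the block kriging system once, with point-to-block average semivariances γ̄(x_i, V) on the right-hand side. The following minimal NumPy sketch of ordinary block kriging rests on several assumptions: the block is discretised into a regular grid of points, the exponential variogram has made-up parameters, and the function names are hypothetical. The block kriging variance subtracts the average within-block semivariance γ̄(V, V), which is why block estimates carry less uncertainty than the point estimates they average.

```python
import numpy as np


def exponential(h, sill=1.0, a=1.0):
    """Exponential semivariogram without nugget; about 95% of the sill at h = 3a."""
    return sill * (1.0 - np.exp(-h / a))


def pairwise(p, q):
    """Euclidean distances between two sets of 2-D points."""
    return np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)


def block_kriging(xy, z, block_pts, gamma):
    """Ordinary kriging of the mean over a block discretised by block_pts.

    With a single discretisation point this reduces to ordinary point kriging.
    """
    n = len(z)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(pairwise(xy, xy))
    A[n, n] = 0.0
    b = np.ones(n + 1)
    b[:n] = gamma(pairwise(xy, block_pts)).mean(axis=1)  # point-to-block averages
    sol = np.linalg.solve(A, b)
    lam, mu = sol[:n], sol[n]
    g_vv = gamma(pairwise(block_pts, block_pts)).mean()  # within-block average
    return lam @ z, lam @ b[:n] + mu - g_vv               # estimate, variance


xy = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0], [1.0, 2.0]])
z = np.array([1.0, 2.0, 3.0, 4.0, 2.5])
gx, gy = np.meshgrid(np.linspace(1.1, 1.9, 4), np.linspace(1.1, 1.9, 4))
block = np.column_stack([gx.ravel(), gy.ravel()])  # 4 x 4 discretisation of the block
g = lambda h: exponential(h, sill=1.0, a=1.5)
est_block, var_block = block_kriging(xy, z, block, g)
points = [block_kriging(xy, z, p[None, :], g) for p in block]
print(est_block, var_block)
```

The block estimate equals the mean of the 16 point estimates, while its variance is no larger than their mean point variance.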

Given this range of kriging approaches, several Python implementations exist that differ in the functionality and models they provide. The one presented at Geopython is called pyinterpolate. It comprises basic kriging techniques as well as more sophisticated area-to-area and area-to-point Poisson kriging methods. Some of the package's capabilities were demonstrated by its developer, Szymon Moliński, during an extended hands-on workshop. Compared to PyKrige, another spatial interpolation package known for its point kriging capabilities, pyinterpolate stands out for its support of areal kriging.

Area-to-point kriging of breast cancer rates using finer population grids as support
(adapted from Moliński 2022)