Masters programme | E-portfolio
Semester I & II
Spatial Analyses

Solar radiation analyses

Performing a multitemporal classification of Sentinel imagery to extract vineyards. Subsequently, analysing the significance of irradiation intensity and duration for the location of vineyards. Using random forest as an embedded feature selection method.

Table of Contents

Introduction

Throughout Germany, about 100,000 ha and thus about 0.28% of the land is used for winegrowing. Well-known wine-growing areas are mainly located in the climatically favoured valleys of the Moselle and the Upper Rhine. The extent to which solar radiation is a particularly important factor at regional level with regard to the location of the vineyards will be investigated in the following using a 21 x 35.5 km2 study area in southern Baden-Württemberg (comprising areas from Breisgau, Kaiserstuhl & Markgräflerland). The area is particulary suitable as different forms of cultivation with vineyards next to arable land or grassland are present. This enables to compare the relevance of solar radiation for different landuses. More specifically, solar radiation (in terms of duration and intensity) will be used in an attempt to seperate fields of different landuses from each other. The research design is as follows:

  • Distinguising and delimiting vineyards from other fields using a Random Forest classifier on multi-temporal cloud-free Sentinel 2 imagery
  • Computing field-wise aggregated statistics for solar intensity and sunshine duration but also other potentially relevant variables such as soil type and number of days with frost
  • Performing a second random forest classification using the aggregated statistics on the field objects in order to assess the importance of single features
Area of interest (AoI) with the locations of training samples used for the landuse classfication

Step I: Landuse classification

In order to be able to make statements about the significance of solar radiation for agricultural land of different types, a land use classification must first be available. Due to the very limited open data policy of the state of Baden-Württemberg, no corresponding data sets are available. At the national level, the Corine Landcover CLC5 could be used. However, with its minimum mapping unit of 5 ha, it is too coarse to capture local differences in land use. Accordingly, an ad-hoc land use classification is created below. The advantage of multi-temporal classification approaches in the given case is that it can be assumed that the seasonal spectral signature of vineyard areas can be distinguished from other areas. In contrast, when using a monotemporal classification, the differentiation from other agricultural areas may be difficult depending on the time of observation. In an object-based approach, object-features such as the texture within plots could be used as additional features for improved classification. To put knowledge on the texture into practice, however, one needs to perform elaborated class modelling or, alternatively, the availability of countless training samples to be used in a convolutional neural network approach. Nevertheless, an object-based approach seems generally appropriate, since statements are to be made about fields or strokes and not about arbitrary pixel units. In addition, the salt-and-pepper effect of pixel-based classifications can be avoided in this way. In order to combine these strengths of the object-based approach with those of the multitemporal pixel-based version while keeping the overall effort limited, the following two-step procedure was implemented:

First, based on high-resolution satellite images, fields or impacts were segmented. Since there are again no freely available orthophotos of the land, the high-resolution images provided in ArcGIS as Basemap by Maxar were used as the basis for segmentation. The basemap was exported as an rgb image with a resolution of about 1m for the AoI and then segmented using mean-shift. In order to achieve the best possible result, various parameter settings of the algorithm were tested and the results were manually examined in a smaller section of the study area. The best parameter combination was then used to calculate an Aoi-wide segmentation. The result for the whole area as well as for the smaller test area is shown below.

Subsequently, cloud-free Sentinel 2 data were used for a multi-temporal classification. For this purpose, the monthly Level 3 composites were obtained from DLR and a single composite was created from all the monthly RGB images. Using the classification wizard in ArcGISPro, a total of over 150 training areas were created for a total of 7 different classes, which were then used in combination with the composite to train an RF classifier. When creating the 7 classes, the focus was placed on agricultural and forestry areas according to the objective of the analyses, whereby 4 classes (forest, arable land, pastures and vineyards) were differentiated. For the sake of completeness, a distinction was made in all other land use classes between artificial surfaces, water and others, although this is not important in the following, as these forms of use are not considered to be in competition with viticulture. The training accuracy of the RF classifier was 98.6%, although a significantly lower test accuracy can be assumed. An explicit validation of the classification results did not take place, since within the framework of the current proof of concept, the confirmation of the quality of the classification by means of a brief visual assessment was sufficient. Nevertheless, there are obvious errors in the classification (see below), so that the classification definitely represents an important starting point for improving the quality of the results in the course of a more in-depth examination of the workflow presented here.

Step II: Field statistics on influencing factors

Once the dependent variable, i.e. the land use class of each plot, has been determined, the set of explanatory variables must be formed. With regard to the effects of the solar variables, it seems plausible that above all the radiation intensity, but possibly also the radiation duration, have an influence on viticulture and its locations. In addition to the annual sum of these variables, the seasonal sum over the two months of September and October is also considered, as this represents the ripening phase of the vines with particular importance of solar radiation. It can be assumed that there is a high correlation between the radiation intensity or duration in the year compared to the high season, since, for example, slightly inclined south-facing slopes are favoured in both cases over steeply inclined north-facing slopes due to the diurnal path of the sun. Nevertheless, the seasonal totals are considered as independent variables, since the lower sun (compared to the annual mean) benefits slopes with a higher inclination more, which may make them particularly suitable for viticulture. In order to obtain values for the selected set of four solar variables for all pixels and then by averaging for all fields, atmospheric conditions must be taken into account in addition to time of day and year, latitude and topography. These are included in the alogrithm used (Area solar radiation ArcGISPro) via the specification of the atmospheric transmissivity and the proportion of diffuse radiation. For a precise parameterisation, annual radiation values for the past 5 years were obtained from the DWD in 1x1km grid resolution and averaged. These data contained the annual sums of diffuse and direct radiation on the horizontal plane in kwH/m2, so that a reliable estimate of the proportion of diffuse radiation could be determined from the 5-year average of these data. The transmissivity was derived by dividing the sum of direct and diffuse radiation on the ground by the annual sum of radiation arriving at the edge of the atmosphere. For the latter, an approximate value of 300 W/m was assumed for the latitude of the study area (Hartmann 2016) and multiplied accordingly by 24 * 365 days. The resulting spatially variable transmissivity and diffuse radiation fraction values are shown below. To apply the radiation algorithm, the study area was then tiled according to the grid cells of the atmospheric variables and a separate calculation of the four variables of interest was performed for each of the resulting 768 cells. The final resolution is determined by underlying digital elevation model that was resampled to 10m. The resulting data sets are shown further down in an image carousel together with the other included variables. It should be noted that the annual sum of radiation duration represents values under best-weather conditions, i.e. without cloud cover. It may also be noticeable that the annual sum of radiation intensity reaches a maximum of 1025 kwH and, with the spatial mean of about 775 kwH, is about 1/3 below the usual data for the study area. This is probably due to seasonal variations in cloud cover with on average less cloud cover and thus stronger radiation during summer (-> DWD). The effects of the lack of inclusion of the temporal variability of the atmospheric parameters under consideration are also evident here. However, it should be emphasised that the temporal variability is less important in the context of the present question than an approximate recording of the spatial variability. The fact that the inclusion of the spatial variability of the atmospheric parameters is important is reflected in the resulting solar data, some of which are clearly oriented towards the grid cell boundaries (see large-scale consideration of the cut-out area for solar duration below).

Time-averaged atmospheric transmissivity
Time-averaged proportion of diffuse radiation

The following additional variables are included in the analysis and tested for their significance against the solar variables:

  • The frost frequency, i.e. the number of days with frost, is taken into account, as frost damage is a limiting location factor for viticulture. A corresponding data set with reference to the 30-year climate reference period is provided by the DWD. It is based on an altitude-specific regression model and an IDW interpolation applied to existing station measurement data, whereby local influences (e.g. cold air runoff) are not mapped. The resolution is 1×1 km.
  • Wind speed is included, as exposure to wind generally promotes erosion and could also act as a limiting factor. Again, a data set from the DWD is used, which includes the multi-annual mean wind speed at 10m above the ground at 200 x 200m resolution.
  • The digital terrain model with the heights above sea level, which has already been used in the solar calculations, is considered separately again, as correlations with the last two factors are to be expected, but the significantly higher spatial resolution may be noticeable.
  • The slope derived from the terrain model is also considered. Here, correlations with the solar variables are to be expected. Nevertheless, the independent consideration of the slope is justified under the aspect of erosivity and thus the exclusion of other types of use.
 
Pixelwise values for all variables as shown below were finally averaged per field in order to derive an intermediate data set used as the input for the next step.

Step III: Assessing solar radiation significance

Random Forest classifiers perform intrinsic feature selection. Therefore, training a random forest classifier on the given set of variables of interest can be used as an embedded feature selection and feature importance evaluation strategy. For the specific problem at hand, all (semi-)natural surfaces that do not belong to the vineyard class have been grouped as „others“. The total size of the sample set prior to dividing it into training and testing samples amounted to 98.300 plots. Subsequently, a classfier with a maximum number of 100 trees and a maximum depth of 50 was trained to predict the landuse for each plot as being either vineyard or others. This resulted in a model with a testing accuracy of 88.6%. Keeping in mind the numerical ratio between other land uses and vineyards is 82:18, the classification accuracy itself is less impressive at first. Nevertheless, it is higher than the accuracy achieved by random guessing with knowledge of the ratio value, which first demonstrates that the selected features have some predictive power. If in a second step the importance of the individual features is considered in terms of their contribution to information gain (operationalised in the applied model as gini impurity), then the following order of importance of the features results. Contrary to the initial assumption, the solar variables seem to be the least important ones. Instead, the elevation followed by the wind speed and number of frost days are the most significant ones in terms of discriminating between vineyards and other landuses. This stays true even if one repeats the analyses without combining all the non-vineyard classes (i.e. forest, pastures and arable land).

Feature ranked by their importance for performing the classification vineyard vs. others

However, since this does not imply that the solar variables are completely meaningless, they were again considered exclusively without including the other explanatory variables. If the analyses are repeated accordingly for the distinction between vineyards vs. others, the classification accuracy drops to 82.5%. This practically corresponds to the level achieved by random numbers and thus completely meaningless predictors, which suggests the counterintuitive conclusion that the solar variables actually have no predictive power at all. The reason for this result, however, is the imbalance between the feature classes, which masks the importance of the solar variables. If we visualise a simplified decision tree, we can see that the solar variables influence the frequency of vineyards relative to the other land uses in the corresponding subgroups of all fields. As expected, the proportion of vineyards increases with increasing solar radiation intensity and duration. The only problem with regard to classification accuracy is that there is obviously an approximate 50:50 ratio between vineyards and other land uses even in the subgroup with the highest irradiation durations and intensities due to the numerical preponderance. If one repeats the analyses by first drawing a random sample from the fields of other uses, which corresponds in size to the number of vineyards, then a classification accuracy of around 67% is shown, which is clearly superior to chance.

Decision tree using solar variables only (depth limited to 3 levels)

If the feature importance under the solar variables is displayed, the significance of the variables summed over the year is somewhat higher than that of the seasonal variables, in line with the presentation of the decision tree. Overall, however, the difference between the importance of the variables is very small (range between 0.26 for rad_dur_annual_sum and 0.23 for rad_global_highseason_sum). This points to the high autocorrelation and thus redundancy of the features, which can be explicitly expressed by a corresponding correlation heatmap.

Correlation heatmap of features

Conclusion & Outlook

The analyses carried out can be summarised as follows: The general importance of solar variables (irradiation duration and intensity) for viticulture can be confirmed empirically by comparison with areas of other uses. Quantifying the exact significance of the solar factors turned out to be challenging. It can be said that solar variables are not necessarily more important than other influencing factors (e.g. wind speed and frost frequency). At least, areas with high values in the area of irradiation duration and intensity are not exclusively used for viticulture. This seems plausible and it is obvious that the set of explanatory variables included is incomplete. Other factors such as soil type and rooting depth as well as latent variables of the historical development of land and land use rights would have to be included in order to be able to clearly separate the locations of vineyards relative to other uses. Considering this shortcoming as well as the limited spatial resolution of some of the data sets used in the analyses, the results obtained appear quite plausible and confirm the feasibility of the analyses in the sense of a proof of concept. In the case of a renewed and more in-depth analysis, a change to a study area in another federal state with a stronger open-data policy is recommended. In NRW, for example, not only would broader data sets on soil factors be available, but also a corresponding recording of land use at parcel level. This way, the step of manually creating a classification, which is fraught with inaccuracies, could be omitted.

Appendix - Scripts

step I - landuse classification

				
					# Setup/Imports
import arcpy
import itertools
import numpy as np
import pandas as pd
import os
import requests
import wget
from bs4 import BeautifulSoup
from pyproj import Transformer
from tqdm import tqdm

working_dir = "C://Users/felix/OneDrive/Studium_Master/semester_I_Spatial_Analyses/assignment_solar"
home = arcpy.mp.ArcGISProject("current").homeFolder
gdb = arcpy.mp.ArcGISProject("current").defaultGeodatabase


# A. Segmentation
# try different segmentation techniques on subset of high-res imagery data
# restraining parallel processing due to software bug
spectral_detail = np.arange(15, 21, 1)
spatial_detail = np.arange(10, 21, 1)

for param_com in itertools.product(spectral_detail, spatial_detail):
    with arcpy.EnvManager(parallelProcessingFactor="0"):
        out_raster_dataset = arcpy.ia.SegmentMeanShift(
            "high_res_rgb_subset",
            int(param_com[0]),
            int(param_com[1]),
            1000,
            "",
            100000,
        )
        out_raster_dataset.save(
            os.path.join(gdb, f"pre_segments_spec{param_com[0]}_spat{param_com[1]}")
        )

# apply segmentation with best-suited parameters
with arcpy.EnvManager(parallelProcessingFactor="0"):
    out_raster_dataset = arcpy.ia.SegmentMeanShift(
        os.path.join(home, "high_res_rgb.tif"), 17, 12, 1000, "", 100000
    )
    out_raster_dataset.save(os.path.join(gdb, "high_res_segmented"))

# extract segmented fields as polygons
out_raster = arcpy.sa.RegionGroup(
    "high_res_segmented.tif", "EIGHT", "WITHIN", "NO_LINK", None
)
out_raster.save(os.path.join(gdb, "high_res_segmented_grouped"))

arcpy.conversion.RasterToPolygon(
    os.path.join(gdb, "high_res_segmented_grouped"),
    os.path.join(gdb, "segmented_aoi"),
    "NO_SIMPLIFY",
    "OBJECTID",
    "SINGLE_OUTER_PART",
    None,
)

arcpy.management.Dissolve(
    os.path.join(gdb, "segmented_aoi"),
    os.path.join(gdb, "segmented_aoi_grouped"),
    "gridcode",
    None,
    "MULTI_PART",
    "DISSOLVE_LINES",
)

selected_segments = arcpy.management.SelectLayerByLocation(
    os.path.join(gdb, "segmented_aoi_grouped"),
    "COMPLETELY_WITHIN",
    "aoi_vineyards",
    None,
    "NEW_SELECTION",
    "NOT_INVERT",
)
selected_segments = arcpy.management.SelectLayerByAttribute(
    selected_segments, "SUBSET_SELECTION", "Shape_Area > 1000", None
)
arcpy.CopyFeatures_management(selected_segments, "segmented_aoi")


# B. Classification 
# download Sentinel2 data
# parse html page from dlr containing paths to Sentinel2 Level3 imagery 2021
baseurl = "https://download.geoservice.dlr.de/S2_L3A_WASP/files/32/U/MU/2021/"
page_s2_level3 = requests.get(baseurl)
page_s2_level3 = BeautifulSoup(page_s2_level3.content, "html.parser")

# download r,g,b,nir images for each month (highest res of 10m)
download_dir = os.path.join(working_dir, "input_data")
for month in tqdm(
    page_s2_level3.find_all("a", href=lambda x: x.startswith("SENTINEL2X"))
):
    subfolder = os.path.join(baseurl, month.get("href"))
    page_month = requests.get(subfolder)
    page_month = BeautifulSoup(page_month.content, "html.parser")
    for band in page_month.find_all("a", href=lambda x: x.startswith("SENTINEL2X")):
        if any(
            x in ["B2.tif", "B3.tif", "B4.tif", "B8.tif"]
            for x in (band.get("href").split("_"))
        ):
            if not os.path.isfile(os.path.join(download_dir, band.get("href"))):
                wget.download(f'{subfolder}{band.get("href")}', download_dir)


# import bands, merge them into one composite band & clip to aoi extent
file_list = [
    f
    for f in os.listdir(os.path.join(os.path.dirname(home), "input_data"))
    if f.rsplit(".", 1)[1] == "tif"
]

arcpy.management.CompositeBands(
    ";".join(file_list), os.path.join(gdb, "sentinel_bands")
)

arcpy.management.Clip(
    "sentinel_bands",
    "395145.4594 5306156.9779 420883.0407 5342253.1793",
    os.path.join(gdb, "sentinel_bands_aoi"),
    "aoi_vineyards",
    "32767",
    "ClippingGeometry",
    "NO_MAINTAIN_EXTENT",
)

# performing classification incl. manual selection of training objects via classification wizard
# extracting relevant classes from final classification raster
out_raster = arcpy.sa.ExtractByAttributes(
    "aoi_classified.tif",
    "Class_name = 'arable_land' Or Class_name = 'forest' Or Class_name = 'pastures' Or Class_name = 'vineyards'",
)
out_raster.save(os.path.join(gdb, "aoi_classified"))
				
			

step II - field statistics on influencing factors

				
					# Setup/Imports
import arcpy
import itertools
import numpy as np
import pandas as pd
import os
import requests
import wget
from bs4 import BeautifulSoup
from pyproj import Transformer
from tqdm import tqdm

working_dir = "C://Users/felix/OneDrive/Studium_Master/semester_I_Spatial_Analyses/assignment_solar"
home = arcpy.mp.ArcGISProject("current").homeFolder
gdb = arcpy.mp.ArcGISProject("current").defaultGeodatabase

# A. DWD Data Acquisition & Data Import into ArcGIS
# data on days with frost
download_dir = os.path.join(working_dir, "input_data_II")
file_dir = "https://opendata.dwd.de/climate_environment/CDC/grids_germany/multi_annual/frost_days/grids_germany_multi_annual_frost_days_1991-2020_17.asc.gz"
wget.download(file_dir, download_dir)
file_name = file_dir.rsplit("/", 1)[1]
with gzip.open(os.path.join(download_dir, file_name), "rb") as f_in:
    with open(os.path.join(download_dir, file_name.rsplit(".", 1)[0]), "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)

# adding dwd frost data
with arcpy.EnvManager(scratchWorkspace=gdb, workspace=gdb):
    arcpy.conversion.RasterToGeodatabase(
        os.path.join(
            os.path.dirname(home),
            "input_data_II",
            "grids_germany_multi_annual_frost_days_1991-2020_17.asc",
        ),
        gdb,
        "",
    )

arcpy.management.DefineProjection(
    os.path.join(gdb, "grids_germany_multi_annual_frost_days_1991_2020_17"),
    'PROJCS["DHDN_3_Degree_Gauss_Zone_3",GEOGCS["GCS_Deutsches_Hauptdreiecksnetz",DATUM["D_Deutsches_Hauptdreiecksnetz",SPHEROID["Bessel_1841",6377397.155,299.1528128]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Gauss_Kruger"],PARAMETER["False_Easting",3500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",9.0],PARAMETER["Scale_Factor",1.0],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]',
)

with arcpy.EnvManager(extent=arcpy.sa.Raster("aoi_classified").extent):
    arcpy.management.ProjectRaster(
        os.path.join(gdb, "grids_germany_multi_annual_frost_days_1991_2020_17"),
        os.path.join(gdb, "annual_frost_days"),
        'PROJCS["WGS_1984_UTM_Zone_32N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",9.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]',
        "BILINEAR",
        "200 200",
        "DHDN_To_WGS_1984_4_NTv2",
        None,
        'PROJCS["DHDN_3_Degree_Gauss_Zone_3",GEOGCS["GCS_Deutsches_Hauptdreiecksnetz",DATUM["D_Deutsches_Hauptdreiecksnetz",SPHEROID["Bessel_1841",6377397.155,299.1528128]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Gauss_Kruger"],PARAMETER["False_Easting",3500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",9.0],PARAMETER["Scale_Factor",1.0],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]',
        "NO_VERTICAL",
    )

# data on wind
file_dir = "https://opendata.dwd.de/climate_environment/CDC/grids_germany/multi_annual/wind_parameters/resol_200x200/wind_wdat_geo_10m_BRD_200m.asc.zip"
wget.download(file_dir, download_dir)
file_name = file_dir.rsplit("/", 1)[1]
with zipfile.ZipFile(os.path.join(download_dir, file_name), "r") as zip_ref:
    zip_ref.extractall(download_dir)

# adding dwd wind data
with arcpy.EnvManager(scratchWorkspace=gdb, workspace=gdb):
    arcpy.conversion.RasterToGeodatabase(
        os.path.join(
            os.path.dirname(home),
            "input_data_II",
            "wind_wdat_geo_10m_BRD_200m.asc",
            "wind_wdat_geo_10m_BRD_200m.asc",
        ),
        gdb,
        "",
    )

with arcpy.EnvManager(extent=arcpy.sa.Raster("aoi_classified").extent):
    arcpy.management.ProjectRaster(
        os.path.join(gdb, "wind_wdat_geo_10m_BRD_200m"),
        os.path.join(gdb, "wind_speed_mean"),
        'PROJCS["WGS_1984_UTM_Zone_32N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",9.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]',
        "BILINEAR",
        "200 200",
        "DHDN_To_WGS_1984_4_NTv2",
        None,
        'PROJCS["Deutsches_Hauptdreiecksnetz_Transverse_Mercator",GEOGCS["GCS_Deutsches_Hauptdreiecksnetz",DATUM["D_Deutsches_Hauptdreiecksnetz",SPHEROID["Bessel_1841",6377397.155,299.1528128]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",3500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",9.0],PARAMETER["Scale_Factor",1.0],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]',
        "NO_VERTICAL",
    )

# data on atmospheric conditions
download_dir = os.path.join(working_dir, "input_data_III")
baseurls = [
    "https://opendata.dwd.de/climate_environment/CDC/grids_germany/annual/radiation_diffuse/",
    "https://opendata.dwd.de/climate_environment/CDC/grids_germany/annual/radiation_direct/",
]

for baseurl in baseurls:
    page_rad = requests.get(baseurl)
    page_rad = BeautifulSoup(page_rad.content, "html.parser")
    for year in page_rad.find_all(
        "a", href=lambda x: x.startswith("grids_germany_annual_radiation_")
    ):
        wget.download(os.path.join(baseurl, year.get("href")), download_dir)
        with zipfile.ZipFile(
            os.path.join(download_dir, year.get("href")), "r"
        ) as zip_ref:
            zip_ref.extractall(download_dir)
        os.remove(os.path.join(download_dir, year.get("href")))
        file_name = os.path.join(download_dir, year.get("href").replace(".zip", ".asc"))
        with open(file_name, "r") as fin:
            cell_data = fin.readlines()[22:]
        with open(file_name, "w") as fout:
            fout.writelines(cell_data)

# adding dwd atmospheric data
file_list = [
    f for f in os.listdir(os.path.join(os.path.dirname(home), "input_data_III"))
]

for raster in file_list:
    arcpy.management.DefineProjection(
        os.path.join(os.path.dirname(home), "input_data_III", raster),
        'PROJCS["DHDN_3_Degree_Gauss_Zone_3",GEOGCS["GCS_Deutsches_Hauptdreiecksnetz",DATUM["D_Deutsches_Hauptdreiecksnetz",SPHEROID["Bessel_1841",6377397.155,299.1528128]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Gauss_Kruger"],PARAMETER["False_Easting",3500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",9.0],PARAMETER["Scale_Factor",1.0],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]',
    )


# B. Preparing atmospheric data as input for solar variable calculations
# temporal averaging
for rad_type in ["diffuse", "direct"]:
    layer_list = [
        f.name
        for f in arcpy.mp.ArcGISProject("current")
        .listMaps()[0]
        .listLayers(f"*{rad_type}*")
    ]
    with arcpy.EnvManager(extent=arcpy.sa.Raster("aoi_classified").extent):
        out_raster = arcpy.ia.CellStatistics(
            ";".join(layer_list),
            "MEAN",
            "NODATA",
            "SINGLE_BAND",
            90,
            "AUTO_DETECT",
        )
        out_raster.save(os.path.join(gdb, f"rad_{rad_type}"))

# transforming direct & diffuse radiation into prop diffuse & transmissivity
output_raster = arcpy.ia.RasterCalculator(
    ["rad_diffuse", "rad_direct"], ["dif", "dir"], "dif/(dir+dif)"
)
output_raster.save(os.path.join(gdb, "prop_diffuse"))
output_raster = arcpy.ia.RasterCalculator(
    ["rad_diffuse", "rad_direct"], ["dif", "dir"], "(dir+dif)/300*24*365/1000"
)
output_raster.save(os.path.join(gdb, "transmissivity"))

# creating tiles ("atmos_segments") with constant prop diffuse & transmissivity
arcpy.management.CreateFishnet(
    os.path.join(gdb, "atmos_polygons"),
    "399959.517416895 5306149.67495471",
    "399959.517416895 5306159.67495471",
    None,
    None,
    arcpy.management.GetRasterProperties("transmissivity", "ROWCOUNT"),
    arcpy.management.GetRasterProperties("transmissivity", "COLUMNCOUNT"),
    "420951.267561637 5342135.38875472",
    "NO_LABELS",
    "399959.517416895 5306149.67495471 420951.267561637 5342135.38875472",
    "POLYGON",
)

arcpy.management.AddGeometryAttributes("atmos_polygons", "EXTENT", "", "", None)

arcpy.ia.ZonalStatisticsAsTable(
    "atmos_polygons",
    "OID",
    "transmissivity",
    os.path.join(gdb, "zonal_atmos_transmissivity"),
    "DATA",
    "MEAN",
    "CURRENT_SLICE",
    90,
    "AUTO_DETECT",
)

arcpy.management.JoinField(
    "atmos_polygons", "OID", "zonal_atmos_transmissivity", "OID", "MEAN"
)

arcpy.management.AlterField(
    "atmos_polygons",
    "MEAN",
    "transmissivity",
    "transmissivity",
    "DOUBLE",
    8,
    "NULLABLE",
    "DO_NOT_CLEAR",
)

arcpy.ia.ZonalStatisticsAsTable(
    "atmos_polygons",
    "OID",
    "prop_diffuse",
    os.path.join(gdb, "zonal_atmos_propdiffuse"),
    "DATA",
    "MEAN",
    "CURRENT_SLICE",
    90,
    "AUTO_DETECT",
)

arcpy.management.JoinField(
    "atmos_polygons", "OID", "zonal_atmos_propdiffuse", "OID", "MEAN"
)

arcpy.management.AlterField(
    "atmos_polygons",
    "MEAN",
    "prop_diffuse",
    "prop_diffuse",
    "DOUBLE",
    8,
    "NULLABLE",
    "DO_NOT_CLEAR",
)

# C. Solar variable calculations
# function for spatially variant multi-threaded solar radiation calculation
def calc_solar_radiation(atmos_segment, count, time_config, gdb):
    # set parameters
    gdb = os.path.join(arcpy.mp.ArcGISProject("current").homeFolder, gdb)
    inRaster = "dem.tif"
    extent = atmos_segment[0:4]
    transformer = Transformer.from_crs("epsg:32632", "epsg:4326")
    min_x = transformer.transform(extent[0], extent[1])[0]
    max_x = transformer.transform(extent[2], extent[3])[0]
    latitude = (min_x + max_x) / 2
    skySize = 400
    timeConfig = time_config
    dayInterval = 7
    hourInterval = 0.5
    zFactor = 1
    calcDirections = 32
    zenithDivisions = 16
    azimuthDivisions = 16
    diffuseProp = atmos_segment[-1]
    transmittivity = atmos_segment[-2] / 100
    outDirectRad = ""
    outDiffuseRad = ""
    outDirectDur = os.path.join(gdb, f"dur_interprod_{count}")
    # execute AreaSolarRadiation
    extent_buffer = list(map(lambda x: x[0] + x[1], zip(extent, [-20, -20, 20, 20])))
    with arcpy.EnvManager(extent=" ".join([str(x) for x in extent_buffer])):
        outGlobalRad = arcpy.sa.AreaSolarRadiation(
            inRaster,
            latitude,
            skySize,
            timeConfig,
            dayInterval,
            hourInterval,
            "INTERVAL",
            zFactor,
            "FROM_DEM",
            calcDirections,
            zenithDivisions,
            azimuthDivisions,
            "STANDARD_OVERCAST_SKY",
            diffuseProp,
            transmittivity,
            outDirectRad,
            outDiffuseRad,
            outDirectDur,
        )
    arcpy.management.Clip(
        outGlobalRad,
        " ".join([str(x) for x in extent]),
        os.path.join(gdb, f"global_rad_{count}"),
        None,
        "3.4e+38",
        "NONE",
        "NO_MAINTAIN_EXTENT",
    )
    arcpy.management.Clip(
        os.path.join(gdb, f"dur_interprod_{count}"),
        " ".join([str(x) for x in extent]),
        os.path.join(gdb, f"dur_rad_{count}"),
        None,
        "3.4e+38",
        "NONE",
        "NO_MAINTAIN_EXTENT",
    )


# create gdbs for storing tiles results
arcpy.management.CreateFileGDB(home, "rad_whole_year", "CURRENT")
arcpy.management.CreateFileGDB(home, "rad_high_season", "CURRENT")

# perfoming calculations for all tiles
fields = [field.name for field in arcpy.ListFields("atmos_polygons")][4:]
atmos_segments = [list(row) for row in arcpy.da.SearchCursor("atmos_polygons", fields)]
for i, seg in enumerate(atmos_segments):
    calc_solar_radiation(seg, i, "WholeYear 2020", "rad_whole_year.gdb")
    calc_solar_radiation(seg, i, "MultiDays 2020 244 304", "rad_high_season.gdb")

# merging tiles to one raster
def merge_tiles(gdb, rad_type_wildcard, n_bands, out_name):
    arcpy.env.workspace = os.path.join(home, gdb)
    rasters = arcpy.ListRasters(rad_type_wildcard)
    rasters = [os.path.join(arcpy.env.workspace, rast) for rast in rasters]

    arcpy.management.MosaicToNewRaster(
        ";".join(rasters),
        home,
        out_name,
        'PROJCS["WGS_1984_UTM_Zone_32N",GEOGCS["GCS_WGS_1984",DATUM["D_WGS_1984",SPHEROID["WGS_1984",6378137.0,298.257223563]],PRIMEM["Greenwich",0.0],UNIT["Degree",0.0174532925199433]],PROJECTION["Transverse_Mercator"],PARAMETER["False_Easting",500000.0],PARAMETER["False_Northing",0.0],PARAMETER["Central_Meridian",9.0],PARAMETER["Scale_Factor",0.9996],PARAMETER["Latitude_Of_Origin",0.0],UNIT["Meter",1.0]]',
        "32_BIT_FLOAT",
        10,
        n_bands,
        "MEAN",
        "FIRST",
    )


merge_tiles("rad_high_season.gdb", "dur_rad*", 9, "rad_dur_highseason.tif")
merge_tiles("rad_high_season.gdb", "global_rad*", 9, "rad_global_highseason.tif")
merge_tiles("rad_whole_year.gdb", "dur_rad*", 12, "rad_dur_annual.tif")
merge_tiles("rad_whole_year.gdb", "global_rad*", 12, "rad_global_annual.tif")

# summarising bands
def sum_bands(raster):
    n_bands = arcpy.management.GetRasterProperties(raster, "BANDCOUNT", "")
    bands = [f"B{i}" for i in range(1, int(str(n_bands)) + 1)]
    out_raster = arcpy.sa.BandArithmetic(raster, "+".join(bands), 0)
    out_raster.save(os.path.join(gdb, f'{raster.split(".tif")[0]}_sum'))


rad_rasters = [
    "rad_dur_highseason.tif",
    "rad_global_highseason.tif",
    "rad_dur_annual.tif",
    "rad_global_annual.tif",
]
list(map(sum_bands, rad_rasters))


# D. Calculating stats per plot/field
def zonal_stats_landuse_segments(raster):
    with arcpy.EnvManager(cellSize="aoi_classified"):
        arcpy.ia.ZonalStatisticsAsTable(
            "segmented_aoi",
            "OBJECTID",
            raster,
            os.path.join(gdb, f'zonal_{raster.split(".tif")[0]}'),
            "DATA",
            "MEAN",
            "CURRENT_SLICE",
            90,
            "AUTO_DETECT",
        )
    arcpy.management.JoinField(
        "segmented_aoi",
        "OBJECTID",
        f'zonal_{raster.split(".tif")[0]}',
        "OBJECTID_1",
        "MEAN",
    )
    arcpy.management.AlterField(
        "segmented_aoi",
        "MEAN",
        raster.split(".tif")[0],
        raster.split(".tif")[0],
        "DOUBLE",
        8,
        "NULLABLE",
        "DO_NOT_CLEAR",
    )


# apply function to classification & explanatory rasters
classification_raster = "aoi_classified"
zonal_stats_landuse_segments(classification_raster)

explanatory_rasters = [
    "dem.tif",
    "slope.tif",
    "annual_frost_days",
    "wind_speed_mean",
    "rad_dur_highseason_sum.tif",
    "rad_global_highseason_sum.tif",
    "rad_dur_annual_sum.tif",
    "rad_global_annual_sum.tif",
]

list(map(zonal_stats_landuse_segments, explanatory_rasters))

# export attribute table for further analyses
vars = [field.name for field in arcpy.ListFields("segmented_aoi")]
data = [list(row) for row in arcpy.da.SearchCursor("segmented_aoi", vars)]
df = pd.DataFrame(data, columns=vars)
df = df.set_index(arcpy.Describe("segmented_aoi").OIDFieldName, drop=True)
df.to_csv(os.path.join(home, "fields.csv"))
				
			

step III - assessing solar radiance significance

				
					# Setup/Imports
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
from sklearn import tree
from dtreeviz.trees import dtreeviz
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns
import os

home = "C://Users/felix/OneDrive/Studium_Master/semester_I_Spatial_Analyses/assignment_solar"
fields = pd.read_csv(os.path.join(home, "solar_analyses", "fields.csv"))

# Data cleaning & feature engineering
fields = fields.dropna()
fields = fields[
    np.isclose(fields["aoi_classified"], np.round(fields["aoi_classified"]), atol=0.01)
]
fields["aoi_classified"] = np.round(fields["aoi_classified"])
fields = fields[[field in [2, 4, 5, 6] for field in fields["aoi_classified"]]]
fields = fields.astype({"aoi_classified": "category"})

# Combine all non-vineyards to one class
fields["aoi_classified"] = fields["aoi_classified"].replace([2, 5, 6], 1)

# uncomment the following lines for balanced analyses with 50:50 subset
# fields["random_idx"] = np.random.randn(len(fields), 1)
# fields = fields.sort_values(["aoi_classified", "random_idx"], ascending=False)
# fields = fields.head(2*sum(fields["aoi_classified"]==4))

# Defining labels & features
y = fields["aoi_classified"]

X = fields[
    [
        "dem",
        "slope",
        "annual_frost_days",
        "wind_speed_mean",
        "rad_dur_highseason_sum",
        "rad_global_highseason_sum",
        "rad_dur_annual_sum",
        "rad_global_annual_sum",
    ]
]

# uncomment the following lines for repeated analyses with solar variables only
# X = fields[
#     [
#         "rad_dur_highseason_sum",
#         "rad_global_highseason_sum",
#         "rad_dur_annual_sum",
#         "rad_global_annual_sum",
#     ]
# ]

# Split dataset into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Create a Gaussian Classifier
clf = RandomForestClassifier(n_estimators=100, max_depth=3)

# Train the model using the training set
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Determine model accuracy & feature importance
print("Accuracy:", metrics.accuracy_score(y_test, y_pred))
feature_imp = pd.Series(clf.feature_importances_, index=X.columns).sort_values(
    ascending=False
)
feature_imp

# Plot feature importance
sns.barplot(x=feature_imp, y=feature_imp.index, color="darkblue")
plt.xlabel("Feature Importance Score")
plt.ylabel("Features")
plt.savefig(os.path.join(home, "feature_importance.png"), bbox_inches="tight")
plt.close()

# Vis decision trees, only for solar variables
X_subset = fields[
    [
        "rad_dur_highseason_sum",
        "rad_global_highseason_sum",
        "rad_dur_annual_sum",
        "rad_global_annual_sum",
    ]
]

X_train, X_test, y_train, y_test = train_test_split(X_subset, y, test_size=0.3)
classifier = tree.DecisionTreeClassifier(max_depth=5)
classifier.fit(X_train, y_train)

viz = dtreeviz(
    classifier,
    x_data=X_subset,
    y_data=y,
    target_name="landuse",
    feature_names=list(X_subset.columns),
    class_names=["others", "vineyard"],
    colors={"classes": [None, None, ["#DDDDDD", "#CC9900"]]},
)

viz.view()
viz.save(os.path.join(home, "tree.svg"))

# Correlation map to show correlations between explanatory variables
heatmap = sns.heatmap(
    fields.iloc[:, 5:].corr(), cmap="RdBu_r", vmin=-1, vmax=1, annot=True, fmt=".2f"
)
plt.savefig(os.path.join(home, "correlation_map.png"), bbox_inches="tight")
plt.close()
				
			

Ideas for a separate post specifically on the landuse classification approach:

    • multiresolution-segmentation to derive at better segmentation results
    • generate 300 random points, based on that create training & testing data
    • include object-features in classification (such as geometry, size, texture)
    • modified/reduced set of feature classes: vineyards, other agricultural plots, others
     

Think about how to combine object-based approach & multitemporal assessment