trying to understand a 4d array in python read in with osgeo gdal-CodePudding

I'd like to plot an array with the following dimensions: [3x3x180x360]

it's a 180x360 world map with 3 different opacity layers and 3 different pressure levels. Thus, I would like to plot the map with data of 1 opacity layer at one pressure level.

How can I access the array and plot the data using plt.imshow() for example?

Visualizing the data with panoply looks the following way:

import numpy as np
from osgeo import gdal

granule = "CER_CldTypHist_GEO-MODIS_Edition4A_407408.202109.hdf"
hdf_file = gdal.Open(workdir_data   "/"   granule)
subDatasets = hdf_file.GetSubDatasets()
cld_amount_liq_md = gdal.Open(subDatasets[68][0]).ReadAsArray() # takes ~5min to read ...

# Shape of cld_amount_liq_md after ReadAsArray() is [64800, 3, 3] hence:
cld_amount_liq_md  = cld_amount_liq_md.reshape(180,360, 3, 3)

#filtering bad data:
cld_amount_liq_md[cld_amount_liq_md > 3.40E38] = np.nan

# Plotting -> how can I plot the data?
plt.imshow(cld_amount_liq_md[?,?,?] ,cmap ="jet")```

Update: Here's additional info due to inputs from the comments:

How do I get the dimensions from the HDF file?

from osgeo import gdal

granule = "CER_CldTypHist_GEO-MODIS_Edition4A_407408.202109.hdf"
hdf_file = gdal.Open(workdir_data   "/"   granule)
subDatasets = hdf_file.GetSubDatasets()

j=0
for i in subDatasets:
    print(j, i[1]) # i[0] contains the path to subdataset, i[1] the info as seen in the screenshot
    j =1

Link to the HDF4 File Here's the link to the HDF4 File: CER_CldTypHist_GEO-MODIS_Edition4A_407408.202109.hdf

The full variable name of the subdataset is: Monthly_Day_Averages/Cloud_Properties_for_9_Cloud_Types_Monthly_Day/cld_amount_liq_md

CodePudding user response：

I found the answer:

As expected, .ReadAsArray() does not work for multidimensional arrays.

The HDF4 file needs to be opened and processed the following way:

hdf_file = gdal.OpenEx(workdir_data   "/"   granule, gdal.OF_MULTIDIM_RASTER)
rootGroup = ds.GetRootGroup()
op = rootGroup.OpenGroup('scientific_datasets')
ds3 = op.OpenMDArray('cld_amount_liq_md')
data = ds3.ReadAsArray(buffer_datatype = gdal.ExtendedDataType.Create(gdal.GDT_Float64))

or in a shorter version:

ds = gdal.OpenEx(workdir_data   "/"   granule, gdal.OF_MULTIDIM_RASTER)
rootGroup = ds.GetRootGroup()
op = rootGroup.OpenGroup(rootGroup.GetGroupNames()[0]).OpenMDArray('cld_amount_liq_md').ReadAsArray(buffer_datatype = gdal.ExtendedDataType.Create(gdal.GDT_Float64))

After that the 4D-Array can be accessed and plotted the usual way:

#op's shape is: [3, 3, 180, 360]
plt.imshow(op[1,1,:,:],cmap ="jet")
plt.show()

CodePudding user response：

so a simple index should work, indexing on the first opacity and first presure level:

plt.imshow(cld_amount_liq_md[1, 1, :, :] ,cmap ="jet")

Please check what the shape is of the index, so maybe something like adding .ravel() can make sense to remove the extra dimensions.

plt.imshow(cld_amount_liq_md[1, 1, :, :].ravel() ,cmap ="jet")