I don't have a lot of Python experience and I'm trying something rather complicated for me, so excuse my messy code. I have a few arrays that were generated with rasterio
from raster layers (TIFF), and ultimately I want to get some basic statistics from each raster layer and append them to a data frame.
I'm trying to automate this as much as possible since I have a lot of layers to go through. Another obstacle was getting the column name to change according to each raster.
I managed to work almost everything out; the problem is that when I put it into a for loop, instead of the stats values I get this: <built-in method values of dict object at 0x00..
I would appreciate help solving that.
import rasterio
from osgeo import gdal
import numpy as np
import pandas as pd
# Open all files (I have a lot of folders like this one to open)
# Griffin data read
Gr_1A_hh_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-hh-h.tif"
Gr_1A_hh = rasterio.open(Gr_1A_hh_path)
Gr_1A_vv_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-vv-h.tif"
Gr_1A_vv = rasterio.open(Gr_1A_vv_path)
Gr_1A_vh_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-vh-h.tif"
Gr_1A_vh = rasterio.open(Gr_1A_vh_path)
Gr_1A_hv_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-hv-h.tif"
Gr_1A_hv = rasterio.open(Gr_1A_hv_path)
#reading all the rasters as arrays
array_1A_hh = Gr_1A_hh.read()
array_1A_vv = Gr_1A_vv.read()
array_1A_vh = Gr_1A_vh.read()
array_1A_hv = Gr_1A_hv.read()
# Creating a dictionary so that each array has a name that is used as the column name
A2 = {
    "HH": array_1A_hh,
    "VV": array_1A_vv,
    "VH": array_1A_vh,
    "HV": array_1A_hv,
}
df = pd.DataFrame(index=["min", "max", "mean", "medien"])
for name, pol in A2.items():
    for band in pol:
        stats = {
            "min": band.min(),
            "max": band.max(),
            "mean": band.mean(),
            "median": np.median(band)}
        df[f"{name}"] = stats.values
OUTPUT:
df
HH ... HV
min <built-in method values of dict object at 0x00... ... <built-in method values of dict object at 0x00...
max <built-in method values of dict object at 0x00... ... <built-in method values of dict object at 0x00...
mean <built-in method values of dict object at 0x00... ... <built-in method values of dict object at 0x00...
medien <built-in method values of dict object at 0x00... ... <built-in method values of dict object at 0x00...
Answer:
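The direct cause of the `<built-in method values of dict object ...>` cells is the missing parentheses: `stats.values` is the bound method itself, so pandas stores that method object in every cell. Calling it and wrapping the result in `list()` fixes the original loop. Below is a minimal sketch in which random arrays stand in for the rasterio reads (an assumption, since the .tif files aren't available here; each is shaped `(1, rows, cols)` like a single-band raster). The index label is also changed from "medien" to "median".

```python
import numpy as np
import pandas as pd

# Stand-ins for the rasterio arrays (assumption: single-band rasters,
# shape (1, rows, cols), as Dataset.read() returns for a one-band file).
rng = np.random.default_rng(0)
A2 = {name: rng.random((1, 4, 5)) for name in ["HH", "VV", "VH", "HV"]}

df = pd.DataFrame(index=["min", "max", "mean", "median"])
for name, pol in A2.items():
    for band in pol:
        stats = {
            "min": band.min(),
            "max": band.max(),
            "mean": band.mean(),
            "median": np.median(band),
        }
        # stats.values is the method object; stats.values() returns the values,
        # and list(...) gives pandas a plain sequence to assign positionally.
        df[name] = list(stats.values())

print(df)
```

Alternatively, `df[name] = pd.Series(stats)` aligns the values by label instead of by position, which would also have exposed the "medien"/"median" mismatch as a row of NaN.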
Suppose you have a dict of images, each a (bands, height, width) array like the ones rasterio's read() returns (random data here):
import numpy as np
import pandas as pd
vmin, vmax = 0, 255
C, H, W = 2, 64, 64
images_names = ["HH", "VV", "VH", "HV"]
images = {
    im_name: np.random.randint(vmin, vmax, size=(C, H, W))
    for im_name in images_names
}
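To build such a dict from the real files instead of random arrays, one option is to parse the polarization out of each file name. This is a minimal sketch that assumes the polarization is always the second-to-last dash-separated token of the name (`...-hh-h.tif`), as in the question's paths; the helpers `polarization_from_path` and `group_by_polarization` are hypothetical names, not part of rasterio.

```python
from pathlib import PureWindowsPath


def polarization_from_path(path: str) -> str:
    """Return e.g. "HH" for a file named '...-2201150146-hh-h.tif'.

    Assumption: the polarization is the second-to-last '-'-separated
    token of the file name, as in the question's paths.
    """
    stem = PureWindowsPath(path).stem  # file name without the ".tif" suffix
    return stem.split("-")[-2].upper()


def group_by_polarization(paths):
    """Map polarization name -> file path for one acquisition folder."""
    return {polarization_from_path(p): p for p in paths}


paths = [
    r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-hh-h.tif",
    r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-vv-h.tif",
]
print(group_by_polarization(paths))

# With rasterio installed, the arrays could then be read like this:
# import rasterio
# images = {}
# for name, p in group_by_polarization(paths).items():
#     with rasterio.open(p) as src:
#         images[name] = src.read()  # shape (bands, rows, cols)
```

The list of paths for each folder could come from something like `pathlib.Path(folder).glob("*.tif")`, which keeps the per-folder code identical no matter how many folders there are.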
And a dict of functions that compute each statistic on a per-band basis:
stats_functions = {
    "min": lambda band: band.min(),
    "max": lambda band: band.max(),
    "mean": lambda band: band.mean(),
    "median": lambda band: np.median(band),
}
You can first build a nested dict of statistics, keyed by image name, then band index, then statistic name:
images_stats = {
    im_name: {
        band_idx: {
            stat_name: stat_func(band)
            for stat_name, stat_func in stats_functions.items()
        }
        for band_idx, band in enumerate(im)
    }
    for im_name, im in images.items()
}
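Then convert it to a pandas DataFrame. pd.concat turns the image names into the top level of the column index, with the band indices below them: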
images_stats_df = pd.concat(
    {
        im_name: pd.DataFrame(im_stats)
        for im_name, im_stats in images_stats.items()
    },
    axis="columns",
)
Which gives:
>>> images_stats_df
                HH                      VV                      VH                      HV
                 0           1           0           1           0           1           0           1
min       0.000000    0.000000    0.000000    0.000000    0.000000     0.00000    0.000000    0.000000
max     254.000000  254.000000  254.000000  254.000000  254.000000   254.00000  254.000000  254.000000
mean    127.070557  126.082764  126.483643  127.737061  127.270996   128.89502  128.814209  124.610352
median  129.000000  127.000000  126.000000  127.000000  127.000000   130.00000  129.000000  122.000000
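Because the columns come out as a two-level (image, band) index, a flat header can be easier to work with, for example when exporting to CSV. This is a minimal sketch that rebuilds a small version of the table from random data (the `(2, 8, 8)` shape is an arbitrary assumption) and collapses the columns into labels like `HH_b0`; the `_b` naming scheme is my own choice, not anything pandas requires.

```python
import numpy as np
import pandas as pd

# Small random stand-in for the images dict (assumed shape: 2 bands of 8x8).
rng = np.random.default_rng(42)
images = {name: rng.integers(0, 255, size=(2, 8, 8)) for name in ["HH", "VV", "VH", "HV"]}
stats_functions = {"min": np.min, "max": np.max, "mean": np.mean, "median": np.median}

# Same construction as above: one DataFrame per image, concatenated side by side.
images_stats_df = pd.concat(
    {
        name: pd.DataFrame(
            {
                band_idx: {s: f(band) for s, f in stats_functions.items()}
                for band_idx, band in enumerate(im)
            }
        )
        for name, im in images.items()
    },
    axis="columns",
)

# Each column label is an (image, band) tuple; join them into one string.
flat_df = images_stats_df.copy()
flat_df.columns = [f"{name}_b{band}" for name, band in flat_df.columns]
print(flat_df)
```

If every raster has a single band, `images_stats_df.droplevel(1, axis="columns")` instead keeps just the image names ("HH", "VV", ...) as column labels.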