I have multiple rasters in a specific directory from which I need to extract band1 values (chlorophyll concentration) using a CSV containing the coordinates of the points of interest.
This is the CSV (read as GeoDataFrame):
point_id point_name latitude longitude geometry
0 1 'Forte dei Marmi' 10.2427 43.5703 POINT (10.24270 43.57030)
1 2 'La Spezia' 9.9030 44.0341 POINT (9.90300 44.03410)
2 3 'Orbetello' 11.2029 42.4488 POINT (11.20290 42.44880)
3 4 'Portoferraio' 10.3328 42.8080 POINT (10.33280 42.80800)
4 5 'Fregene' 12.1990 41.7080 POINT (12.19900 41.70800)
All the rasters I need to sample are in raster_dir = 'C:/sentinel_3_processing/'
My final goal is a dataframe with as many columns as there are rasters in the folder.
Sampling all the rasters works and the values are correct, but I need the output in a different shape, as described above.
The output I got is:
[[10.2427, 43.5703, 0.63],
[10.2427, 43.5703, 0.94],
[10.2427, 43.5703, 0.76],
[10.2427, 43.5703, 0.76],
[10.2427, 43.5703, 1.03],
[10.2427, 43.5703, 0.86],
[10.2427, 43.5703, 0.74],
[10.2427, 43.5703, 1.71],
[10.2427, 43.5703, 3.07],
[...],
[12.199, 41.708, 0.96],
[12.199, 41.708, 0.89],
[12.199, 41.708, 1.29],
[12.199, 41.708, 0.24],
[12.199, 41.708, 1.59],
[12.199, 41.708, 1.78],
[12.199, 41.708, 0.39],
[12.199, 41.708, 1.54],
[12.199, 41.708, 1.62]]
But I need something like this, with one row per point and one value per raster:
[
[10.2427, 43.5703, 0.63, 0.94, 0.76, 0.76, 1.03, 0.86, 0.74, 1.71, 3.07],
[...],
[12.199, 41.708, 0.96, 0.89, 1.29, 0.24, 1.59, 1.78, 0.39, 1.54, 1.62]
]
This is the code I wrote:
import os
import rasterio as rio

L = []  # final list that contains the other lists
for p in csv_gdf['geometry']:  # for every point in the dataframe...
    for files in os.listdir(raster_dir):  # ...and for every raster in that folder...
        if files[-4:] == '.img':  # ...whose extension is .img...
            r = rio.open(raster_dir + '\\' + files)  # open the raster
            list_row = []
            # read the raster band1 value at those coordinates...
            x = p.xy[0][0]
            y = p.xy[1][0]
            row, col = r.index(x, y)
            chl_value = r.read(1)[row, col]
            # append to list_row the coordinates and then the raster value.
            list_row.append(p.xy[0][0])
            list_row.append(p.xy[1][0])
            list_row.append(round(float(chl_value), 2))
            # then, append the list created in this iteration to the final list
            L.append(list_row)
Could you please help me? Any advice is greatly appreciated! Thank you in advance! Hope you guys are ok!
CodePudding user response:
Try this:
import pandas as pd

data = [[10.2427, 43.5703, 0.63],
[10.2427, 43.5703, 0.94],
[10.2427, 43.5703, 0.76],
[10.2427, 43.5703, 0.76],
[10.2427, 43.5703, 1.03],
[10.2427, 43.5703, 0.86],
[10.2427, 43.5703, 0.74],
[10.2427, 43.5703, 1.71],
[10.2427, 43.5703, 3.07],
[12.199, 41.708, 0.96],
[12.199, 41.708, 0.89],
[12.199, 41.708, 1.29],
[12.199, 41.708, 0.24],
[12.199, 41.708, 1.59],
[12.199, 41.708, 1.78],
[12.199, 41.708, 0.39],
[12.199, 41.708, 1.54],
[12.199, 41.708, 1.62]]
df = pd.DataFrame(data)
print(df.groupby([0, 1])[2].apply(list).reset_index().apply(lambda x: [x[0], x[1]] + x[2], axis=1).values.tolist())
Explanation:
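- Create a dataframe from your current output
- Group by the first two columns (the coordinates) and collect the third column's values into a list per point
- Concatenate each point's coordinates with its list of values to get the expected output
Output: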
[[10.2427, 43.5703, 0.63, 0.94, 0.76, 0.76, 1.03, 0.86, 0.74, 1.71, 3.07], [12.199, 41.708, 0.96, 0.89, 1.29, 0.24, 1.59, 1.78, 0.39, 1.54, 1.62]]
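Since the stated goal is a dataframe with one column per raster, the same reshaping can also stay inside pandas. The following is only a sketch under some assumptions: the column names `x`, `y`, `chl`, `raster_no` and the `raster_0`, `raster_1`, ... labels are made up for illustration, the data is a shortened copy of the output above, and it relies on every point having been sampled from the rasters in the same order. `cumcount` numbers the samples within each point, which also copes with repeated values such as the two 0.76 entries.

```python
import pandas as pd

# Shortened copy of the long-format output: one row per (point, raster).
long_rows = [
    [10.2427, 43.5703, 0.63],
    [10.2427, 43.5703, 0.94],
    [10.2427, 43.5703, 0.76],
    [10.2427, 43.5703, 0.76],
    [12.199, 41.708, 0.96],
    [12.199, 41.708, 0.89],
    [12.199, 41.708, 1.29],
    [12.199, 41.708, 0.24],
]

# Hypothetical column names, chosen only for readability.
df = pd.DataFrame(long_rows, columns=["x", "y", "chl"])

# Number each sample within its point: 0 for the first raster, 1 for the second, ...
df["raster_no"] = df.groupby(["x", "y"]).cumcount()

# Move the per-point sample number into the columns: one row per point.
wide = df.set_index(["x", "y", "raster_no"])["chl"].unstack("raster_no")
wide.columns = [f"raster_{i}" for i in wide.columns]
wide = wide.reset_index()

# Back to a plain list of lists, if that form is still needed.
wide_rows = wide.values.tolist()
print(wide)
```

Here `wide` is the dataframe itself, and `wide_rows` has the same nested-list shape as the output above.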
Note: The above code is just to give you an idea; it can be improved further. If I get some free time, I will post that as well.
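One possible improvement is to avoid the post-processing altogether by creating the row once per point, outside the loop over rasters, and appending one value per raster to it. The sketch below is not the asker's code: `build_wide_rows` and its `sample_value` callable are hypothetical names, and the dictionary of values is a stand-in so the example runs without any raster files. With rasterio, `sample_value` could wrap the `r.index(x, y)` and `r.read(1)[row, col]` calls from the question.

```python
def build_wide_rows(points, raster_names, sample_value):
    """Return one row per point: [x, y, value_raster_1, value_raster_2, ...].

    sample_value is a hypothetical callable (raster_name, x, y) -> number.
    """
    rows = []
    for x, y in points:
        row = [x, y]  # created once per point, not once per raster
        for name in raster_names:
            row.append(round(float(sample_value(name, x, y)), 2))
        rows.append(row)  # appended after all rasters have been sampled
    return rows


# Stand-in data instead of real rasters (values copied from the question).
points = [(10.2427, 43.5703), (12.199, 41.708)]
raster_names = ["a.img", "b.img", "c.img"]  # made-up file names
fake_values = {
    ("a.img", 10.2427, 43.5703): 0.63,
    ("b.img", 10.2427, 43.5703): 0.94,
    ("c.img", 10.2427, 43.5703): 0.76,
    ("a.img", 12.199, 41.708): 0.96,
    ("b.img", 12.199, 41.708): 0.89,
    ("c.img", 12.199, 41.708): 1.29,
}

rows = build_wide_rows(
    points, raster_names, lambda name, x, y: fake_values[(name, x, y)]
)
print(rows)
```

Passing a sorted list of file names keeps the column order stable between runs, since `os.listdir` does not guarantee any particular order.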