I have a Pandas dataframe like so:
Dataset Volume_ft3
Sonar_Raster_0.tif 2055
Sonar_Raster_1.tif 6784
Sonar_Raster_FocalMean_5x5_0.tif 2045
Sonar_Raster_FocalMean_5x5_1.tif 6752
I want to append a new column called "Sonar_Points" that matches values from a list to the dataset based on the unique numerical identifier in the Dataset name.
My list is [5525,4374] and I need to look for the index number in the Dataset name, match it to the list index, then output that value in a new column to where this is the resulting dataframe:
Dataset Volume_ft3 Sonar_Points
Sonar_Raster_0.tif 2055 5525
Sonar_Raster_1.tif 6784 4374
Sonar_Raster_FocalMean_5x5_0.tif 2045 5525
Sonar_Raster_FocalMean_5x5_1.tif 6752 4374
I've tried the below code, but it doesn't account for datasets with the same index.
df = df.append(pd.DataFrame(Sonar_pts_List, columns=['Sonar_Points']),ignore_index=False)
CodePudding user response:
One way using pandas.Series.str.extract
.
Note: this will fail if there is an index that exceeds the length of list.
l = [5525, 4374]
df["Sonar_Points"] = [l[i] for i in
df["Dataset"].str.extract("_(\d )\.", expand=False).astype(int)]
print(df)
Output:
Dataset Volume_ft3 Sonar_Points
0 Sonar_Raster_0.tif 2055 5525
1 Sonar_Raster_1.tif 6784 4374
2 Sonar_Raster_FocalMean_5x5_0.tif 2045 5525
3 Sonar_Raster_FocalMean_5x5_1.tif 6752 4374
CodePudding user response:
You could extract the dataset identifier into a separate column and then use that to merge
the data with the list of Sonar Points values:
df['spi'] = df['Dataset'].str.extract(r'_(\d )\.').astype(int)
df = df.merge(pd.DataFrame(l, columns=['Sonar_Points']), left_on='spi', right_index=True).drop('spi', axis=1).sort_index()
Output:
Dataset Volume_ft3 Sonar_Points
0 Sonar_Raster_0.tif 2055 5525
1 Sonar_Raster_1.tif 6784 4374
2 Sonar_Raster_FocalMean_5x5_0.tif 2045 5525
3 Sonar_Raster_FocalMean_5x5_1.tif 6752 4374