Pandas - Match List Values to Index within a Dataset Name and List Value to New Column-CodePudding

I have a Pandas dataframe like so:

Dataset                             Volume_ft3
Sonar_Raster_0.tif                  2055
Sonar_Raster_1.tif                  6784
Sonar_Raster_FocalMean_5x5_0.tif    2045
Sonar_Raster_FocalMean_5x5_1.tif    6752

I want to append a new column called "Sonar_Points" that matches values from a list to the dataset based on the unique numerical identifier in the Dataset name.

My list is [5525,4374] and I need to look for the index number in the Dataset name, match it to the list index, then output that value in a new column to where this is the resulting dataframe:

Dataset                             Volume_ft3    Sonar_Points
Sonar_Raster_0.tif                  2055          5525
Sonar_Raster_1.tif                  6784          4374
Sonar_Raster_FocalMean_5x5_0.tif    2045          5525
Sonar_Raster_FocalMean_5x5_1.tif    6752          4374

I've tried the below code, but it doesn't account for datasets with the same index.

df = df.append(pd.DataFrame(Sonar_pts_List, columns=['Sonar_Points']),ignore_index=False)

CodePudding user response：

One way using pandas.Series.str.extract.

Note: this will fail if there is an index that exceeds the length of list.

l = [5525, 4374]

df["Sonar_Points"] =  [l[i] for i in 
                       df["Dataset"].str.extract("_(\d )\.", expand=False).astype(int)]
print(df)

Output:

                            Dataset  Volume_ft3  Sonar_Points
0                Sonar_Raster_0.tif        2055          5525
1                Sonar_Raster_1.tif        6784          4374
2  Sonar_Raster_FocalMean_5x5_0.tif        2045          5525
3  Sonar_Raster_FocalMean_5x5_1.tif        6752          4374

CodePudding user response：

You could extract the dataset identifier into a separate column and then use that to merge the data with the list of Sonar Points values:

df['spi'] = df['Dataset'].str.extract(r'_(\d )\.').astype(int)
df = df.merge(pd.DataFrame(l, columns=['Sonar_Points']), left_on='spi', right_index=True).drop('spi', axis=1).sort_index()

Output:

                            Dataset  Volume_ft3  Sonar_Points
0                Sonar_Raster_0.tif        2055          5525
1                Sonar_Raster_1.tif        6784          4374
2  Sonar_Raster_FocalMean_5x5_0.tif        2045          5525
3  Sonar_Raster_FocalMean_5x5_1.tif        6752          4374