Home > Enterprise >  How to set new values to row based on the same substring from other column?
How to set new values to row based on the same substring from other column?

Time:12-01

This is an example of a bigger data. Imagine I have a dataframe like this:

df = pd.DataFrame({"CLASS":["AG_1","AG_2","AG_3","MAR","GOM"],
                   "TOP":[200, np.nan, np.nan, 600, np.nan],
                   "BOT":[230, 250, 380, np.nan, 640]})

df
Out[49]: 
  CLASS    TOP    BOT
0  AG_1  200.0  230.0
1  AG_2    NaN  250.0
2  AG_3    NaN  380.0
3   MAR  600.0    NaN
4   GOM    NaN  640.0

I would like to set the values for TOP on lines 1 and 2. My condition is that these values must be the BOT values from the row above if the class begins with the same substring "AG". The output should be like this:

  CLASS    TOP    BOT
0  AG_1  200.0  230.0
1  AG_2  230.0  250.0
2  AG_3  250.0  380.0
3   MAR  600.0    NaN
4   GOM    NaN  640.0

Anyone could show me how to do that?

CodePudding user response:

generic case: filling all groups

I would use fillna with a groupby.shift using a custom group extracting the substring from CLASS with str.extract:

group = df['CLASS'].str.extract('([^_] )', expand=False)
df['TOP'] = df['TOP'].fillna(df.groupby(group)['BOT'].shift())

Output:

  CLASS    TOP    BOT
0  AG_1  200.0  230.0
1  AG_2  230.0  250.0
2  AG_3  250.0  380.0
3   MAR  600.0    NaN
4   GOM    NaN  640.0

Intermediate group:

0     AG
1     AG
2     AG
3    MAR
4    GOM
Name: CLASS, dtype: object

special case: only AG group

m = df['CLASS'].str.startswith('AG')

df.loc[m, 'TOP'] = df.loc[m, 'TOP'].fillna(df.loc[m, 'BOT'].shift())

Example:

   CLASS    TOP    BOT
0   AG_1  200.0  230.0
1   AG_2  230.0  250.0
2   AG_3  250.0  380.0
3  MAR_1  600.0  601.0
4  MAR_2    NaN    NaN # this is not filled
5    GOM    NaN  640.0
  • Related