This is an example of a bigger data. Imagine I have a dataframe like this:
df = pd.DataFrame({"CLASS":["AG_1","AG_2","AG_3","MAR","GOM"],
"TOP":[200, np.nan, np.nan, 600, np.nan],
"BOT":[230, 250, 380, np.nan, 640]})
df
Out[49]:
CLASS TOP BOT
0 AG_1 200.0 230.0
1 AG_2 NaN 250.0
2 AG_3 NaN 380.0
3 MAR 600.0 NaN
4 GOM NaN 640.0
I would like to set the values for TOP on lines 1 and 2. My condition is that these values must be the BOT values from the row above if the class begins with the same substring "AG". The output should be like this:
CLASS TOP BOT
0 AG_1 200.0 230.0
1 AG_2 230.0 250.0
2 AG_3 250.0 380.0
3 MAR 600.0 NaN
4 GOM NaN 640.0
Anyone could show me how to do that?
CodePudding user response:
generic case: filling all groups
I would use fillna
with a groupby.shift
using a custom group extracting the substring from CLASS with str.extract
:
group = df['CLASS'].str.extract('([^_] )', expand=False)
df['TOP'] = df['TOP'].fillna(df.groupby(group)['BOT'].shift())
Output:
CLASS TOP BOT
0 AG_1 200.0 230.0
1 AG_2 230.0 250.0
2 AG_3 250.0 380.0
3 MAR 600.0 NaN
4 GOM NaN 640.0
Intermediate group
:
0 AG
1 AG
2 AG
3 MAR
4 GOM
Name: CLASS, dtype: object
special case: only AG group
m = df['CLASS'].str.startswith('AG')
df.loc[m, 'TOP'] = df.loc[m, 'TOP'].fillna(df.loc[m, 'BOT'].shift())
Example:
CLASS TOP BOT
0 AG_1 200.0 230.0
1 AG_2 230.0 250.0
2 AG_3 250.0 380.0
3 MAR_1 600.0 601.0
4 MAR_2 NaN NaN # this is not filled
5 GOM NaN 640.0