I'm new to pandas and would like to know how to do the following: Given specific conditions, I would like to mark the whole group with a specific label rather than just the rows that meet the conditions. For example, if I have a DataFrame like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6, 7, 8],
                   "process": ["pending", "finished", "finished", "finished", "finished", "finished", "finished", "pending"],
                   "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
                   "size": [2, 2, 1, 2, 2, 1, 2, 2]})
conditions = [(df['size'] >= 2) & (df['process'].isin(["pending"]))]
choices = ["not_done"]
df['state'] = np.select(conditions, choices, default="something_else")
df:
id process working_group size state
0 1 pending a 2 not_done
1 2 finished a 2 something_else
2 3 finished c 1 something_else
3 4 finished d 2 something_else
4 5 finished d 2 something_else
5 6 finished f 1 something_else
6 7 finished g 2 something_else
7 8 pending g 2 not_done
However, I would like the whole working_group marked as not_done when any individual task in it is pending. For instance, groups a and g should be marked as not_done:
id process working_group size state
0 1 pending a 2 not_done
1 2 finished a 2 not_done
2 3 finished c 1 something_else
3 4 finished d 2 something_else
4 5 finished d 2 something_else
5 6 finished f 1 something_else
6 7 finished g 2 not_done
7 8 pending g 2 not_done
CodePudding user response:
You can use:
condition = df['size'].ge(2) & df['process'].isin(["pending"])
df['state'] = np.where(condition.groupby(df['working_group']).transform('any'), 'not_done', 'something_else')
Or:
condition = df['size'].ge(2) & df['process'].isin(["pending"])
df['state'] = np.where(df['working_group'].isin(df.loc[condition, 'working_group']), 'not_done', 'something_else')
Output:
id process working_group size state
0 1 pending a 2 not_done
1 2 finished a 2 not_done
2 3 finished c 1 something_else
3 4 finished d 2 something_else
4 5 finished d 2 something_else
5 6 finished f 1 something_else
6 7 finished g 2 not_done
7 8 pending g 2 not_done
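For reference, here is a self-contained sketch that runs both variants on the question's data so they can be compared side by side. The intermediate names (`group_has_match`, `matching_groups`, `state_transform`, `state_isin`) are illustrative only and do not appear in the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "process": ["pending", "finished", "finished", "finished",
                "finished", "finished", "finished", "pending"],
    "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
    "size": [2, 2, 1, 2, 2, 1, 2, 2],
})

# Row-level condition: True only for rows that are themselves pending.
condition = df["size"].ge(2) & df["process"].isin(["pending"])

# Variant 1: broadcast "any row in this group matches" back to every row.
group_has_match = condition.groupby(df["working_group"]).transform("any")
state_transform = np.where(group_has_match, "not_done", "something_else")

# Variant 2: collect the groups that contain a match, then test membership.
matching_groups = df.loc[condition, "working_group"]
state_isin = np.where(df["working_group"].isin(matching_groups),
                      "not_done", "something_else")

df["state"] = state_transform
print(df)
```

Both variants label a row by what is true of its group rather than of the row itself; the first does so with a grouped reduction, the second with a plain membership test on the group key.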
CodePudding user response:
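A simple alternative: after you create the 'state' column with np.select, backward fill and forward fill it within each group.
Note that this only works if rows that do not meet the condition hold missing values (NaN) rather than a default label such as "something_else", which is the setup the output below reflects: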
df['state'] = df.groupby(['working_group'])['state'].transform(lambda x: x.bfill().ffill())
id process working_group size state
0 1 pending a 2 not_done
1 2 finished a 2 not_done
2 3 finished c 1 NaN
3 4 finished d 2 NaN
4 5 finished d 2 NaN
5 6 finished f 1 NaN
6 7 finished g 2 not_done
7 8 pending g 2 not_done
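Filling can only spread a label into rows that are actually missing, so the default label has to be applied last. Below is a minimal sketch under that assumption. It uses `Series.where` instead of `np.select` to leave unmatched rows as NaN (a substitution of mine, not the answer's code), and the `labels` name is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8],
    "process": ["pending", "finished", "finished", "finished",
                "finished", "finished", "finished", "pending"],
    "working_group": ["a", "a", "c", "d", "d", "f", "g", "g"],
    "size": [2, 2, 1, 2, 2, 1, 2, 2],
})

condition = (df["size"] >= 2) & df["process"].isin(["pending"])

# Label only the matching rows; every other row stays missing (NaN).
labels = pd.Series("not_done", index=df.index).where(condition)

# Spread the label within each working_group, then apply the default
# to the groups that never received a label.
df["state"] = (
    labels.groupby(df["working_group"])
          .transform(lambda s: s.bfill().ffill())
          .fillna("something_else")
)
print(df)
```

Depending on the pandas version, the fill step may emit a deprecation warning about downcasting for groups that are entirely NaN; the result is unaffected.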