Home > Mobile >  Compute number of missing values by group on another dataframe column based on conditions
Compute number of missing values by group on another dataframe column based on conditions

Time:06-01

Let's say I have the following data:

df=pd.DataFrame({"id":[1,1,1,2,2,3,4],
             "date":[2019,2019,2020,2020,2020,2020,2021],
             "subgroup":["con","ind","ind","con","ind","ind","ind"],
             "value":[1,None,2,None,1,3,4]})

I want to group by ID and DATE and, among those duplicates, get a column which counts the number of missing values in the Value column based on the value in the Subgroup column (in this case when Subgroup=="Ind") The output would look like this:

id      date    subgroup   value  count
1       2019      con      1       1
1       2019      ind      None    1
1       2020      ind      2       0
2       2020      con      None    0
2       2020      ind      1       0
3       2020      ind      3       0
4       2021      ind      4       0

How can I achieve this?

CodePudding user response:

df['counter'] = 0
df.loc[(df.subgroup=='ind') & (df.value.isna()), 'counter'] = 1
df['goal'] = df.groupby(["id","date"])['counter'].transform('sum') 
df = df.drop(columns='counter')

but as Alollz pointed out your sample code does not produce your sample dataframe.

CodePudding user response:

you'll need to located the rows where subgroup == 'ind' and then sum the isnull() in the value column as such:

new_df = df.loc[df['subgroup'] == 'ind']
lst = []
nans = new_df['value'][i].isnull().sum()
  • Related