Let's say I have the following data:
import pandas as pd

df = pd.DataFrame({"id": [1, 1, 1, 2, 2, 3, 4],
                   "date": [2019, 2019, 2020, 2020, 2020, 2020, 2021],
                   "subgroup": ["con", "ind", "ind", "con", "ind", "ind", "ind"],
                   "value": [1, None, 2, None, 1, 3, 4]})
I want to group by id and date and, among those duplicate rows, get a column that counts the number of missing values in the value column, but only for rows where subgroup == "ind". The output would look like this:
id  date  subgroup  value  count
1   2019  con       1      1
1   2019  ind       None   1
1   2020  ind       2      0
2   2020  con       None   0
2   2020  ind       1      0
3   2020  ind       3      0
4   2021  ind       4      0
How can I achieve this?
CodePudding user response:
# flag rows that are both in the 'ind' subgroup and missing a value
df['counter'] = 0
df.loc[(df.subgroup=='ind') & (df.value.isna()), 'counter'] = 1
# sum those flags within each (id, date) group and broadcast back to every row
df['goal'] = df.groupby(["id","date"])['counter'].transform('sum')
df = df.drop(columns='counter')
But as Alollz pointed out, your sample code does not produce your sample DataFrame.
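The same result can also be computed in one step without the helper column; this is just a sketch of the same logic, summing a boolean mask per (id, date) group:

# True only where an 'ind' row is missing its value
mask = df['value'].isna() & df['subgroup'].eq('ind')
# per-(id, date) sums of the mask, broadcast back to every row of the group
df['goal'] = mask.groupby([df['id'], df['date']]).transform('sum')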
CodePudding user response:
You'll need to locate the rows where subgroup == 'ind' and then sum the isnull() values in the value column within each (id, date) group, as such:
# keep only the 'ind' rows and count their missing values per (id, date)
new_df = df.loc[df['subgroup'] == 'ind']
nans = new_df.groupby(['id', 'date'])['value'].apply(lambda s: s.isnull().sum()).rename('count').reset_index()
# merge the counts back onto the original frame; groups with no 'ind' rows get 0
df = df.merge(nans, on=['id', 'date'], how='left')
df['count'] = df['count'].fillna(0).astype(int)
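Printing the frame should now reproduce the table from the question: only the two (1, 2019) rows carry a count of 1.

# quick sanity check against the expected output in the question
print(df)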