Below is the output of the dataframe, im able to publish
df2 = df.groupby(['id1','id2']).size().reset_index()
df2.columns = ['id1','id2','Count']
print (df2)
id1 id2 Count
7780379 00000000-0000-0000-0000-000000000000 1
7780379 72b9f501-6d48-4a37-8f3a-ff4e5fb6ec8d 19
<iframe name="sif1" sandbox="allow-forms allow-modals allow-scripts" frameborder="0"></iframe>
im trying to add the condition is any id2 that belongs to 00000000-0000-0000-0000-000000000000 is an invalid count , Below is the expected result
id1 ValidCount InvalidCount
7780379 19 1
Please share if any pointers.
CodePudding user response:
You can get the final output from df
directly, without needs of df2
:
valids = np.where(df['id2']=='00000000-0000-0000-0000-000000000000',
'InvalidCount', 'ValidCount')
df.groupby(['id1', valids]).size().unstack('id2').reset_index()
CodePudding user response:
Use groupby_sum
:
out = df.groupby(['id1', df['id2']!='00000000-0000-0000-0000-000000000000'])['Count'] \
.sum().unstack().rename(columns={True: 'ValidCount', False: 'InvalidCount'})
Output:
>>> out
id2 InvalidCount ValidCount
id1
7780379 1 19