Group by dataframe and split to columns-CodePudding

Below is the output of the dataframe, im able to publish

df2 = df.groupby(['id1','id2']).size().reset_index()
df2.columns = ['id1','id2','Count']
print (df2)

  id1                         id2                  Count
   7780379  00000000-0000-0000-0000-000000000000      1
   7780379  72b9f501-6d48-4a37-8f3a-ff4e5fb6ec8d     19

im trying to add the condition is any id2 that belongs to 00000000-0000-0000-0000-000000000000 is an invalid count , Below is the expected result

     id1    ValidCount  InvalidCount
     7780379      19         1

Please share if any pointers.

CodePudding user response：

You can get the final output from df directly, without needs of df2:

valids = np.where(df['id2']=='00000000-0000-0000-0000-000000000000', 
                  'InvalidCount', 'ValidCount')
df.groupby(['id1', valids]).size().unstack('id2').reset_index()

CodePudding user response：

Use groupby_sum:

out = df.groupby(['id1', df['id2']!='00000000-0000-0000-0000-000000000000'])['Count'] \
        .sum().unstack().rename(columns={True: 'ValidCount', False: 'InvalidCount'})

Output:

>>> out
id2      InvalidCount  ValidCount
id1                              
7780379             1          19