Home > Net >  groupby and agg with multiple columns pandas
groupby and agg with multiple columns pandas

Time:09-04

I have these two df:

df1 = pd.DataFrame({'List' : ['P111', 'P999', 'P111;P999;P777', 'P555', 'P666;P111;P333'],
                    'Color' : ['red', 'red', 'blue','yellow', 'red']})

df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
                    'Animal' : ['DOG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE'],
                    'Letter' : ['A,F', 'C', 'S,M', 'F,L', 'C,A','M,C', 'Z,L']})

and at some point I do something like:

...
.groupby('index')
.agg(List=('List','first'),
     Sequence=('Sequence','first'),
     Animal=('Animal','|'.join),
     Letter=('Letter','|'.join))
...

This works but i would like to know if there is a way to insert into .agg() all columns of df1 with 'first' and all columns of df2 with joins. This is because I don't always have the same number of columns, also sometimes I have names that have spaces and this prevents me from inserting them

CodePudding user response:

Yes, there is a better way if you want aggregate on all columns in one go:

...
.astype(dict.fromkeys(df2, str))
.groupby('index')
.agg({**dict.fromkeys(df1, 'first'), **dict.fromkeys(df2, '|'.join)})
...

Some details:

  • dict.fromkeys(iterable, value): creates a new dict with keys from iterable and values set to value.
  • Type casting with .astype(dict.fromkeys(df2, str)) is required because in case some columns in df2 are numeric type then '|'.join will not work.
  • Related