I have these two df:
df1 = pd.DataFrame({'List' : ['P111', 'P999', 'P111;P999;P777', 'P555', 'P666;P111;P333'],
'Color' : ['red', 'red', 'blue','yellow', 'red']})
df2 = pd.DataFrame({'Cod' : ['P111', 'P222', 'P333', 'P444', 'P555', 'P666', 'P777'],
'Animal' : ['DOG', 'CAT', 'BUG','SNAKE,DOG', 'CAT,BUG', 'DOG', 'SNAKE'],
'Letter' : ['A,F', 'C', 'S,M', 'F,L', 'C,A','M,C', 'Z,L']})
and at some point I do something like:
...
.groupby('index')
.agg(List=('List','first'),
Sequence=('Sequence','first'),
Animal=('Animal','|'.join),
Letter=('Letter','|'.join))
...
This works but i would like to know if there is a way to insert into .agg() all columns of df1 with 'first' and all columns of df2 with joins. This is because I don't always have the same number of columns, also sometimes I have names that have spaces and this prevents me from inserting them
CodePudding user response:
Yes, there is a better way if you want aggregate on all columns in one go:
...
.astype(dict.fromkeys(df2, str))
.groupby('index')
.agg({**dict.fromkeys(df1, 'first'), **dict.fromkeys(df2, '|'.join)})
...
Some details:
dict.fromkeys(iterable, value)
: creates a new dict with keys fromiterable
and values set tovalue
.- Type casting with
.astype(dict.fromkeys(df2, str))
is required because in case some columns indf2
are numeric type then'|'.join
will not work.