I have several Dataframes in a list
df1 = {'col1': ['a', 'a', 'b', 'c', 'c', 'd'], 'col2': [3, 4, 3, 8, 4, 5], 'col3': ['cat', 'cat', 'cat', 'cat', 'cat', 'cat']}
df2 = {'col1': ['a', 'a', 'b', 'c', 'c', 'd'], 'col2': [30, 40, 30, 80, 40, 50], 'col3': ['dog', 'dog', 'dog', 'dog', 'dog', 'dog']}
df3 = {'col1': ['a', 'a', 'b', 'c', 'c', 'd'], 'col2': [31, 41, 31, 81, 41, 51], 'col3': ['whale', 'whale', 'whale', 'whale', 'whale', 'whale']}
dfList = [df1, df2, df3]
What I want is a list of dataframes where, where each new dataframe for each value in col1
, with col2
and col3
as columns. The values of col2
and col3
should come from each dataframe df1
, df2
, df3
.
a
col2 col3
3 cat
30 dog
31 whale
a
col2 col3
4 cat
40 dog
41 whale
b
col2 col3
3 cat
30 dog
31 whale
c
col2 col3
8 cat
80 dog
81 whale
....
I would also be interested if it is possible to have that in one dataframe, something like
col2 col3
a 3 cat
30 dog
31 whale
a 4 cat
40 dog
41 whale
b 3 cat
30 dog
31 whale
c 8 cat
80 dog
81 whale
....
CodePudding user response:
You can use concat
and groupby
:
df = (pd.concat(dfList)
.assign(id=lambda d: d.groupby(['col1', 'col3']).cumcount())
)
out = [d.drop(columns='id') for k,d in df.groupby(['col1', 'id'])]
output:
[ col1 col2 col3
0 a 3 cat
0 a 30 dog
0 a 31 whale,
col1 col2 col3
1 a 4 cat
1 a 40 dog
1 a 41 whale,
col1 col2 col3
2 b 3 cat
2 b 30 dog
2 b 31 whale,
col1 col2 col3
3 c 8 cat
3 c 80 dog
3 c 81 whale,
col1 col2 col3
4 c 4 cat
4 c 40 dog
4 c 41 whale,
col1 col2 col3
5 d 5 cat
5 d 50 dog
5 d 51 whale]
Alternative:
df.set_index(['col1', 'id']).sort_index()
output:
col2 col3
col1 id
a 0 3 cat
0 30 dog
0 31 whale
1 4 cat
1 40 dog
1 41 whale
b 0 3 cat
0 30 dog
0 31 whale
c 0 8 cat
0 80 dog
0 81 whale
1 4 cat
1 40 dog
1 41 whale
d 0 5 cat
0 50 dog
0 51 whale