import pandas as pd
d1_key = "A"
d1 = pd.DataFrame({"A" : [1], "B" : [2]})
d2_key = "B"
d2 = pd.DataFrame({"A" : [3], "B" : [4]})
d3_key = "B"
d3 = pd.DataFrame({"A" : [5], "B" : [6]})
d4_key = "A"
d4 = pd.DataFrame({"A" : [7], "B" : [8]})
I have a number of DataFrames, each with an associated key, and I want to concatenate them based on those keys.
Expected Output:
{'A':    A  B
 0  1  2
 1  7  8,
 'B':    A  B
 0  3  4
 1  5  6}
CodePudding user response:
Let us try
df = pd.concat([d1,d2,d3,d4])
df.index = [d1_key,d2_key,d3_key,d4_key]
out = {x : y.reset_index(drop=True) for x, y in df.groupby(level=0)}
out['A']
Out[286]:
A B
0 1 2
1 7 8
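Note that assigning `df.index = [d1_key, ...]` gives one index label per *frame*, so this only lines up because each input frame has exactly one row. A minimal self-contained sketch of the approach, rebuilt from the question's frames:

```python
import pandas as pd

# Rebuild the single-row frames from the question
d1 = pd.DataFrame({"A": [1], "B": [2]})
d2 = pd.DataFrame({"A": [3], "B": [4]})
d3 = pd.DataFrame({"A": [5], "B": [6]})
d4 = pd.DataFrame({"A": [7], "B": [8]})
keys = ["A", "B", "B", "A"]

df = pd.concat([d1, d2, d3, d4])
# One label per row -- valid only because each frame has a single row
df.index = keys
out = {k: g.reset_index(drop=True) for k, g in df.groupby(level=0)}

print(out["B"])  # rows from d2 and d3
```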
CodePudding user response:
Easy solution (works with any number of rows):
keys = [d1_key, d2_key, d3_key, d4_key]
dfs = [d1, d2, d3, d4]
out = {k: g.reset_index(drop=True)
       for k, g in pd.concat(dfs, keys=keys).groupby(level=0)}
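Why this handles any number of rows: `pd.concat(dfs, keys=keys)` builds a MultiIndex whose outer level is the key, one label per row, so `groupby(level=0)` splits correctly regardless of each frame's length. A small sketch with a hypothetical two-row `d1`:

```python
import pandas as pd

# Hypothetical two-row frame to show frames of differing lengths work
d1 = pd.DataFrame({"A": [1, 9], "B": [2, 10]})
d2 = pd.DataFrame({"A": [3], "B": [4]})
keys = ["A", "B"]

# keys= stamps each frame's rows with its key on the outer index level,
# so the split no longer depends on every frame having one row
out = {k: g.reset_index(drop=True)
       for k, g in pd.concat([d1, d2], keys=keys).groupby(level=0)}

print(out["A"])  # both rows of d1
```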
Previous answer using a custom Series: assuming your input dataframes have a single row, you could concat and groupby to split:
keys = [d1_key, d2_key, d3_key, d4_key]
dfs = [d1, d2, d3, d4]
out = {k: g.reset_index(drop=True)
       for k, g in pd.concat(dfs, ignore_index=True).groupby(pd.Series(keys))}
output:
{'A': A B
0 1 2
1 7 8,
'B': A B
0 3 4
1 5 6}
If your input dataframes have more than one row, you need to account for the length when building the grouping Series:
import numpy as np

d1 = pd.DataFrame({"A": [1, 9], "B": [2, 10]})
dfs = [d1, d2, d3, d4]  # rebuild the list so it picks up the new d1
group = pd.Series(np.repeat(keys, list(map(len, dfs))))
Example (assuming d1 has 2 rows):
0 A
1 A
2 B
3 B
4 A
dtype: object
Grouping:
group = pd.Series(np.repeat(keys, list(map(len, dfs))))
out = {k: g.reset_index(drop=True)
       for k, g in pd.concat(dfs, ignore_index=True).groupby(group)}
output:
{'A': A B
0 1 2
1 9 10
2 7 8,
'B': A B
0 3 4
1 5 6}
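Putting the multi-row pieces together, a self-contained sketch (the question's frames with `d1` widened to two rows, as above):

```python
import numpy as np
import pandas as pd

d1 = pd.DataFrame({"A": [1, 9], "B": [2, 10]})  # two rows
d2 = pd.DataFrame({"A": [3], "B": [4]})
d3 = pd.DataFrame({"A": [5], "B": [6]})
d4 = pd.DataFrame({"A": [7], "B": [8]})
dfs = [d1, d2, d3, d4]
keys = ["A", "B", "B", "A"]

# Repeat each key once per row of its frame so the grouping Series
# lines up row-for-row with the ignore_index=True numbering
group = pd.Series(np.repeat(keys, [len(d) for d in dfs]))
out = {k: g.reset_index(drop=True)
       for k, g in pd.concat(dfs, ignore_index=True).groupby(group)}

print(out["A"])  # rows of d1 and d4
```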