Concat Dataframes based on keys In Dictionaries

Time:12-14

import pandas as pd
d1_key = "A"
d1 = pd.DataFrame({"A" : [1], "B" : [2]})
d2_key = "B"
d2 = pd.DataFrame({"A" : [3], "B" : [4]})
d3_key = "B"
d3 = pd.DataFrame({"A" : [5], "B" : [6]})
d4_key = "A"
d4 = pd.DataFrame({"A" : [7], "B" : [8]})

I have a number of DataFrames, each with an associated key, and I want to concatenate the ones that share a key.

Expected Output:

{'A':    
    A  B
 0  1  2
 1  7  8,
 'B':    
    A  B
 0  3  4
 1  5  6}

CodePudding user response:

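Let us try this (it assumes each DataFrame has exactly one row, since one key is assigned per row of the concatenated frame):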

df = pd.concat([d1,d2,d3,d4])
df.index = [d1_key,d2_key,d3_key,d4_key]
out = {x : y.reset_index(drop=True) for x, y in df.groupby(level=0)}
out['A']
Out[286]: 
   A  B
0  1  2
1  7  8

CodePudding user response:

Easy solution (works with any number of rows): pass the keys to pd.concat and group on the resulting outer index level.

keys = [d1_key, d2_key, d3_key, d4_key]
dfs  = [d1, d2, d3, d4]

out = {k:g.reset_index(drop=True)
       for k,g in pd.concat(dfs, keys=keys).groupby(level=0)}
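To see why this handles frames of any length, it can help to look at the intermediate result. The sketch below is only an illustration: the two-row `d1` and the names `stacked` and `outer` are assumptions made for the example, not part of the question.

```python
import pandas as pd

# Hypothetical input: d1 has two rows, the other frames have one row each.
d1 = pd.DataFrame({"A": [1, 9], "B": [2, 10]})
d2 = pd.DataFrame({"A": [3], "B": [4]})
d3 = pd.DataFrame({"A": [5], "B": [6]})
d4 = pd.DataFrame({"A": [7], "B": [8]})

keys = ["A", "B", "B", "A"]
dfs = [d1, d2, d3, d4]

# keys= adds an outer index level that holds each frame's key,
# repeated once per row of that frame.
stacked = pd.concat(dfs, keys=keys)
outer = stacked.index.get_level_values(0).tolist()
print(outer)

# Grouping on that outer level yields one DataFrame per distinct key.
out = {k: g.reset_index(drop=True) for k, g in stacked.groupby(level=0)}
print(out["A"])
```

Because the key is repeated per row automatically, there is no need to build a separate grouping Series by hand.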

Previous answer, using a custom Series:

Assuming your input dataframes have a single row, you could concat and groupby to split:

keys = [d1_key, d2_key, d3_key, d4_key]
dfs  = [d1, d2, d3, d4]

out = {k: g.reset_index(drop=True)
       for k,g in pd.concat(dfs, ignore_index=True).groupby(pd.Series(keys))}

output:

{'A':    A  B
 0  1  2
 1  7  8,
 'B':    A  B
 0  3  4
 1  5  6}

If your input dataframes have more than one row, you need to account for the length when building the grouping Series:

d1 = pd.DataFrame({"A" : [1,9], "B" : [2,10]})

import numpy as np
group = pd.Series(np.repeat(keys, list(map(len,dfs))))

Example (assuming d1 has 2 rows):

0    A
1    A
2    B
3    B
4    A
dtype: object

Grouping:

group = pd.Series(np.repeat(keys, list(map(len,dfs))))

out = {k: g.reset_index(drop=True)
       for k,g in pd.concat(dfs, ignore_index=True).groupby(group)}

output:

{'A':    A   B
 0  1   2
 1  9  10
 2  7   8,
 'B':    A  B
 0  3  4
 1  5  6}
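For comparison, here is a different technique that skips groupby entirely: bucket the frames by key in plain Python first, then concatenate each bucket. This is a minimal sketch, not one of the answers above; the names `pairs` and `buckets` are made up for the illustration.

```python
from collections import defaultdict

import pandas as pd

d1 = pd.DataFrame({"A": [1], "B": [2]})
d2 = pd.DataFrame({"A": [3], "B": [4]})
d3 = pd.DataFrame({"A": [5], "B": [6]})
d4 = pd.DataFrame({"A": [7], "B": [8]})

# Hypothetical pairing of each key with its frame.
pairs = [("A", d1), ("B", d2), ("B", d3), ("A", d4)]

# Collect the frames that share a key, preserving input order.
buckets = defaultdict(list)
for key, frame in pairs:
    buckets[key].append(frame)

# Concatenate each bucket; ignore_index=True renumbers rows from 0.
out = {key: pd.concat(frames, ignore_index=True)
       for key, frames in buckets.items()}
print(out["A"])
print(out["B"])
```

This also works for frames with any number of rows, since each frame is kept whole until its bucket is concatenated.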