Home > Blockchain >  Join a list of DataFrame
Join a list of DataFrame

Time:12-24

I've a list of Pandas DataFrame, each df has a column named sub_id and a column data_*. The column data_* is specific for each df: the first has data_0, the second has data_1, etc. The schema is this:

[[`sub_id`,`data_0`], [`sub_id`,`data_1`], [`sub_id`,`data_2`],...[`sub_id`,`data_10`]]

Inside the first df I have:

sub_id  data_0
1       1232456
2       4567
5       98786
11      8643
.       .
.       .

Inside the second df I have:

sub_id  data_1
4       1265474
5       4567
20      23434
13      8675443
.       .
.       .

Inside the third df I have:

sub_id  data_2
11      1265474
1       4567
20      23434
5       8675443
.       .
.       .

I've used concat to have an unique df with all data but the result is not what I want.

What I want is a df like this:

sub_id  data_0  data_1  data_2  ..
1       1232456         4567
2       4567
4               1265474
5       98786   4567    8675443
11      8643            1265474
13              8675443
20              23434   23434
.       .
.       .

How I can do this?

CodePudding user response:

Use concat with create DataFrame.set_index:

dfs = [df1, df2, df3]
df = pd.concat([x.set_index('sub_id') for x in dfs], axis=1)
print (df)
           data_0     data_1     data_2
sub_id                                 
1       1232456.0        NaN     4567.0
2          4567.0        NaN        NaN
4             NaN  1265474.0        NaN
5         98786.0     4567.0  8675443.0
11         8643.0        NaN  1265474.0
13            NaN  8675443.0        NaN
20            NaN    23434.0    23434.0
  • Related