I've a list of Pandas DataFrame, each df has a column named sub_id
and a column data_*
. The column data_*
is specific for each df: the first has data_0
, the second has data_1
, etc.
The schema is this:
[[`sub_id`,`data_0`], [`sub_id`,`data_1`], [`sub_id`,`data_2`],...[`sub_id`,`data_10`]]
Inside the first df I have:
sub_id data_0
1 1232456
2 4567
5 98786
11 8643
. .
. .
Inside the second df I have:
sub_id data_1
4 1265474
5 4567
20 23434
13 8675443
. .
. .
Inside the third df I have:
sub_id data_2
11 1265474
1 4567
20 23434
5 8675443
. .
. .
I've used concat to have an unique df with all data but the result is not what I want.
What I want is a df like this:
sub_id data_0 data_1 data_2 ..
1 1232456 4567
2 4567
4 1265474
5 98786 4567 8675443
11 8643 1265474
13 8675443
20 23434 23434
. .
. .
How I can do this?
CodePudding user response:
Use concat
with create DataFrame.set_index
:
dfs = [df1, df2, df3]
df = pd.concat([x.set_index('sub_id') for x in dfs], axis=1)
print (df)
data_0 data_1 data_2
sub_id
1 1232456.0 NaN 4567.0
2 4567.0 NaN NaN
4 NaN 1265474.0 NaN
5 98786.0 4567.0 8675443.0
11 8643.0 NaN 1265474.0
13 NaN 8675443.0 NaN
20 NaN 23434.0 23434.0