I have two data frames. I want to add the columns from the second data frame to the first one by matching the column information in data frame 1.
The first data frame: dd
BCS libcs
0 AAA-1 1
1 ABA-1 1
2 ACD-1 1
3 AAT-1 1
4 AAR-1 1
... ...
188453 TCC-61 61
188454 TTG-61 61
188455 CTG-61 61
188456 ATG-61 61
188457 TTT-61 61
print(dd.shape)
(188458, 2)
The second dataframe: df
sample_id DState
0 H20_AGS AG
1 H21_AGS AG
2 H22_BGS BG
3 H23_AGS AG
4 H24_CGS CG
.. ... ...
56 H90_DGS DG
57 H91_DGS DG
58 H92_BGS BG
59 H93_BGS BG
60 H94_CGS CG
print(df.shape)
(61, 2)
There are 61 unique items in column libcs
in dataframe1 and 61 rows in the second data frame. I want to merge based on the matching column information.
Expected Output:
BCS libcs sample_id DState
0 AAA-1 1 H20_AGS AG
1 ABA-1 1 H20_AGS AG
2 ACD-1 1 H20_AGS AG
3 AAT-1 1 H20_AGS AG
4 AAR-1 1 H20_AGS AG
... ...
188453 TCC-61 61 H94_CGS CG
188454 TTG-61 61 H94_CGS CG
188455 CTG-61 61 H94_CGS CG
188456 ATG-61 61 H94_CGS CG
188457 TTT-61 61 H94_CGS CG
I tried the below code but it seems like it is not working
dd.libcs.value_counts()
dd.index = dd.groupby(level=0).cumcount()
C = dd.join(df).reset_index(drop=True)
print (C)
CodePudding user response:
You can increase the index of df
by 1 to match the value of libcs
and merge
:
df.index = 1
dd.merge(df, left_on='libcs', right_index=True)
output:
BCS libcs sample_id DState
0 AAA-1 1 H20_AGS AG
1 ABA-1 1 H20_AGS AG
2 ACD-1 1 H20_AGS AG
3 AAT-1 1 H20_AGS AG
4 AAR-1 1 H20_AGS AG
...
188453 TCC-61 61 H94_CGS CG
188454 TTG-61 61 H94_CGS CG
188455 CTG-61 61 H94_CGS CG
188456 ATG-61 61 H94_CGS CG
188457 TTT-61 61 H94_CGS CG
CodePudding user response:
Use merge
>>> dd.merge(df, left_on='libcs', right_on=(df.index 1).astype(str))
BCS libcs sample_id DState
0 AAA-1 1 H20_AGS AG
1 ABA-1 1 H20_AGS AG
2 ACD-1 1 H20_AGS AG
3 AAT-1 1 H20_AGS AG
4 AAR-1 1 H20_AGS AG
5 TCC-61 61 H94_CGS CG
6 TTG-61 61 H94_CGS CG
7 CTG-61 61 H94_CGS CG
8 ATG-61 61 H94_CGS CG
9 TTT-61 61 H94_CGS CG