Home > Software design >  Pandas: Add column information based on length of another dataframe
Pandas: Add column information based on length of another dataframe

Time:11-04

I have two data frames. I want to add the columns from the second data frame to the first one by matching the column information in data frame 1.

The first data frame: dd

         BCS        libcs
0        AAA-1        1
1        ABA-1        1
2        ACD-1        1
3        AAT-1        1
4        AAR-1        1
           ...      ...
188453  TCC-61       61
188454  TTG-61       61
188455  CTG-61       61
188456  ATG-61       61
188457  TTT-61       61

print(dd.shape)
(188458, 2)

The second dataframe: df

     sample_id       DState
0    H20_AGS          AG
1    H21_AGS          AG
2    H22_BGS          BG
3    H23_AGS          AG
4    H24_CGS          CG
..   ...     ...
56    H90_DGS          DG
57    H91_DGS          DG
58    H92_BGS          BG
59    H93_BGS          BG
60    H94_CGS          CG

print(df.shape)
(61, 2)

There are 61 unique items in column libcs in dataframe1 and 61 rows in the second data frame. I want to merge based on the matching column information.

Expected Output:

         BCS        libcs     sample_id       DState
0        AAA-1        1       H20_AGS          AG
1        ABA-1        1       H20_AGS          AG
2        ACD-1        1       H20_AGS          AG
3        AAT-1        1       H20_AGS          AG
4        AAR-1        1       H20_AGS          AG
           ...      ...
188453  TCC-61       61       H94_CGS          CG
188454  TTG-61       61       H94_CGS          CG
188455  CTG-61       61       H94_CGS          CG
188456  ATG-61       61       H94_CGS          CG
188457  TTT-61       61       H94_CGS          CG

I tried the below code but it seems like it is not working

dd.libcs.value_counts()
dd.index = dd.groupby(level=0).cumcount()
C = dd.join(df).reset_index(drop=True)
print (C)

CodePudding user response:

You can increase the index of df by 1 to match the value of libcs and merge:

df.index  = 1
dd.merge(df, left_on='libcs', right_index=True)

output:

           BCS  libcs sample_id DState
0        AAA-1      1   H20_AGS     AG
1        ABA-1      1   H20_AGS     AG
2        ACD-1      1   H20_AGS     AG
3        AAT-1      1   H20_AGS     AG
4        AAR-1      1   H20_AGS     AG
...
188453  TCC-61     61   H94_CGS     CG
188454  TTG-61     61   H94_CGS     CG
188455  CTG-61     61   H94_CGS     CG
188456  ATG-61     61   H94_CGS     CG
188457  TTT-61     61   H94_CGS     CG

CodePudding user response:

Use merge

>>> dd.merge(df, left_on='libcs', right_on=(df.index 1).astype(str))

      BCS  libcs sample_id DState
0   AAA-1      1   H20_AGS     AG
1   ABA-1      1   H20_AGS     AG
2   ACD-1      1   H20_AGS     AG
3   AAT-1      1   H20_AGS     AG
4   AAR-1      1   H20_AGS     AG
5  TCC-61     61   H94_CGS     CG
6  TTG-61     61   H94_CGS     CG
7  CTG-61     61   H94_CGS     CG
8  ATG-61     61   H94_CGS     CG
9  TTT-61     61   H94_CGS     CG
  • Related