Home > Software engineering >  Extend a column with another column values in python Data frame
Extend a column with another column values in python Data frame

Time:12-07

This is my Data frame df1:

     ID      Date      Value
0   9712    12/15/2021  15
1   9920    4/11/2021   5
2   9712    08/30/2021  10
3   9560    07/3/2021   25
4   9560    03/03/2021  20

My another data frame df2:

     ID    
0   9501    
1   9502    
2   9665    
3   9500    
4   9431    

Now I need to extend the ID column of df1 with ID column of df2. Expected Output:

     ID      Date      Value
0   9712    12/15/2021  15
1   9920    4/11/2021   5
2   9712    08/30/2021  10
3   9560    07/3/2021   25
4   9560    03/03/2021  20
5   9501    
6   9502    
7   9665    
8   9500    
9   9431

How can I achieve it?

CodePudding user response:

You can use concat:

>>> pd.concat([df1,df2])

    ID        Date  Value
0  9712  12/15/2021   15.0
1  9920  04/11/2021    5.0
2  9712  08/30/2021   10.0
3  9560  07/03/2021   25.0
4  9560  03/03/2021   20.0
0  9501         NaN    NaN
1  9502         NaN    NaN
2  9665         NaN    NaN
3  9500         NaN    NaN
4  9431         NaN    NaN

Which will combine the dataframes with overlapping columns and return everything. Columns outside the intersection will be filled with NaN values.


Edit:

Assume your df2 has also a date column:

     ID  Date
0  9501     1
1  9502     2
2  9665     3
3  9500     4
4  9431     5

You can very simply just select [] the Date column to concat:

>>> pd.concat([df,df2[['ID']]])

     ID        Date  Value
0  9712  12/15/2021   15.0
1  9920  04/11/2021    5.0
2  9712  08/30/2021   10.0
3  9560  07/03/2021   25.0
4  9560  03/03/2021   20.0
0  9501         NaN    NaN
1  9502         NaN    NaN
2  9665         NaN    NaN
3  9500         NaN    NaN
4  9431         NaN    NaN

So irrespectively of how many 'extra' columns you have, you can always just select the 'Date' column

CodePudding user response:

You can simply merge the two dataframes on the ID column with an outer join

df = df1.merge(df2, on='ID', how='outer')

if df2 contains more than one column but you wish to only use one, in this case the ID column you can select only that column like so

df = df1.merge(df2[['ID']], on='ID', how='outer')

this method of selecting just the ID column using

df2[['ID']]

will also work for the concat method

  • Related