Home > database >  Pandas create new column in df2 from another df1 column2 if a value in df1 column1 matches value in
Pandas create new column in df2 from another df1 column2 if a value in df1 column1 matches value in

Time:07-14

I need to compare items in a list to a column in df1 and create new column in df2 with value from a different column on the same row in df1, if there is a match between a list item and the compared df1 column

My code and results

import pandas as pd
  
# initialize list elements
new_data = [[10,'a'], [20, 'b'], [30, 'c'], [40, 'd']]
old_data = [['10a','A', 10.05], [22, 'B', 12.10], [40, 'C', 1.10], 
[20, 'D', 8.05], [15, 'E', 9.00]]
lst = [20, 40, 50, 70, '20a']
  
# Create the pandas DataFrame with column name is provided explicitly
df1 = pd.DataFrame(new_data, columns=['Numbers', 'Letters'])
print(df)
df2 = pd.DataFrame(old_data, columns=['level', 'cap', 'time'])
df2['Exit'] = df1.apply(lambda x: x['Letters'] if \
str(x['Numbers']) in lst else 0, axis=1)
print(df2)

Result

level cap   time  Exit
0   10a   A  10.05   0.0
1    22   B  12.10   0.0
2    40   C   1.10   0.0
3    20   D   8.05   0.0
4    15   E   9.00   NaN

But Expected

level cap   time Exit
0   10a   A  10.05   0
1    22   B  12.10   0
2    40   C   1.10   d
3    20   D   8.05   b
4    15   E   9.00   0

I am missing something?

CodePudding user response:

We can make the Numbers column the index of df1 with set_index then reindex the DataFrame with lst so that only values that appear in the list remain in the DataFrame. Then reindex again with the level column of df2 and to_numpy to assign the column without index alignment:

df2['Exit'] = (
    df1.set_index('Numbers')
        .reindex(index=lst)  # Ensure only values from list are present
        .reindex(index=df2['level'], fill_value=0)  # Re-align to df2['level'] column
        .to_numpy()  # Ignore index alignment
)

df2:

  level cap   time Exit
0   10a   A  10.05    0
1    22   B  12.10    0
2    40   C   1.10    d
3    20   D   8.05    b
4    15   E   9.00    0

Alternatively, we can then join a restructured df1 to df2.

To restructure df1 set_index and reindex with lst to create the subset only using values from lst

(
    df1.set_index('Numbers')
        .reindex(index=lst)
        .rename(columns={'Letters': 'Exit'})
)

        Exit
Numbers     
20         b
40         d
50       NaN
70       NaN
20a      NaN

The rename ensures that the new column appears with the correct name and then fillna to fix the NaN values.

All together this looks like:

df2 = df2.join(
    df1.set_index('Numbers')
        .reindex(index=lst)
        .rename(columns={'Letters': 'Exit'}),
    how='left',
    on='level'
).fillna({'Exit': 0})

df2:

  level cap   time Exit
0   10a   A  10.05    0
1    22   B  12.10    0
2    40   C   1.10    d
3    20   D   8.05    b
4    15   E   9.00    0

Yet another option, creating a mapper from the modified df1 above using Series.to_dict and then mapping the values back to df2 level column:

df2['Exit'] = df2['level'].map(
    df1.set_index('Numbers')['Letters']
        .reindex(index=lst)
        .dropna()
        .to_dict()
).fillna(0)

df2:

  level cap   time Exit
0   10a   A  10.05    0
1    22   B  12.10    0
2    40   C   1.10    d
3    20   D   8.05    b
4    15   E   9.00    0

Setup and version:

import pandas as pd  # version 1.4.3

lst = [20, 40, 50, 70, '20a']

df1 = pd.DataFrame(
    [[10, 'a'], [20, 'b'], [30, 'c'], [40, 'd']],
    columns=['Numbers', 'Letters']
)
df2 = pd.DataFrame(
    [['10a', 'A', 10.05], [22, 'B', 12.10], [40, 'C', 1.10],
     [20, 'D', 8.05], [15, 'E', 9.00]],
    columns=['level', 'cap', 'time']
)
  • Related