I need to compare items in a list to a column in df1 and create new column in df2 with value from a different column on the same row in df1, if there is a match between a list item and the compared df1 column
My code and results
import pandas as pd
# initialize list elements
new_data = [[10,'a'], [20, 'b'], [30, 'c'], [40, 'd']]
old_data = [['10a','A', 10.05], [22, 'B', 12.10], [40, 'C', 1.10],
[20, 'D', 8.05], [15, 'E', 9.00]]
lst = [20, 40, 50, 70, '20a']
# Create the pandas DataFrame with column name is provided explicitly
df1 = pd.DataFrame(new_data, columns=['Numbers', 'Letters'])
print(df)
df2 = pd.DataFrame(old_data, columns=['level', 'cap', 'time'])
df2['Exit'] = df1.apply(lambda x: x['Letters'] if \
str(x['Numbers']) in lst else 0, axis=1)
print(df2)
Result
level cap time Exit
0 10a A 10.05 0.0
1 22 B 12.10 0.0
2 40 C 1.10 0.0
3 20 D 8.05 0.0
4 15 E 9.00 NaN
But Expected
level cap time Exit
0 10a A 10.05 0
1 22 B 12.10 0
2 40 C 1.10 d
3 20 D 8.05 b
4 15 E 9.00 0
I am missing something?
CodePudding user response:
We can make the Numbers column the index of df1
with set_index then reindex the DataFrame with lst
so that only values that appear in the list remain in the DataFrame. Then reindex again with the level column of df2
and to_numpy to assign the column without index alignment:
df2['Exit'] = (
df1.set_index('Numbers')
.reindex(index=lst) # Ensure only values from list are present
.reindex(index=df2['level'], fill_value=0) # Re-align to df2['level'] column
.to_numpy() # Ignore index alignment
)
df2
:
level cap time Exit
0 10a A 10.05 0
1 22 B 12.10 0
2 40 C 1.10 d
3 20 D 8.05 b
4 15 E 9.00 0
Alternatively, we can then join a restructured df1
to df2
.
To restructure df1
set_index and reindex with lst
to create the subset only using values from lst
(
df1.set_index('Numbers')
.reindex(index=lst)
.rename(columns={'Letters': 'Exit'})
)
Exit
Numbers
20 b
40 d
50 NaN
70 NaN
20a NaN
The rename ensures that the new column appears with the correct name and then fillna to fix the NaN values.
All together this looks like:
df2 = df2.join(
df1.set_index('Numbers')
.reindex(index=lst)
.rename(columns={'Letters': 'Exit'}),
how='left',
on='level'
).fillna({'Exit': 0})
df2
:
level cap time Exit
0 10a A 10.05 0
1 22 B 12.10 0
2 40 C 1.10 d
3 20 D 8.05 b
4 15 E 9.00 0
Yet another option, creating a mapper from the modified df1
above using Series.to_dict and then mapping the values back to df2
level column:
df2['Exit'] = df2['level'].map(
df1.set_index('Numbers')['Letters']
.reindex(index=lst)
.dropna()
.to_dict()
).fillna(0)
df2
:
level cap time Exit
0 10a A 10.05 0
1 22 B 12.10 0
2 40 C 1.10 d
3 20 D 8.05 b
4 15 E 9.00 0
Setup and version:
import pandas as pd # version 1.4.3
lst = [20, 40, 50, 70, '20a']
df1 = pd.DataFrame(
[[10, 'a'], [20, 'b'], [30, 'c'], [40, 'd']],
columns=['Numbers', 'Letters']
)
df2 = pd.DataFrame(
[['10a', 'A', 10.05], [22, 'B', 12.10], [40, 'C', 1.10],
[20, 'D', 8.05], [15, 'E', 9.00]],
columns=['level', 'cap', 'time']
)