I have the following rules: if col1 == "string1" and col2 == "string3", then newcol = "newstring"; and if col1 == "stringX" and col2 == "stringY", then newcol = "newstringZ". How do I express these rules as a dictionary, so that I can use the map function to look up each row and insert the new column into a dataframe?
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'col1': ['string1', 'string1', 'string3', 'stringX', '', 'stringX'],
        'col2': ["stringY", "string3", "stringY", np.nan, "stringY", "stringY"]
    }
)
After applying the map function, the end result should look like this:
CodePudding user response:
Let's use merge with a lookup dataframe built from the dictionary:
dict_map = {'col1':['string1','stringX'], 'col2':['string3', 'stringY'], 'newcol':['newstring', 'newstringZ']}
df_map = pd.DataFrame(dict_map)
df.merge(df_map, how='left')
Output:
      col1     col2      newcol
0  string1  stringY         NaN
1  string1  string3   newstring
2  string3  stringY         NaN
3  stringX      NaN         NaN
4           stringY         NaN
5  stringX  stringY  newstringZ
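A slightly more defensive variant of the merge above, as a sketch: it names the join keys explicitly with on= and passes validate='many_to_one' so that pandas raises an error if the lookup table ever contains the same (col1, col2) pair twice. The variable name result is only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['string1', 'string1', 'string3', 'stringX', '', 'stringX'],
    'col2': ['stringY', 'string3', 'stringY', np.nan, 'stringY', 'stringY'],
})

# Lookup table: one row per (col1, col2) -> newcol rule.
df_map = pd.DataFrame({
    'col1': ['string1', 'stringX'],
    'col2': ['string3', 'stringY'],
    'newcol': ['newstring', 'newstringZ'],
})

# Name the join keys explicitly; validate raises MergeError if a
# (col1, col2) pair appears more than once in df_map.
result = df.merge(df_map, on=['col1', 'col2'], how='left',
                  validate='many_to_one')
print(result)
```

Rows without a matching rule keep a missing value in newcol, and the left merge preserves the row order of df.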
CodePudding user response:
You can build the dict with tuples of (col1, col2) as keys and the newcol values as values. Then turn col1 and col2 into a tuple (composite key) with .apply() on axis=1, and use the .map() function for the mapping, as follows:
mapper = {('string1', 'string3'): 'newstring',
          ('stringX', 'stringY'): 'newstringZ'}
df['newcol'] = df[['col1', 'col2']].apply(tuple, axis=1).map(mapper)
Result:
print(df)
      col1     col2      newcol
0  string1  stringY         NaN
1  string1  string3   newstring
2  string3  stringY         NaN
3  stringX      NaN         NaN
4           stringY         NaN
5  stringX  stringY  newstringZ
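If the row-wise .apply() becomes slow on a large dataframe, one possible alternative (a sketch, not part of the answer above) is to build the composite keys with plain zip and look them up with dict.get. It uses the same mapper dictionary; unmatched pairs get None, which pandas treats as a missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'col1': ['string1', 'string1', 'string3', 'stringX', '', 'stringX'],
    'col2': ['stringY', 'string3', 'stringY', np.nan, 'stringY', 'stringY'],
})

mapper = {('string1', 'string3'): 'newstring',
          ('stringX', 'stringY'): 'newstringZ'}

# zip pairs the two columns row by row; dict.get returns None for
# pairs that have no rule, which pandas treats as missing.
df['newcol'] = [mapper.get(key) for key in zip(df['col1'], df['col2'])]
print(df)
```

The matched rows are the same as with .apply(tuple, axis=1).map(mapper); only the way the composite key is built differs.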