I have a data frame and I am trying to map the values in one of its columns to the values present in a set.
The data frame is:
Name CallType Location
ABC IN SFO
DEF OUT LHR
PQR INCOMING AMS
XYZ OUTGOING BOM
TYR A_IN DEL
OMN A_OUT DXB
I have a constant set, and each CallType value should be replaced by the matching value from that set:
call_type = {"IN", "OUT"}
Desired data frame
Name CallType Location
ABC IN SFO
DEF OUT LHR
PQR IN AMS
XYZ OUT BOM
TYR IN DEL
OMN OUT DXB
I wrote the code below to check the result, but process.extractOne sometimes gives IN for OUTGOING (which is wrong) and sometimes gives OUT for OUTGOING (which is right).
Here is my code:
import pandas as pd
from fuzzywuzzy import process  # assumption: the post does not say which library provides process

data = [('ABC', 'IN', 'SFO'),
        ('DEF', 'OUT', 'LHR'),
        ('PQR', 'INCOMING', 'AMS'),
        ('XYZ', 'OUTGOING', 'BOM'),
        ('TYR', 'A_IN', 'DEL'),
        ('OMN', 'A_OUT', 'DXB')]
df = pd.DataFrame(data, columns=['Name', 'CallType', 'Location'])
call_types = set(['IN', 'OUT'])
df['CallType'] = df['CallType'].apply(lambda x: process.extractOne(x, list(call_types))[0])
total_rows = len(df)
for row_no in range(total_rows):
    row = df.iloc[row_no]
    print(row)  # Here OUTGOING is sometimes set to OUT and sometimes to IN. Shouldn't the result be consistent?
I am not sure if there is a better way. Can someone please tell me if I am missing something?
CodePudding user response:
Looks like Series.str.extract is a good fit for this:
df['CallType'] = df.CallType.str.extract(r'(OUT|IN)')
print(df)
Name CallType Location
0 ABC IN SFO
1 DEF OUT LHR
2 PQR IN AMS
3 XYZ OUT BOM
4 TYR IN DEL
5 OMN OUT DXB
Or, if you want to use call_types explicitly, you can do:
df['CallType'] = df.CallType.str.extract(fr"({'|'.join(call_types)})")
# same result
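If the set of labels might grow or contain regex metacharacters, the pattern could be built a little more defensively. The following is only a sketch using the standard-library re module on a plain list; the names pattern, normalize and raw are made up for illustration, and sorting longest-first is my own assumption about what is wanted when one label is a prefix of another.

```python
import re

call_types = {"IN", "OUT"}

# Sort longest-first (then alphabetically) so the alternation order does not
# depend on set iteration order, and escape each label for safe use in a regex.
ordered = sorted(call_types, key=lambda t: (-len(t), t))
pattern = re.compile("|".join(re.escape(t) for t in ordered))

def normalize(value):
    """Return the first call type found in value, or None if none is present."""
    match = pattern.search(value)
    return match.group(0) if match else None

raw = ["IN", "OUT", "INCOMING", "OUTGOING", "A_IN", "A_OUT"]
normalized = [normalize(v) for v in raw]
print(normalized)
```

The same compiled pattern's source string (pattern.pattern, wrapped in a capture group) should also be usable with str.extract, or normalize can be applied with df['CallType'].map(normalize).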
CodePudding user response:
A possible solution is to use difflib.get_close_matches:
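import difflib
# cutoff=0.0 keeps low-scoring candidates; with the default cutoff of 0.6,
# 'INCOMING' has no close match and [0] raises IndexError
df['CallType'] = df['CallType'].apply(
    lambda x: difflib.get_close_matches(x, call_types, n=1, cutoff=0.0)[0])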
Output:
Name CallType Location
0 ABC IN SFO
1 DEF OUT LHR
2 PQR IN AMS
3 XYZ OUT BOM
4 TYR IN DEL
5 OMN OUT DXB
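One thing to keep in mind with difflib: get_close_matches drops every candidate whose similarity is below its cutoff parameter (0.6 by default), and 'INCOMING' against 'IN' only scores 2*2/(8+2) = 0.4. A guarded helper along these lines may be safer than indexing [0] directly. This is a sketch on plain strings; closest_call_type is a hypothetical name, not something from difflib.

```python
import difflib

call_types = ["IN", "OUT"]

def closest_call_type(value, choices=call_types):
    """Return the choice most similar to value, or None if nothing is returned."""
    # cutoff=0.0 keeps every candidate; n=1 returns only the best-scoring one.
    matches = difflib.get_close_matches(value, choices, n=1, cutoff=0.0)
    return matches[0] if matches else None

raw = ["IN", "OUT", "INCOMING", "OUTGOING", "A_IN", "A_OUT"]
for value in raw:
    best = closest_call_type(value)
    # Show the similarity score that drove the choice.
    score = difflib.SequenceMatcher(None, best, value).ratio()
    print(value, best, round(score, 2))
```

With a helper like this, df['CallType'].map(closest_call_type) could replace the lambda, and a None result can be handled explicitly instead of raising.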
Another possible solution:
import numpy as np
df['CallType'] = np.where(df['CallType'].str.contains('OUT'), 'OUT', 'IN')
Output:
# same