Problem statement: Create a new categorical variable with the values Open and Closed. Open and Pending are to be categorized as Open; Closed and Solved are to be categorized as Closed.
Description: I have a dataframe with a column 'Status' that has the following values:
0 Closed
1 Closed
2 Closed
3 Open
4 Solved
...
2219 Closed
2220 Solved
2221 Solved
2222 Solved
2223 Open
Now I am supposed to create another column based on the 'Status' column above. If the value of Status is 'Open' or 'Pending', the new column 'Final Status' should have the value 'Open'; if the value of Status is 'Closed' or 'Solved', 'Final Status' should have the value 'Closed'. I tried the following code, but it doesn't work and gives me the incorrect results shown below. Telecom is the dataframe.
for i in Telecom['Status']:
    if i == 'Open' or i == 'Pending':
        Telecom['Final Status'] = Telecom['Status'].replace(['Open', 'Pending'], 'Open')
    elif i == 'Closed' or i == 'Solved':
        Telecom['Final Status'] = Telecom['Status'].replace(['Closed', 'Solved'], 'Closed')
The result for the column 'Final Status' is as follows:
   Status  Final Status
0  Closed  Closed
1  Closed  Closed
2  Closed  Closed
3  Open    Open
4  Solved  Solved
I can't figure out where I am going wrong. It seems it's just copying the values from 'Status' and putting them in 'Final Status'.
CodePudding user response:
IIUC, you can use np.select() to get what you are looking for:
import numpy as np
condition_list = [(df['Status'] == 'Open') | (df['Status'] == 'Pending'), (df['Status'] == 'Closed') | (df['Status'] == 'Solved')]
choice_list = ['Open', 'Closed']
df['Final Status'] = np.select(condition_list, choice_list, '')
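As a quick way to check the np.select() approach, here is a minimal, self-contained sketch. The five-row dataframe is made-up sample data (not the asker's real data), and isin() is swapped in as a shorter way to write the same two conditions:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real dataframe
df = pd.DataFrame({'Status': ['Closed', 'Open', 'Solved', 'Pending', 'Closed']})

# One boolean mask per target category; isin() replaces the chained == / | tests
condition_list = [
    df['Status'].isin(['Open', 'Pending']),
    df['Status'].isin(['Closed', 'Solved']),
]
choice_list = ['Open', 'Closed']

# Rows matching neither mask fall back to the default (an empty string here)
df['Final Status'] = np.select(condition_list, choice_list, default='')
print(df)
```

Any status outside the four known values ends up as an empty string, which makes unexpected categories easy to spot.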
CodePudding user response:
You could just use a simple map:
d = {'Open': 'Open', 'Pending': 'Open', 'Closed': 'Closed', 'Solved': 'Closed'}
df['Final Status'] = df['Status'].map(d)
or:
d = {'Pending': 'Open', 'Solved': 'Closed'}
df['Final Status'] = df['Status'].map(lambda x: d.get(x, x))
# or
# df['Final Status'] = df['Status'].map(d).fillna(df['Status'])
Output:
      Status  Final Status
0     Closed  Closed
1     Closed  Closed
2     Closed  Closed
3     Open    Open
4     Solved  Closed
...
2219  Closed  Closed
2220  Solved  Closed
2221  Solved  Closed
2222  Solved  Closed
2223  Open    Open
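As for why the original loop seems to just copy the values: a likely explanation is that every pass through the loop rebuilds 'Final Status' from the untouched 'Status' column, so only the assignment made on the last row survives. The sketch below uses a made-up four-row Telecom dataframe (an assumption, not the real data) whose last row is 'Open', as in the question, and then shows a single replace() call with a dict as one possible fix:

```python
import pandas as pd

# Hypothetical sample data; the last row is 'Open', as in the question
Telecom = pd.DataFrame({'Status': ['Closed', 'Solved', 'Pending', 'Open']})

# What the loop effectively leaves behind: the assignment from the final
# iteration, which only touches 'Open' and 'Pending', so 'Solved' is unchanged
last_pass = Telecom['Status'].replace(['Open', 'Pending'], 'Open')
print(last_pass.tolist())

# One vectorised call with a dict handles both groups at once, no loop needed
Telecom['Final Status'] = Telecom['Status'].replace({'Pending': 'Open', 'Solved': 'Closed'})
print(Telecom)
```

Values not listed in the dict ('Open' and 'Closed') pass through replace() unchanged, which is the same behaviour as the map() with d.get(x, x) fallback.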