Date Transaction Id ClientId Services Class
01-10-2021 1234 1 ['Lip Threading' , 'Eye brow threading'] Threading
02-10-2021 1235 2 ['Full face Threading', 'Eye Brow threading'] Threading
03-10-2021 2346 3 ['Eyebrow Threading' , 'Facial' , 'waxing'] Thread and oth
04-10-2021 5432 4 ['Hair cut' , 'Facial'] Other
05-10-2021 6578 5 ['Eye brow threading' , 'Haircut', 'facial'] Thread and oth
06-10-2021 3425 6 ['Head Massage', ' hair cut'] Other
I have a dataframe with the above data. The Services column holds the services in each transaction as a list. Based on that list I want to fill in the Class column. My main goal is to classify each transaction as one of: threading only, threading with other services, or other services without threading.
CodePudding user response:
Using apply to derive Class column from Services
def classify(lst):
    '''
    Classify Type of list
    '''
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"
# Derive Class column from Services column
df['Class'] = df.Services.apply(classify)
Output
Transaction Id ClientId Services Class
0 01-10-2021 1234 1 ['Lip Threading' , 'Eye brow threading'] Threading
1 02-10-2021 1235 2 ['Full face Threading', 'Eye Brow threading'] Threading
2 03-10-2021 2346 3 ['Eyebrow Threading' , 'Facial' , 'waxing'] Threading and Other
3 04-10-2021 5432 4 ['Hair cut' , 'Facial'] Other
4 05-10-2021 6578 5 ['Eye brow threading' , 'Haircut', 'facial'] Threading and Other
5 06-10-2021 3425 6 ['Head Massage', ' hair cut'] Other
Complete Code
from io import StringIO
import pandas as pd
def classify(lst):
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"
# Derive Dataframe
s = '''Transaction,Id,ClientId,Services
01-10-2021 ,1234,1,"['Lip Threading' , 'Eye brow threading']"
02-10-2021,1235,2,"['Full face Threading', 'Eye Brow threading']"
03-10-2021,2346,3,"['Eyebrow Threading' , 'Facial' , 'waxing']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"
05-10-2021,6578,5,"['Eye brow threading' , 'Haircut', 'facial']"
06-10-2021,3425,6,"['Head Massage', ' hair cut']"'''
df = pd.read_csv(StringIO(s), sep = ",", quotechar='"')
# Convert Services column to lists
df['Services'] = df.Services.apply(lambda x: x[1:-1].split(','))
# Derive Class column
df['Class'] = df.Services.apply(classify)
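The slice-and-split conversion above leaves quotes and stray spaces inside each element. That is harmless for the substring test in classify, but awkward if the names are used elsewhere. A minimal sketch of an alternative, assuming every Services value is a well-formed Python list literal stored as a string: parse it with ast.literal_eval. The helper name parse_services and the small df_demo frame are hypothetical, for illustration only.

```python
import ast
import pandas as pd

def parse_services(text):
    """Parse a string such as "['Hair cut' , 'Facial']" into a list of clean names."""
    # literal_eval evaluates the list literal; strip() removes stray spaces in names
    return [item.strip() for item in ast.literal_eval(text)]

# Hypothetical two-row frame mirroring the Services strings read from the CSV
df_demo = pd.DataFrame({
    "Services": [
        "['Lip Threading' , 'Eye brow threading']",
        "['Head Massage', ' hair cut']",
    ]
})

df_demo["Services"] = df_demo["Services"].apply(parse_services)
print(df_demo["Services"].tolist())
# [['Lip Threading', 'Eye brow threading'], ['Head Massage', 'hair cut']]
```

With this approach a malformed string (for example one missing a bracket) raises an exception instead of being silently mis-split.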
CodePudding user response:
It can be done with np.select and apply.
Example Dataframe:
colA colB
0 1 [A, B, C]
1 2 [A]
2 3 [B, C]
3 4 [A, C]
4 5 [B]
Conditions:
conditions = [
    df["colB"].apply(lambda x: ("A" in x) and len(x) == 1),
    df["colB"].apply(lambda x: ("A" in x) and len(x) != 1)
]
Create the column:
import numpy as np
df["Result"] = np.select(conditions, ["A", "A and others"], default="Others")
Final dataframe:
  colA       colB        Result
0 1 [A, B, C] A and others
1 2 [A] A
2 3 [B, C] Others
3 4 [A, C] A and others
4 5 [B] Others
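The example above tests for an exact element ("A" in x), whereas the service names in the question vary in spelling and case. As a rough sketch of how the same np.select idea might be adapted to that data, the block below uses a case-insensitive substring test. The helper is_threading, the frame df_demo and the label strings are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one row per expected class
df_demo = pd.DataFrame({
    "Services": [
        ["Lip Threading", "Eye brow threading"],
        ["Eyebrow Threading", "Facial", "waxing"],
        ["Hair cut", "Facial"],
    ]
})

def is_threading(name):
    """Return True when a service name mentions threading, ignoring case."""
    return "threading" in name.lower()

# Boolean Series: does the row contain any threading / any non-threading service?
has_threading = df_demo["Services"].apply(lambda lst: any(is_threading(s) for s in lst))
has_other = df_demo["Services"].apply(lambda lst: any(not is_threading(s) for s in lst))

conditions = [has_threading & ~has_other, has_threading & has_other]
choices = ["Threading", "Threading and Other"]

# Rows matching neither condition fall through to the default label
df_demo["Class"] = np.select(conditions, choices, default="Other")
print(df_demo["Class"].tolist())
# ['Threading', 'Threading and Other', 'Other']
```

Computing the two boolean Series once keeps the conditions readable and avoids repeating the per-row scan inside each lambda.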