Is there any way to iterate over a list in a dataframe and classify rows according to the values in the list?

Date    Transaction Id  ClientId    Services                                        Class
01-10-2021  1234          1       ['Lip Threading' , 'Eye brow threading']          Threading
02-10-2021  1235          2       ['Full face Threading', 'Eye Brow threading']     Threading
03-10-2021  2346          3       ['Eyebrow Threading' , 'Facial' , 'waxing']       Thread and oth
04-10-2021  5432          4       ['Hair cut' , 'Facial']                           Other
05-10-2021  6578          5       ['Eye brow threading' , 'Haircut', 'facial']      Thread and oth
06-10-2021  3425          6       ['Head Massage', ' hair cut']                     Other

I have a dataframe with the above data. The Services column holds the services in each transaction as a list. Based on that list I want to fill in the Class column. My main goal is to classify each transaction as threading only, threading with other services, or other services without threading.

CodePudding user response：

Using apply to derive Class column from Services

def classify(lst):
    '''
    Classify Type of list
    '''
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"


# Derive Class column from Services column
df['Class'] = df.Services.apply(classify)
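To check the three cases without building a dataframe, the function can be called on plain lists. This is a minimal sketch: the first three sample lists come from the question, while the empty list is an added assumption to show that it falls through to "Other".

```python
def classify(lst):
    """Return 'Threading', 'Threading and Other', or 'Other' for a list of service names."""
    has_threading = any("threading" in el.lower() for el in lst)
    has_other = any("threading" not in el.lower() for el in lst)
    if has_threading and has_other:
        return "Threading and Other"
    if has_threading:
        return "Threading"
    return "Other"

# Sample lists; the empty list is a hypothetical edge case, not from the question
samples = [
    ["Lip Threading", "Eye brow threading"],
    ["Eyebrow Threading", "Facial", "waxing"],
    ["Hair cut", "Facial"],
    [],
]
results = [classify(s) for s in samples]
print(results)
# → ['Threading', 'Threading and Other', 'Other', 'Other']
```

The match is a case-insensitive substring test, so any service name containing "threading" counts as a threading service.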

Output

   Date        Transaction Id  ClientId  Services                                       Class
0  01-10-2021  1234            1         ['Lip Threading' , 'Eye brow threading']       Threading
1  02-10-2021  1235            2         ['Full face Threading', 'Eye Brow threading']  Threading
2  03-10-2021  2346            3         ['Eyebrow Threading' , 'Facial' , 'waxing']    Threading and Other
3  04-10-2021  5432            4         ['Hair cut' , 'Facial']                        Other
4  05-10-2021  6578            5         ['Eye brow threading' , 'Haircut', 'facial']   Threading and Other
5  06-10-2021  3425            6         ['Head Massage', ' hair cut']                  Other

Complete Code

from io import StringIO
import pandas as pd

def classify(lst):
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"

# Build the dataframe from CSV text
s = '''Date,Transaction Id,ClientId,Services
01-10-2021,1234,1,"['Lip Threading' , 'Eye brow threading']"
02-10-2021,1235,2,"['Full face Threading', 'Eye Brow threading']"
03-10-2021,2346,3,"['Eyebrow Threading' , 'Facial' , 'waxing']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"
05-10-2021,6578,5,"['Eye brow threading' , 'Haircut', 'facial']"
06-10-2021,3425,6,"['Head Massage', ' hair cut']"'''

df = pd.read_csv(StringIO(s), sep = ",", quotechar='"')

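# Convert the Services strings to lists: drop the brackets, split on commas,
# then strip the surrounding spaces and quotes from each name
df['Services'] = df.Services.apply(lambda x: [s.strip(" '") for s in x[1:-1].split(',')])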

# Derive Class column
df['Class'] = df.Services.apply(classify)
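Because the Services values are written as Python list literals, ast.literal_eval from the standard library is another way to turn each string into a real list. The sketch below assumes every cell is a well-formed literal; a malformed cell (for example a missing bracket) makes literal_eval raise an exception instead of returning a list. The names csv_text and df_parsed are illustrative only, and the two rows are a shortened copy of the sample data.

```python
import ast
from io import StringIO

import pandas as pd

# Two rows from the sample data, with the Services column stored as text
csv_text = '''Date,Transaction Id,ClientId,Services
01-10-2021,1234,1,"['Lip Threading' , 'Eye brow threading']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"'''

df_parsed = pd.read_csv(StringIO(csv_text))

# Parse each string as a Python literal, giving a list of clean service names
df_parsed["Services"] = df_parsed["Services"].apply(ast.literal_eval)

print(df_parsed.loc[0, "Services"])
# → ['Lip Threading', 'Eye brow threading']
```

This avoids leftover quotes and spaces in the service names, at the cost of failing loudly on badly formatted rows.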

CodePudding user response：

It can be done with np.select and apply.

Example Dataframe:

    colA    colB
0   1       [A, B, C]
1   2       [A]
2   3       [B, C]
3   4       [A, C]
4   5       [B]

Conditions:

conditions = [
    df["colB"].apply(lambda x: ("A" in x) and len(x)==1 ),
    df["colB"].apply(lambda x: ("A" in x) and len(x)!=1)
]

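Create the column:

import numpy as np

# The first matching condition wins; rows matching neither get the default
df["Result"] = np.select(conditions, ["A", "A and others"], default="Others")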

Final dataframe:

    colA    colB        Result
0   1       [A, B, C]   A and others
1   2       [A]         A
2   3       [B, C]      Others
3   4       [A, C]      A and others
4   5       [B]         Others
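The same np.select pattern can be adapted to the threading question. The following is a sketch under stated assumptions, not the answerer's code: the helper is_threading, the frame df_services, and the choice labels are hypothetical names, and a service counts as threading when its name contains "threading" in any letter case.

```python
import numpy as np
import pandas as pd

# Small sample modelled on the question's data
df_services = pd.DataFrame({
    "ClientId": [1, 3, 4],
    "Services": [
        ["Lip Threading", "Eye brow threading"],
        ["Eyebrow Threading", "Facial", "waxing"],
        ["Hair cut", "Facial"],
    ],
})

def is_threading(name):
    """Hypothetical helper: True when the service name mentions threading."""
    return "threading" in name.lower()

# Boolean Series: does the row contain any threading / any non-threading service?
has_threading = df_services["Services"].apply(lambda lst: any(is_threading(s) for s in lst))
has_other = df_services["Services"].apply(lambda lst: any(not is_threading(s) for s in lst))

conditions = [has_threading & ~has_other, has_threading & has_other]
choices = ["Threading", "Threading and Other"]

# Rows matching neither condition fall back to the default label
df_services["Class"] = np.select(conditions, choices, default="Other")

print(df_services["Class"].tolist())
# → ['Threading', 'Threading and Other', 'Other']
```

Checking membership per element rather than using len(x) == 1 keeps a row with two threading services in the "Threading" class.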