Is there any way to iterate a list in a dataframe and classify according to the values in the list

Time:10-14

Date    Transaction Id  ClientId    Services                                        Class
01-10-2021  1234          1       ['Lip Threading' , 'Eye brow threading']          Threading
02-10-2021  1235          2       ['Full face Threading', 'Eye Brow threading']     Threading
03-10-2021  2346          3       ['Eyebrow Threading' , 'Facial' , 'waxing']       Thread and oth
04-10-2021  5432          4       ['Hair cut' , 'Facial']                           Other
05-10-2021  6578          5       ['Eye brow threading' , 'Haircut', 'facial']      Thread and oth
06-10-2021  3425          6       ['Head Massage', ' hair cut']                     Other

I have a dataframe with the above data. The Services column holds the services for each transaction as a list. Based on that list I want to fill in the Class column. My main goal is to classify each transaction as one of: threading only, threading with other services, or other services without threading.

CodePudding user response:

Use apply to derive the Class column from the Services column:

def classify(lst):
    '''
    Classify a list of service names as "Threading",
    "Threading and Other", or "Other" (case-insensitive match).
    '''
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"


# Derive Class column from Services column
df['Class'] = df.Services.apply(classify)

Output

    Transaction Id  ClientId    Services    Class
0   01-10-2021  1234    1   ['Lip Threading' , 'Eye brow threading']    Threading
1   02-10-2021  1235    2   ['Full face Threading', 'Eye Brow threading']   Threading
2   03-10-2021  2346    3   ['Eyebrow Threading' , 'Facial' , 'waxing'] Threading and Other
3   04-10-2021  5432    4   ['Hair cut' , 'Facial'] Other
4   05-10-2021  6578    5   ['Eye brow threading' , 'Haircut', 'facial'] Threading and Other
5   06-10-2021  3425    6   ['Head Massage', ' hair cut']   Other
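
As an alternative to a row-wise function, the same three classes can be derived with explode and groupby. This is only a sketch: it assumes the Services column already holds Python lists, with no empty lists or missing values, and the small dataframe below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; Services already holds real Python lists
df = pd.DataFrame({
    "Services": [
        ["Lip Threading", "Eye brow threading"],
        ["Eyebrow Threading", "Facial", "waxing"],
        ["Hair cut", "Facial"],
    ]
})

# One row per individual service; the original row index is repeated
services = df["Services"].explode()
is_threading = services.str.lower().str.contains("threading", regex=False)

# Collapse back to one boolean flag per original row
has_threading = is_threading.groupby(level=0).any()
has_other = (~is_threading).groupby(level=0).any()

# Start from the default label and overwrite the threading cases
df["Class"] = "Other"
df.loc[has_threading & has_other, "Class"] = "Threading and Other"
df.loc[has_threading & ~has_other, "Class"] = "Threading"

print(df)
```

On small frames the apply version is usually simpler to read; this variant mainly avoids calling a Python function once per row.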

Complete Code

from io import StringIO
import pandas as pd

def classify(lst):
    threading = "Threading" if any("threading" in el.lower() for el in lst) else ""
    other = "Other" if any("threading" not in el.lower() for el in lst) else ""
    
    if threading and other:
        return "Threading and Other"
    if threading:
        return "Threading"
    return "Other"

# Derive Dataframe
s = '''Transaction,Id,ClientId,Services
01-10-2021,1234,1,"['Lip Threading' , 'Eye brow threading']"
02-10-2021,1235,2,"['Full face Threading', 'Eye Brow threading']"
03-10-2021,2346,3,"['Eyebrow Threading' , 'Facial' , 'waxing']"
04-10-2021,5432,4,"['Hair cut' , 'Facial']"
05-10-2021,6578,5,"['Eye brow threading' , 'Haircut', 'facial']"
06-10-2021,3425,6,"['Head Massage', ' hair cut']"'''

df = pd.read_csv(StringIO(s), sep = ",", quotechar='"')

# Convert the Services strings to lists, stripping brackets, quotes and spaces
df['Services'] = df.Services.apply(lambda x: [s.strip(" '") for s in x.strip("[]").split(",")])

# Derive Class column
df['Class'] = df.Services.apply(classify)
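
If the Services values arrive as strings that look like Python lists (as they do after read_csv), ast.literal_eval from the standard library is another way to turn them into real lists. A minimal sketch, assuming every string is a well-formed list literal; the sample strings below are illustrative:

```python
import ast

import pandas as pd

# Hypothetical raw column: each value is the text of a Python list
raw = pd.Series([
    "['Lip Threading' , 'Eye brow threading']",
    "['Head Massage', ' hair cut']",
])

# literal_eval safely evaluates literals only (no arbitrary code)
parsed = raw.apply(ast.literal_eval)

# Trim stray whitespace inside individual service names
cleaned = parsed.apply(lambda lst: [s.strip() for s in lst])

print(cleaned.tolist())
```

Note that literal_eval raises an error on malformed input such as a missing bracket, so it also acts as a check on the data.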

CodePudding user response:

This can also be done with np.select and apply (np is NumPy, imported with import numpy as np).

Example Dataframe:

    colA    colB
0   1       [A, B, C]
1   2       [A]
2   3       [B, C]
3   4       [A, C]
4   5       [B]

Conditions:

conditions = [
    df["colB"].apply(lambda x: ("A" in x) and len(x) == 1),
    df["colB"].apply(lambda x: ("A" in x) and len(x) != 1)
]

Create the column:

df["Result"] = np.select(conditions, ["A", "A and others"], default="Others")

Final dataframe:

    colA    colB        Result
0   1       [A, B, C]   A and others
1   2       [A]         A
2   3       [B, C]      Others
3   4       [A, C]      A and others
4   5       [B]         Others
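
The same np.select pattern can be adapted to the Services data from the question. The sketch below is one possible adaptation, not the answerer's code: the helper is_threading and the sample rows are assumptions for illustration, and it treats any service name containing "threading" (case-insensitive) as a threading service.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's Services column
df = pd.DataFrame({
    "Services": [
        ["Lip Threading", "Eye brow threading"],
        ["Eyebrow Threading", "Facial", "waxing"],
        ["Hair cut", "Facial"],
    ]
})

def is_threading(service):
    # Hypothetical helper: case-insensitive substring match
    return "threading" in service.lower()

# Row-wise flags: at least one threading service / nothing but threading
has_threading = df["Services"].apply(lambda lst: any(is_threading(s) for s in lst))
only_threading = df["Services"].apply(lambda lst: all(is_threading(s) for s in lst))

conditions = [
    has_threading & only_threading,
    has_threading & ~only_threading,
]
choices = ["Threading", "Threading and Other"]

# Rows matching neither condition fall through to the default
df["Class"] = np.select(conditions, choices, default="Other")

print(df)
```

Unlike the len(x) == 1 test in the example above, any/all handles rows that contain several threading services and nothing else.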