Home > Back-end >  Label data using a combination of 2 columns in Pandas dataframe
Label data using a combination of 2 columns in Pandas dataframe

Time:06-22

I have the below table:

df = pd.DataFrame({'Student ID': [123456,789456,101112,131415],
                       'Math ': [3,2,4,2],
                       'Art  ': [3,3,3,1]})

I want to look into the combination of the exams and label each student as follows:

incase the student got:

  • 1 in Math and 2 in Art (or vise versa) then labelled "Usatisfactory"
  • 3 in Math and 3 in Art (or vise versa) then labelled "Good"
  • 3 in Math and 4 in Art (or vise versa) then labelled "Good"
  • 2 in Math and 3 in Art (or vise versa) then labelled "Moderate"
Student ID Math Art combined
123456 3 3 33
789456 2 3 23
101112 4 3 43
131415 2 1 21

What I did, is created the combined column using the following code:

Create a new column and concatenate data

cols = ['Math', 'Art',]
df['combined'] = df[cols].apply(lambda x: ''.join(x.values.astype(str)), axis=1)

convert 'combined' column to integer

df['combined']  = df['combined'] .astype('int')

Create a combination for score

unsat    =[11]
moderate =[12,21,22]
good     =[33,34,43]
excellent=[45,54,55]

so if the score is 11 or (1 for math and 1 for Art) he/she will get unsatisfactory if the student got any combination of 33 or 34 then moderate and so on.

I am unable to know how to use my logic to label my data. I feel that my way is way too complex

CodePudding user response:

Since you allow vice verse, math 1 art 2 is equal to math 2 art 1, which means you can simply use sum:

mapper = {3: "Unsatisfactory",
          6: "Good",
          7: "Good",
          5: "Moderate"}

df["label"] = df["Math"].add(df["Art"]).replace(mapper)
print(df)

Output:

   Student ID  Math  Art           label
0      123456     3    3            Good
1      789456     2    3        Moderate
2      101112     4    3            Good
3      131415     2    1  Unsatisfactory
  • Related