Home > database >  Pandas: How to split and read values
Pandas: How to split and read values

Time:09-29

In my dataset, there is a column that has names of the courses taken by a student. It has multiple values

StudentID:S1, Courses: course1, course2, course3

StudentID:S2, Courses: course1, course4, course5

I want to separate these values (course names) and create column for each course separately and assign values 1 or 0 based on the courses taken by a student. How can I do that using Pandas library ?

Any help is appreciated Thank you

CodePudding user response:

header_list = ['StudentID','Courses','Course1', 'Course2']

df = df.reindex(columns = header_list)  

This will include those columns into the dataframe. Including empty columns.

CodePudding user response:

I created the sample dataframe using data that you gave

  StudentID                    Courses
0        S1  course1, course2, course3
1        S2  course1, course4, course5

Split the courses column

df['Courses'] = df.Courses.str.split(', ')

  StudentID                      Courses
0        S1  [course1, course2, course3]
1        S2  [course1, course4, course5]

Explode the courses column

df = df.explode('Courses')

  StudentID  Courses
0        S1  course1
0        S1  course2
0        S1  course3
1        S2  course1
1        S2  course4
1        S2  course5

Use get_dummies to create a separate column for each course

pd.get_dummies(df.explode('Courses'), columns=['Courses'], prefix='', prefix_sep='')

  StudentID  course1  course2  course3  course4  course5
0        S1        1        0        0        0        0
0        S1        0        1        0        0        0
0        S1        0        0        1        0        0
1        S2        1        0        0        0        0
1        S2        0        0        0        1        0
1        S2        0        0        0        0        1
  • Related