I have an excel file where two columns are combined into one. It looks something like this:
Student Number Name
Site: Springfield elementary NAN
54634 Harry
34564 Margery
23321 Ron
Site: Springfield Middle School Nan
23123 Julia
23153 Don
The school site is within the student number column and then shows each student number at the school site. I want to take out and remove each site from the student number column and place them in their own separate column. It would look something like this:
Student Number Name School
54634 Harry Site: Springfield elementary
34564 Margery Site: Springfield elementary
23321 Ron Site: Springfield elementary
23123 Julia Site: Springfield Middle School
23153 Don Site: Springfield Middle School
I have struggled with this for hours now and any help would be greatly appreciated!
CodePudding user response:
Given:
Student Number Name
0 Site: Springfield elementary NAN
1 54634 Harry
2 34564 Margery
3 23321 Ron
4 Site: Springfield Middle School Nan
5 23123 Julia
6 23153 Don
Doing:
# Create a boolean mask marking values, there are many ways to do this:
mask = df['Student Number'].str.isnumeric()
# Use the mask to make a new column, only taking from rows that aren't numeric.
df.loc[~mask, 'School'] = df['Student Number']
# Front Fill the values~
df.School = df.School.ffill()
# Only keep rows that were part of your mask~
df = df[mask]
Output:
Student Number Name School
1 54634 Harry Site: Springfield elementary
2 34564 Margery Site: Springfield elementary
3 23321 Ron Site: Springfield elementary
5 23123 Julia Site: Springfield Middle School
6 23153 Don Site: Springfield Middle School