Home > Software design >  Separating a column that stores two different data types
Separating a column that stores two different data types

Time:09-08

I have an excel file where two columns are combined into one. It looks something like this:

Student Number                       Name
Site: Springfield elementary         NAN
54634                                Harry
34564                                Margery
23321                                Ron
Site: Springfield Middle School      Nan
23123                                Julia
23153                                Don

The school site is within the student number column and then shows each student number at the school site. I want to take out and remove each site from the student number column and place them in their own separate column. It would look something like this:

Student Number                       Name        School
54634                                Harry       Site: Springfield elementary 
34564                                Margery     Site: Springfield elementary 
23321                                Ron         Site: Springfield elementary 
23123                                Julia       Site: Springfield Middle School
23153                                Don         Site: Springfield Middle School

I have struggled with this for hours now and any help would be greatly appreciated!

CodePudding user response:

Given:

                    Student Number     Name
0     Site: Springfield elementary      NAN
1                            54634    Harry
2                            34564  Margery
3                            23321      Ron
4  Site: Springfield Middle School      Nan
5                            23123    Julia
6                            23153      Don

Doing:

# Create a boolean mask marking values, there are many ways to do this:
mask = df['Student Number'].str.isnumeric()

# Use the mask to make a new column, only taking from rows that aren't numeric.
df.loc[~mask, 'School'] = df['Student Number']

# Front Fill the values~
df.School = df.School.ffill()

# Only keep rows that were part of your mask~
df = df[mask]

Output:

  Student Number     Name                           School
1          54634    Harry     Site: Springfield elementary
2          34564  Margery     Site: Springfield elementary
3          23321      Ron     Site: Springfield elementary
5          23123    Julia  Site: Springfield Middle School
6          23153      Don  Site: Springfield Middle School
  • Related