Replace values in pandas dataframe with blank space that start with a string value-CodePudding

I have a large pandas dataframe (15million lines) and I want to replace any value that starts with 'College' and replace it with a blank. I know I could do this with a for loop or 'np.where', but this takes way too long on my large dataframe. I also want to create a 'combined_id' column where I take the student name and the college. I want to skip the ones that don't have a proper college name. What is the fastest way to do this?

original:

   id1  id2            college_name     student    combined id 
0   01   01           Stanford          haley      id/haley_Stanford 
1   01   02           College12         josh       id/josh_College12
2   01   03           Harvard           jake       id/jake_Harvard
2   01   05           UPenn             emily      id/emily_UPenn
2   01   00           College10         sarah      id/sarah_College10

desired:

   id1  id2            college_name     student    combined id 
0   01   01           Stanford          haley      id/haley_Stanford 
1   01   02                             josh       
2   01   03           Harvard           jake       id/jake_Harvard
2   01   05           UPenn             emily      id/emily_UPenn
2   01   00                             sarah

CodePudding user response：

Use boolean indexing:

m = df['college_name'].str.startswith('College')
df.loc[m, 'college_name'] = ''
df.loc[m, 'combined id'] = ''

Or if "combined id" does not exist, you have to use numpy.where:

df['combined id'] = np.where(m, '', 'id/' df['student'] '_' df['college_name'])

CodePudding user response：

Here's a way to get from original to desired in your question:

df.loc[df.college_name.str.startswith("College"), ['college_name', 'combined_id']] = ''

Output:

   id1  id2 college_name student        combined_id
0    1    1     Stanford   haley  id/haley_Stanford
1    1    2                 josh
2    1    3      Harvard    jake    id/jake_Harvard
2    1    5        UPenn   emily     id/emily_UPenn
2    1    0                sarah