I want to create a third column in my pandas DataFrame based on columns 1 and 2. When both are filled in they always match, but sometimes only one of them has a value. If I go off id1 alone, the third column ends up blank whenever id1 is blank. I want the third column to use whichever id isn't blank to build the college name.
Original:
   id1   id2
0  ID01  ID01
1        ID03
2  ID07
3  ID08  ID08
Desired:
   id1   id2   college_name
0  ID01  ID01  College1
1        ID03  College3
2  ID07        College7
3  ID08  ID08  College8
Also, I'm fairly sure that in every row the first two columns are either an exact match or one of them is empty. I'd like to double-check whether any row has id1 and id2 set to completely different values. How should I do that?
Answer:
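Backfill values from id2 into id1, extract the digits, then convert them to int and back to str. The int round trip drops the leading zero, so ID01 becomes College1 rather than College01. This assumes the blank cells are NaN.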
Given:
id1 id2
0 ID01 ID01
1 NaN ID03
2 ID07 NaN
3 ID08 ID08
Doing:
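df['college_name'] = 'College' + (df.bfill(axis=1)['id1']                  # fill gaps in id1 from id2
                                  .str.extract(r'(\d+)', expand=False)     # digits only, e.g. '01'
                                  .astype(int)                             # '01' -> 1 drops the leading zero
                                  .astype(str))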
Output:
id1 id2 college_name
0 ID01 ID01 College1
1 NaN ID03 College3
2 ID07 NaN College7
3 ID08 ID08 College8
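If the blanks in your real data are empty strings rather than NaN, bfill will not see them. A minimal self-contained sketch, assuming blanks may be either NaN or empty/whitespace strings, turns them into NaN first and then uses fillna so only the two id columns take part in the fallback. The sample frame is rebuilt from the question, and the names ids, merged and number are just illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; row 2 uses "" to mimic an empty-string blank (assumption)
df = pd.DataFrame({
    "id1": ["ID01", np.nan, "ID07", "ID08"],
    "id2": ["ID01", "ID03", "", "ID08"],
})

# Treat empty or whitespace-only strings as missing so fillna can see them
ids = df[["id1", "id2"]].replace(r"^\s*$", np.nan, regex=True)

# Take id1 where present, otherwise fall back to id2
merged = ids["id1"].fillna(ids["id2"])

# Pull out the digits and drop leading zeros: "ID07" -> "07" -> 7 -> "7"
number = merged.str.extract(r"(\d+)", expand=False).astype(int).astype(str)

df["college_name"] = "College" + number
print(df)
```

Note that astype(int) raises if a row has neither id, because extract returns NaN there; drop or fill such rows first if they can occur.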
To find rows where both ids are present but different:
Given:
id1 id2
0 ID01 ID01
1 NaN ID03
2 ID07 NaN
3 ID08 ID98
Doing:
print(df[df.id1.ne(df.id2) & df.id1.notna() & df.id2.notna()])
Output:
id1 id2
3 ID08 ID98
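A self-contained version of the same check, using the sample data above with the deliberate mismatch in row 3. It splits the mask into named parts and also gives a single yes/no answer via any(). It assumes blanks are NaN; if they are empty strings, convert them with replace as in the earlier sketch before calling notna.

```python
import numpy as np
import pandas as pd

# Sample data from the answer, with a deliberate mismatch in row 3
df = pd.DataFrame({
    "id1": ["ID01", np.nan, "ID07", "ID08"],
    "id2": ["ID01", "ID03", np.nan, "ID98"],
})

# A row conflicts only if both ids are present and they differ;
# rows with exactly one id missing are the expected case and are not flagged
both_present = df["id1"].notna() & df["id2"].notna()
conflict = both_present & df["id1"].ne(df["id2"])

print(conflict.any())  # quick yes/no: is there at least one conflicting row?
print(df[conflict])    # the offending rows themselves
```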