So i have a dataset looks like:
Col A | Col B | Col C |
---|---|---|
Nanana | Teacher | 123456 |
Popopo | Student | 232322 |
Cecece | Teacher | 455433 |
Lalala | Student | 231231 |
Poasls | Teacher | 235433 |
Lilili | Teacher | 723543 |
I wanted to replace COL A When
Col B = Teacher AND LEFT(COL C,1) = 2/4
I wanted to replace it if teacher and number = 2/4 Then Teacher2 I wanted to replace it if teacher and number = 1 Then Teacher Else Existing data keep.
LEFT(COL B,1) = Substring of the first number in COL C.
expected output will be
Col A | Col B | Col C |
---|---|---|
Teacher | Teacher | 123456 |
Popopo | Student | 232322 |
Teacher2 | Teacher | 455433 |
Lalala | Student | 231231 |
Teacher2 | Teacher | 235433 |
Lilili | Teacher | 723543 |
CodePudding user response:
Use numpy.select
with masks:
# filter Teacher in Col B
m1 = df['Col B'].eq('Teacher')
#first letter 2 or 4 in Col C
s = df['Col C'].astype(str).str[0]
m2 = s.isin(['2','4'])
#first letter 1
m3 = s.eq('1')
df['Col A'] = np.select([m1 & m2, m1 & m3], ['Teacher2','Teacher'], default=df['Col A'])
print (df)
Col A Col B Col C
0 Teacher Teacher 123456
1 Popopo Student 232322
2 Teacher2 Teacher 455433
3 Lalala Student 231231
4 Teacher2 Teacher 235433
5 Lilili Teacher 723543