I have this dataframe
ID | product name |
---|---|
1BJM10 | 1BJM10_RS2022_PK |
NaN | L_RS2022_PK |
NaN | 2PKL10_RS2022_PK |
NaN | 3BDG10_RS2022_PK |
1BJM10 | 1BJM10_RS2022_PK |
My desired output is like this
ID | product name |
---|---|
1BJM10 | 1BJM10_RS2022_PK |
- | L_RS2022_PK |
2PKL10 | 2PKL10_RS2022_PK |
3BDG10 | 3BDG10_RS2022_PK |
1BJM10 | 1BJM10_RS2022_PK |
The 2nd row shouldn't get an ID because it has "_" within the first 6 characters of the product name.
I have tried this code, but it doesn't work
df.loc[df['ID'].isna()] = df['ID'].fillna(~df['product name'].str[:6].contains("_"))
CodePudding user response:
Chain both conditions with & (bitwise AND), using a helper Series:
s = df['product name'].str[:6]
df.loc[df['ID'].isna() & ~s.str.contains("_"), 'ID'] = s
print (df)
ID product name
0 1BJM10 1BJM10_RS2022_PK
1 NaN L_RS2022_PK
2 2PKL10 2PKL10_RS2022_PK
3 3BDG10 3BDG10_RS2022_PK
4 1BJM10 1BJM10_RS2022_PK
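For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the question and assumes the empty ID cells are NaN; the names prefix and mask are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (assumption: missing IDs are NaN).
df = pd.DataFrame({
    "ID": ["1BJM10", np.nan, np.nan, np.nan, "1BJM10"],
    "product name": [
        "1BJM10_RS2022_PK",
        "L_RS2022_PK",
        "2PKL10_RS2022_PK",
        "3BDG10_RS2022_PK",
        "1BJM10_RS2022_PK",
    ],
})

# Helper Series: the first 6 characters of every product name.
prefix = df["product name"].str[:6]

# Fill only rows where the ID is missing AND the prefix has no underscore.
mask = df["ID"].isna() & ~prefix.str.contains("_", regex=False)
df.loc[mask, "ID"] = prefix[mask]

print(df)
```

The regex=False argument just makes the underscore a literal match; it is not required for this particular character. Rows that fail either condition are left untouched, so existing IDs are never overwritten.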
CodePudding user response:
Try:
df['ID'] = df['product name'].apply(lambda x: x[:x.find('_')] if x.find('_')>=6 else '')
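Note that the one-liner above rewrites ID for every row and puts an empty string where the underscore comes too early. If you would rather keep the IDs that are already there and leave the rest missing, a possible variant is sketched below. The helper id_from_name is hypothetical, and the sample frame again assumes NaN for the empty cells.

```python
import numpy as np
import pandas as pd

def id_from_name(name):
    """Hypothetical helper: text before the first '_' if it is at least 6 characters long."""
    if not isinstance(name, str):
        return np.nan
    pos = name.find("_")
    return name[:pos] if pos >= 6 else np.nan

# Sample data rebuilt from the question (assumption: missing IDs are NaN).
df = pd.DataFrame({
    "ID": ["1BJM10", np.nan, np.nan, np.nan, "1BJM10"],
    "product name": [
        "1BJM10_RS2022_PK",
        "L_RS2022_PK",
        "2PKL10_RS2022_PK",
        "3BDG10_RS2022_PK",
        "1BJM10_RS2022_PK",
    ],
})

# Keep existing IDs; fill only the missing ones from the product name.
df["ID"] = df["ID"].fillna(df["product name"].map(id_from_name))

# Optional: show a "-" placeholder for rows that still have no ID.
display_ids = df["ID"].fillna("-")
print(df.assign(ID=display_ids))
```

Unlike the str[:6] approach, this takes everything before the first underscore, so a 7-character prefix would give a 7-character ID.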