Fill a Column's Value Based on Multiple Conditions in Other Columns (Python)

Time:06-30

I'm new to Python, and I'm having trouble deciding which value to take for a new column that I want to create in a dataframe. Here are the requirements:

  1. The value should have a length of 17
  2. The value should contain "MP", in the format XXMPXXXXXXXXXXXXX (2 characters, "MP", then 13 characters)

And here's the data:

Serial Number New  Serial Number Keyword         Serial Number Old
12MP3221156732243  12MP3221156732243 Restaurant  12MP3221156732243
0                  Retail 12MP3251453730827      3251453730827
0                  K312MP3251773832657           3251773832657
11MP3221156732243  11MP3221156732243             MP3221156732243
11MP3251156732267  0                             MP3251156732267

And here's the expected output:

Serial Number New  Serial Number Keyword         Serial Number Old  Serial Number Final
12MP3221156732243  12MP3221156732243 Restaurant  12MP3221156732243  12MP3221156732243
0                  Retail 12MP3251453730827      3251453730827      12MP3251453730827
0                  K312MP3251773832657           3251773832657      12MP3251773832657
11MP3221156732243  11MP3221156732243             MP3221156732243    11MP3221156732243
11MP3251156732267  0                             MP3251156732267    11MP3251156732267

Does anyone know how to get the "Serial Number Final" value? Thank you in advance guys

CodePudding user response:

You can use a regex for that, (..MP.{13}), which matches any 2 characters, then MP, then any 13 characters:

df['Serial Number Final'] = df['Serial Number   Keyword'].str.extract(r'(..MP.{13})')

Or, if the X's can only be digits, use \d\dMP\d{13} (2 digits, MP, 13 digits):

df['Serial Number Final'] = df['Serial Number   Keyword'].str.extract(r'(\d\dMP\d{13})')

output:

   Serial Number New       Serial Number   Keyword  Serial Number Old Serial Number Final
0  12MP3221156732243  12MP3221156732243 Restaurant  12MP3221156732243   12MP3221156732243
1                  0      Retail 12MP3251453730827      3251453730827   12MP3251453730827
2                  0           K312MP3251773832657      3251773832657   12MP3251773832657
3  11MP3221156732243             11MP3221156732243    MP3221156732243   11MP3221156732243
4  11MP3251156732267                             0    MP3251156732267                 NaN
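
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's table, which assumes every value is stored as a string and that the keyword column really has the three-space name used in the code above. It also passes expand=False, so that str.extract returns a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Sample data copied by hand from the question (assumption: all values are strings).
df = pd.DataFrame({
    'Serial Number New': ['12MP3221156732243', '0', '0',
                          '11MP3221156732243', '11MP3251156732267'],
    'Serial Number   Keyword': ['12MP3221156732243 Restaurant',
                                'Retail 12MP3251453730827',
                                'K312MP3251773832657',
                                '11MP3221156732243',
                                '0'],
    'Serial Number Old': ['12MP3221156732243', '3251453730827', '3251773832657',
                          'MP3221156732243', 'MP3251156732267'],
})

# 2 digits, the literal "MP", then 13 digits: 17 characters in total.
pattern = r'(\d\dMP\d{13})'

# With a single capture group and expand=False the result is a Series;
# rows where the pattern is not found become NaN.
df['Serial Number Final'] = df['Serial Number   Keyword'].str.extract(pattern, expand=False)

print(df['Serial Number Final'])
```

The last row has no match in the keyword column, so it comes out as NaN when only that column is searched.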

To use several columns (picking the first match in each row):

cols = ['Serial Number New', 'Serial Number   Keyword']

# extract from each column, then back-fill across the columns of each row
# (axis=1) so that the first column holds the first match found in that row
df['Serial Number Final'] = (df[cols]
 .apply(lambda s: s.str.extract(r'(\d\dMP\d{13})', expand=False))
 .bfill(axis=1).iloc[:, 0]
)

output:

   Serial Number New       Serial Number   Keyword  Serial Number Old Serial Number Final
0  12MP3221156732243  12MP3221156732243 Restaurant  12MP3221156732243   12MP3221156732243
1                  0      Retail 12MP3251453730827      3251453730827   12MP3251453730827
2                  0           K312MP3251773832657      3251773832657   12MP3251773832657
3  11MP3221156732243             11MP3221156732243    MP3221156732243   11MP3221156732243
4  11MP3251156732267                             0    MP3251156732267   11MP3251156732267
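
Here is a self-contained sketch of the multi-column variant, wrapped in a small function so it can be checked against the expected output from the question. The name first_match is a hypothetical helper, not part of pandas, and the DataFrame is again rebuilt by hand on the assumption that all values are strings. The back-fill runs along axis=1, so a missing match in one column is only ever filled from another column of the same row.

```python
import pandas as pd

PATTERN = r'(\d\dMP\d{13})'  # 2 digits, "MP", 13 digits


def first_match(df, cols, pattern=PATTERN):
    """Hypothetical helper: first regex match per row, scanning cols left to right."""
    # astype(str) guards against columns that were parsed as numbers (e.g. the 0s).
    extracted = df[cols].apply(
        lambda s: s.astype(str).str.extract(pattern, expand=False)
    )
    # Back-fill across columns (axis=1), then keep the leftmost column.
    return extracted.bfill(axis=1).iloc[:, 0]


# Sample data copied by hand from the question.
df = pd.DataFrame({
    'Serial Number New': ['12MP3221156732243', '0', '0',
                          '11MP3221156732243', '11MP3251156732267'],
    'Serial Number   Keyword': ['12MP3221156732243 Restaurant',
                                'Retail 12MP3251453730827',
                                'K312MP3251773832657',
                                '11MP3221156732243',
                                '0'],
})

cols = ['Serial Number New', 'Serial Number   Keyword']
df['Serial Number Final'] = first_match(df, cols)

expected = ['12MP3221156732243', '12MP3251453730827', '12MP3251773832657',
            '11MP3221156732243', '11MP3251156732267']
print(df['Serial Number Final'].tolist() == expected)
```

The order of cols decides which column wins when more than one of them contains a valid serial number.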