I'm new to Python and I'm having trouble deciding which value to use for a new column that I want to create in a dataframe. Here are the requirements:
- The value should have length=17
- The value should contain "MP", in the format XXMPXXXXXXXXXXXXX (2 characters, then MP, then 13 characters)
And here's the data:
Serial Number New | Serial Number Keyword | Serial Number Old |
---|---|---|
12MP3221156732243 | 12MP3221156732243 Restaurant | 12MP3221156732243 |
0 | Retail 12MP3251453730827 | 3251453730827 |
0 | K312MP3251773832657 | 3251773832657 |
11MP3221156732243 | 11MP3221156732243 | MP3221156732243 |
11MP3251156732267 | 0 | MP3251156732267 |
And here's the expected output:
Serial Number New | Serial Number Keyword | Serial Number Old | Serial Number Final |
---|---|---|---|
12MP3221156732243 | 12MP3221156732243 Restaurant | 12MP3221156732243 | 12MP3221156732243 |
0 | Retail 12MP3251453730827 | 3251453730827 | 12MP3251453730827 |
0 | K312MP3251773832657 | 3251773832657 | 12MP3251773832657 |
11MP3221156732243 | 11MP3221156732243 | MP3221156732243 | 11MP3221156732243 |
11MP3251156732267 | 0 | MP3251156732267 | 11MP3251156732267 |
Does anyone know how to get the "Serial Number Final" value? Thank you in advance guys
CodePudding user response:
You can use a regex for that, (..MP.{13}), which reads as any 2 characters, MP, then any 13 characters:
df['Serial Number Final'] = df['Serial Number Keyword'].str.extract(r'(..MP.{13})')
or, if the X placeholders can only be digits, (\d\dMP\d{13}), which reads as 2 digits, MP, then 13 digits:
df['Serial Number Final'] = df['Serial Number Keyword'].str.extract(r'(\d\dMP\d{13})')
output:
Serial Number New Serial Number Keyword Serial Number Old Serial Number Final
0 12MP3221156732243 12MP3221156732243 Restaurant 12MP3221156732243 12MP3221156732243
1 0 Retail 12MP3251453730827 3251453730827 12MP3251453730827
2 0 K312MP3251773832657 3251773832657 12MP3251773832657
3 11MP3221156732243 11MP3221156732243 MP3221156732243 11MP3221156732243
4 11MP3251156732267 0 MP3251156732267 NaN
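For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The dataframe is rebuilt by hand from the table in the question, and it assumes every cell is stored as a string (including the "0" placeholders); the name PATTERN is only for illustration. Passing expand=False makes str.extract return a Series rather than a one-column dataframe.

```python
import pandas as pd

# Sample data copied from the question; all cells are assumed to be strings.
df = pd.DataFrame({
    "Serial Number New": [
        "12MP3221156732243", "0", "0",
        "11MP3221156732243", "11MP3251156732267",
    ],
    "Serial Number Keyword": [
        "12MP3221156732243 Restaurant", "Retail 12MP3251453730827",
        "K312MP3251773832657", "11MP3221156732243", "0",
    ],
    "Serial Number Old": [
        "12MP3221156732243", "3251453730827", "3251773832657",
        "MP3221156732243", "MP3251156732267",
    ],
})

# 2 digits, the literal "MP", then 13 digits: 17 characters in total.
PATTERN = r"(\d\dMP\d{13})"

# expand=False returns a Series; rows without a match become NaN.
df["Serial Number Final"] = df["Serial Number Keyword"].str.extract(
    PATTERN, expand=False
)

print(df["Serial Number Final"])
```

Note that the last row has no serial number in the keyword column, so it comes back as NaN when only that column is searched.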
To use several columns (pick the first match in each row), back-fill across the columns with axis=1. Without it, bfill fills down each column and rows 1 and 2 pick up a value from a different row:
cols = ['Serial Number New', 'Serial Number Keyword']
df['Serial Number Final'] = (df[cols]
    .apply(lambda s: s.str.extract(r'(\d\dMP\d{13})', expand=False))
    # axis=1 fills row-wise, so column 0 ends up holding each row's first match
    .bfill(axis=1).iloc[:, 0]
)
output:
Serial Number New Serial Number Keyword Serial Number Old Serial Number Final
0 12MP3221156732243 12MP3221156732243 Restaurant 12MP3221156732243 12MP3221156732243
1 0 Retail 12MP3251453730827 3251453730827 12MP3251453730827
2 0 K312MP3251773832657 3251773832657 12MP3251773832657
3 11MP3221156732243 11MP3221156732243 MP3221156732243 11MP3221156732243
4 11MP3251156732267 0 MP3251156732267 11MP3251156732267
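As a cross-check, the same "first match per row" idea can be written as an explicit loop over the columns in priority order. This is only a sketch: first_serial and PATTERN are hypothetical names, the sample data is retyped from the question, and the pattern assumes the X placeholders are digits. It also verifies the length-17 requirement from the question.

```python
import pandas as pd

PATTERN = r"(\d\dMP\d{13})"  # assumes the X placeholders are digits


def first_serial(df, cols, pattern=PATTERN):
    """Return, for each row, the first match found scanning cols left to right."""
    # Extract from every candidate column; non-matching cells become NaN.
    extracted = [
        df[col].astype(str).str.extract(pattern, expand=False) for col in cols
    ]
    result = extracted[0]
    for candidate in extracted[1:]:
        # Keep existing matches; take the next column's value only where missing.
        result = result.where(result.notna(), candidate)
    return result


# Sample data copied from the question.
df = pd.DataFrame({
    "Serial Number New": [
        "12MP3221156732243", "0", "0",
        "11MP3221156732243", "11MP3251156732267",
    ],
    "Serial Number Keyword": [
        "12MP3221156732243 Restaurant", "Retail 12MP3251453730827",
        "K312MP3251773832657", "11MP3221156732243", "0",
    ],
})

df["Serial Number Final"] = first_serial(
    df, ["Serial Number New", "Serial Number Keyword"]
)

# Every extracted value should satisfy the length requirement from the question.
all_len_17 = bool(df["Serial Number Final"].str.len().eq(17).all())
print(df["Serial Number Final"].tolist())
print(all_len_17)
```

The loop is longer than the apply/bfill one-liner, but the column priority is explicit and it is easy to add "Serial Number Old" or another fallback later.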