I have a DataFrame and want to create a new column: if a given string exists in a specific column, the new column should contain that string plus the three characters that follow it.
Example:
Here I want to search for the string "Note". If it exists in the note column, the new column should contain "Note" plus whatever is in the next three characters after it.
Before:
id | partNumber | note |
---|---|---|
1 | a1b33 | apples |
2 | hhgh5667 | banana, Note 55, and pineapples |
3 | hhgh5667 | Note 1A, and blueberries |
4 | 09890ii | blackberries |
After:
id | partNumber | note | Note_number |
---|---|---|---|
1 | a1b33 | apples | NA |
2 | hhgh5667 | banana, Note 55, and pineapples | Note 55 |
3 | hhgh5667 | Note 1A, and blueberries | Note 1A |
4 | 09890ii | blackberries | NA |
CodePudding user response:
You can use a regular expression with str.extract to capture everything from "Note" up to, but not including, the next comma. Rows without a match get NaN.
df['Note_number'] = df['note'].str.extract(r'(Note.*?)(?=,)', expand=False)
Output
id partNumber note Note_number
0 1 a1b33 apples NaN
1 2 hhgh5667 banana, Note 55, and pineapples Note 55
2 3 hhgh5667 Note 1A, and blueberries Note 1A
3 4 09890ii blackberries NaN
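If you want the literal behaviour from the question ("Note" plus the next three characters) rather than everything up to a comma, a fixed-width pattern might be closer. This is only a minimal sketch: the DataFrame below is the question's sample data rebuilt by hand, and the choice of `.{0,3}` (so that a note ending fewer than three characters after "Note" still matches) is my assumption, not something the question specifies.

```python
import pandas as pd

# Sample data copied from the question (rebuilt by hand for illustration).
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "partNumber": ["a1b33", "hhgh5667", "hhgh5667", "09890ii"],
    "note": [
        "apples",
        "banana, Note 55, and pineapples",
        "Note 1A, and blueberries",
        "blackberries",
    ],
})

# Capture "Note" plus up to three characters after it.
# expand=False returns a Series, so it can be assigned as a single column.
df["Note_number"] = df["note"].str.extract(r"(Note.{0,3})", expand=False)

print(df)
```

Unlike the comma-based pattern, this does not need a comma after the note, but it cuts off anything longer than three characters (for example, "Note 123" would become "Note 12").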