Given Strings:
57 years, 67 daysApr 30, 1789
61 years, 125 daysMar 4, 1797
57 years, 325 daysMar 4, 1801
57 years, 353 daysMar 4, 1809
58 years, 310 daysMar 4, 1817
In regex101:
Pattern = (?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})
Output: (image: output of the regex pattern)
In Python (IDE: Jupyter Notebook): (image: Python output)
Here the DataFrame shows only NaN values. How can I solve this?
CodePudding user response:
Use:
import pandas as pd

# Preparing data: one string per row, in column 0
string = """57 years, 67 daysApr 30, 1789
61 years, 125 daysMar 4, 1797
57 years, 325 daysMar 4, 1801
57 years, 353 daysMar 4, 1809
58 years, 310 daysMar 4, 1817"""
df = pd.DataFrame(string.split('\n'))

# Solution: raw string so the regex backslashes are not read as string escapes
temp = df[0].str.extractall(r'(?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})')
Output:
        Years Days   Month  Year
  match
0 0        57   67  Apr 30  1789
1 0        61  125   Mar 4  1797
2 0        57  325   Mar 4  1801
3 0        57  353   Mar 4  1809
4 0        58  310   Mar 4  1817
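Note that extractall returns one row per match, so the result is indexed by the original row label plus a "match" level. If every line is known to contain exactly one match, as in the sample data, that extra level can be dropped. A short sketch under that assumption (the names `lines`, `pattern` and `flat` are only for illustration):

```python
import pandas as pd

# Two of the sample strings from the question
lines = ['57 years, 67 daysApr 30, 1789',
         '61 years, 125 daysMar 4, 1797']
df = pd.DataFrame(lines)

pattern = (r'(?P<Years>\d{1,2}) years, (?P<Days>\d{1,3}) days'
           r'(?P<Month>\w{3} \d{1,2}), (?P<Year>\d{4})')

# extractall gives a two-level index: (original row label, match number)
temp = df[0].str.extractall(pattern)

# Assumes one match per row: drop the 'match' level so the index lines up with df
flat = temp.droplevel('match')
print(flat)
```

If a row could contain several matches, dropping the level would leave duplicate index labels, so in that case it is probably safer to keep the two-level index.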
CodePudding user response:
FYI, your code ran perfectly for me; maybe you have some whitespace issues in your DataFrame:
import pandas as pd
from io import StringIO

st = StringIO("""57 years, 67 daysApr 30, 1789
61 years, 125 daysMar 4, 1797
57 years, 325 daysMar 4, 1801
57 years, 353 daysMar 4, 1809
58 years, 310 daysMar 4, 1817""")

# The separator never occurs in the data, so each line lands whole in column 0
df = pd.read_csv(st, sep=r'\s\s\s ', header=None, engine='python')

# Raw string so the regex backslashes are not read as string escapes
Pattern = r'(?P<Years>[\d]{1,2}) years, (?P<Days>[\d]{1,3}) days(?P<Month>[\w]{3} [\d]{1,2}), (?P<Year>[\d]{4})'
df[0].str.extract(Pattern)
Output:
  Years Days   Month  Year
0    57   67  Apr 30  1789
1    61  125   Mar 4  1797
2    57  325   Mar 4  1801
3    57  353   Mar 4  1809
4    58  310   Mar 4  1817
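One possible form of the whitespace issue, assuming the strings were scraped from a web page: the spaces may be non-breaking spaces ('\xa0'), which look the same when printed but do not match the literal space in the pattern. This is only a guess at the cause, and the sample data below is made up to reproduce the symptom (the names `raw`, `before`, `normalized` and `after` are illustrative):

```python
import pandas as pd

pattern = (r'(?P<Years>\d{1,2}) years, (?P<Days>\d{1,3}) days'
           r'(?P<Month>\w{3} \d{1,2}), (?P<Year>\d{4})')

# Hypothetical data: same text as the question, but with non-breaking spaces
raw = pd.Series(['57\xa0years, 67\xa0daysApr\xa030, 1789',
                 '61\xa0years, 125\xa0daysMar\xa04, 1797'])

# The literal spaces in the pattern do not match '\xa0', so every cell is NaN
before = raw.str.extract(pattern)

# Replace each non-breaking space with a plain space, then trim the ends
normalized = raw.str.replace('\xa0', ' ', regex=False).str.strip()
after = normalized.str.extract(pattern)

# repr() makes hidden characters such as '\xa0' visible
print(raw.map(repr))
print(after)
```

Printing the repr of a few rows of the real column is a quick way to check whether this assumption holds before changing anything else.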