I want to extract all the digits from each string in a column of strings. I tried the regular expression below. It extracts some digits, but it also pulls in the characters between noncontiguous digits, and it returns null for strings that contain only a single digit. The result should be a column with the values ['123', '123', '11', '2']. What would be the correct regular expression?
import pandas as pd
df = pd.DataFrame(['Ge12ee3ks', '12,3For','11Geeks','2is'])
print(df)
df[0].str.extract('(\\d+.*\\d+)', expand=True)
CodePudding user response:
You could use a regex approach here and strip off all non-numeric characters:
df["nums"] = df[0].str.replace(r'\D+', '', regex=True)
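To see why the pattern in the question misbehaves, it may help to try it on plain strings with the standard `re` module. This is only a rough sketch: `samples` and `first_match` are names made up for illustration, and `re.search` is used here as a stand-in for what `str.extract` does per row (first match only).

```python
import re

samples = ['Ge12ee3ks', '12,3For', '11Geeks', '2is']

# Pattern from the question: a run of digits, then anything, then another run of digits.
greedy = re.compile(r'(\d+.*\d+)')

def first_match(text):
    """Return the first captured match, or None when the pattern does not match."""
    m = greedy.search(text)
    return m.group(1) if m else None

# '.*' swallows whatever sits between the first and last digit runs,
# and the two '\d+' parts need at least two digits in total.
print([first_match(s) for s in samples])

# Removing every run of non-digit characters keeps only the digits.
digits_only = [re.sub(r'\D+', '', s) for s in samples]
print(digits_only)
```

The first print shows the in-between characters being captured and `None` for `'2is'`, which is the single-digit case that comes back as null in the question.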
CodePudding user response:
You can use findall, then apply join, like below:
>>> df['digits'] = df[0].str.findall('(\\d+)').apply(''.join)
>>> df
           0 digits
0  Ge12ee3ks    123
1    12,3For    123
2    11Geeks     11
3        2is      2
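For comparison, here is a small sketch that runs both suggestions on the question's data. The names `by_replace` and `by_findall` are just for illustration, and `.str.join('')` is swapped in for `.apply(''.join)` as one possible alternative. Passing `regex=True` explicitly matters because `str.replace` treats the pattern as a literal string by default in pandas 2.0 and later.

```python
import pandas as pd

df = pd.DataFrame(['Ge12ee3ks', '12,3For', '11Geeks', '2is'])

# Option 1: delete every run of non-digit characters.
by_replace = df[0].str.replace(r'\D+', '', regex=True)

# Option 2: collect the digit runs as lists, then join each list into one string.
by_findall = df[0].str.findall(r'\d+').str.join('')

print(by_replace.tolist())
print(by_findall.tolist())
```

Both options should give the same column here, so the choice is mostly a matter of taste.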