I need to extract numeric values from a string inside a pandas DataFrame.
Let's say the DataFrame cell is as follows (stored as a string):
[1.234,2.345]
I can get the first value with the following:
print(df['column_name'].str.extract(r'(\d+.\d+)').astype('float'))
Output:
1.234
To get both values, my idea was to use two capture groups:
print(df['column_name'].str.extract(r'(\d+.\d+),(\d+.\d+)').astype('float'))
but the output is:
NaN NaN
Expected output:
1.234 2.345
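One likely cause of the NaN, assuming the real cells look like the sample data used in the answers ('[1.234, 2.345]', with a space after the comma): the literal comma in the pattern must be followed directly by a digit, so nothing matches. A minimal sketch that allows optional whitespace with \s* (and escapes the dot, which otherwise matches any character):

```python
import pandas as pd

# Assumed sample data: note the space after the comma
df = pd.DataFrame({'column_name': ['[1.234, 2.345]']})

# The question's two-group pattern: ',' must be followed immediately by a digit
strict = df['column_name'].str.extract(r'(\d+.\d+),(\d+.\d+)')

# Escape the dot and allow optional whitespace after the comma
relaxed = df['column_name'].str.extract(r'(\d+\.\d+),\s*(\d+\.\d+)').astype(float)

print(strict)   # both columns are NaN: no match
print(relaxed)  # one float column per captured group
```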
Answer:
Why not just use pd.eval:
>>> df['Float'] = pd.eval(df['String'])
>>> df
String Float
0 [1.234, 2.345] [1.234, 2.345]
1 [1.234, 2.345] [1.234, 2.345]
>>>
Answer:
If you want to use a regex to extract floats, you can use str.findall:
>>> df['column_name'].str.findall(r'(-?\d+(?:\.\d+)?)').str.join(' ')
0    1.234 2.345
Name: column_name, dtype: object
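str.join(' ') gives back a single string per row. If you need actual float columns instead, a minimal sketch is to turn the findall lists into a DataFrame and cast. The second row ('[-5, 6.5]') is hypothetical, added only to show negative and single-digit numbers:

```python
import pandas as pd

df = pd.DataFrame({'String': ['[1.234, 2.345]', '[-5, 6.5]']})

# Optional minus sign, digits, optional decimal part (no capture group, so
# findall returns the whole match)
pattern = r'-?\d+(?:\.\d+)?'

# findall gives a list of matched strings per row
matches = df['String'].str.findall(pattern)

# One column per extracted number; shorter rows would be padded with missing values
floats = pd.DataFrame(matches.tolist(), index=df.index).astype(float)
print(floats)
```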
Old answer:
Use ast.literal_eval:
import ast
import pandas as pd

df = pd.DataFrame({'String': ['[1.234, 2.345]']})
# Parse each string as a Python list literal
df['Float'] = df['String'].apply(ast.literal_eval)
Output:
>>> df
String Float
0 [1.234, 2.345] [1.234, 2.345]
>>> type(df.at[0, 'String'][0])
str
>>> type(df.at[0, 'Float'][0])
float
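To get the "1.234 2.345" layout from the question (one column per value) rather than a single column of lists, the parsed lists can be spread across columns. A minimal sketch building on the literal_eval approach; the name expanded is just for illustration:

```python
import ast
import pandas as pd

df = pd.DataFrame({'String': ['[1.234, 2.345]']})

# literal_eval only evaluates Python literals, so each cell becomes a list of floats
df['Float'] = df['String'].apply(ast.literal_eval)

# Spread each list across separate columns, keeping the original index
expanded = pd.DataFrame(df['Float'].tolist(), index=df.index)
print(expanded)
```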
Answer:
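You can use Series.str.split, where n caps the number of splits. To get the pieces as separate DataFrame columns instead of lists, set expand=True.
Strip the surrounding brackets first; otherwise astype(float) fails on pieces like '[1.234' and '2.345]'. Splitting on a comma plus any following whitespace also removes the space after the comma.
So the result might look like:
your_dataframe['your_column_name'].str.strip('[]').str.split(r',\s*', n=2, expand=True).astype(float)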
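A self-contained sketch of the split approach that also joins the values back onto the original DataFrame. The column names 'first' and 'second' are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'your_column_name': ['[1.234, 2.345]']})

# Drop the surrounding brackets, then split on commas into separate columns
parts = df['your_column_name'].str.strip('[]').str.split(',', expand=True)

# Remove the space left after each comma, then convert to float
values = parts.apply(lambda col: col.str.strip()).astype(float)

# Hypothetical names for the two value columns
values.columns = ['first', 'second']
df = df.join(values)
print(df)
```

join aligns on the index, so each row's values land next to the string they came from.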