I have a CSV file that looks like this when loaded into a DataFrame:
import pandas as pd

table = {'column1': [1, 2, 3],
         'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)']}
df = pd.DataFrame(table)
I am trying to access the tuple stored in each cell of "column2", but pandas reports the column's dtype as object: each cell is actually a string, so if I print df['column2'][0][0], I get '(' instead of 0.2.
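To make the problem concrete, here is a small check on the same sample data (a sketch): the cells are plain Python strings, so positional indexing walks through characters rather than numbers.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'column1': [1, 2, 3],
                   'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)']})

first = df['column2'][0]
print(type(first))          # <class 'str'>: the cell holds text, not a tuple
print(repr(first[0]))       # '(': indexing a str gives a single character
print(df['column2'].dtype)  # the column dtype is not numeric
```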
How can I change the data type from "object" to numeric values?
I tried this
pd.to_numeric(df['column2'][0])
but it didn't work.
Answer:
eval and ast.literal_eval won't work as-is, because the string NaN means nothing to Python without context (of course it's np.nan, but the ast module isn't aware of that).
So you can temporarily change the NaNs to None, apply ast.literal_eval or eval, and then convert the Nones back to np.nan:
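import ast
import numpy as np
# Swap NaN for None so literal_eval can parse it, then turn each None back into np.nan
df['column2'] = df['column2'].str.replace('NaN', 'None').apply(ast.literal_eval).apply(lambda x: tuple(np.nan if val is None else val for val in x))
or, using eval instead of ast.literal_eval: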
df['column2'] = df['column2'].str.replace('NaN', 'None').apply(eval).apply(lambda x: tuple(np.nan if val is None else val for val in x))
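To see why the NaN-to-None round trip is needed, here is a small standalone check on one cell's string (a sketch; `raised` is a hypothetical helper written only for this demo, not part of the answer above):

```python
import ast

import numpy as np

s = '(0.2, 0.02, NaN)'


def raised(func, text):
    """Hypothetical demo helper: return the exception type func(text) raises, or None."""
    try:
        func(text)
    except Exception as exc:  # demo only: capture whatever the parser raises
        return type(exc)
    return None


print(raised(ast.literal_eval, s))  # <class 'ValueError'>: NaN is a name, not a literal
print(raised(eval, s))              # <class 'NameError'>: NaN is not defined

# After swapping NaN for None the string is a valid literal; map None back to np.nan
parsed = ast.literal_eval(s.replace('NaN', 'None'))
restored = tuple(np.nan if v is None else v for v in parsed)
print(restored)                     # (0.2, 0.02, nan)
```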
A shorter version is to replace NaN with np.nan and give eval the NumPy module as context:
import numpy as np
# Pass np explicitly so eval can resolve the name np.nan
df['column2'] = df['column2'].str.replace('NaN', 'np.nan').apply(lambda s: eval(s, {'np': np}))
This version is handy if you don't want to use the ast module. Either way, each cell now holds a real tuple:
In [98]: df['column2'][0][0]
Out[98]: 0.2
In [100]: type(df['column2'][0])
Out[100]: tuple
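The tuples still live in an object column. If you need actual numeric dtypes, one possible follow-up (a sketch beyond the original answer, assuming every tuple has the same length) is to expand them into one float column per position:

```python
import ast

import numpy as np
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3],
                   'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)']})

# Parse each string into a tuple, as above: NaN becomes None, then None becomes np.nan
tuples = (df['column2']
          .str.replace('NaN', 'None', regex=False)
          .apply(ast.literal_eval)
          .apply(lambda t: tuple(np.nan if v is None else v for v in t)))

# One row per tuple, one float64 column per tuple position, same index as df
expanded = pd.DataFrame(tuples.tolist(), index=df.index).astype(float)
print(expanded)
print(expanded.dtypes)
```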
Answer:
One option is to strip the parentheses and split the values into separate columns:
df2 = df['column2'].str.strip('()').str.split(r',\s*', expand=True).astype(float)
Output:
     0     1    2
0  0.2  0.02  NaN
1  0.0  0.03  0.0
2  0.1   NaN  1.0
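A self-contained sketch of the same approach that also joins the numeric columns back onto column1; the column2_ prefix is an arbitrary, hypothetical naming choice:

```python
import pandas as pd

df = pd.DataFrame({'column1': [1, 2, 3],
                   'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)']})

# Drop the surrounding parentheses, split on commas (plus any following spaces),
# then convert each piece to float; the string 'NaN' becomes a real NaN
df2 = (df['column2']
       .str.strip('()')
       .str.split(r',\s*', expand=True)
       .astype(float)
       .add_prefix('column2_'))

# Put the numeric columns next to the untouched column1
out = df[['column1']].join(df2)
print(out)
```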