Change data types in Pandas dataframe

I have a csv file that looks like this:

import pandas as pd

table = {'column1': [1,2,3],
         'column2': ['(0.2, 0.02, NaN)','(0.0, 0.03, 0)','(0.1, NaN, 1)']}
df = pd.DataFrame(table)

I am trying to access the array stored in "column2". However, pandas reports "column2" as an object column, so when I print df['column2'][0][0] I get '(' instead of 0.2.

How can I change the data type from "object" to numeric values?

I tried pd.to_numeric(df['column2'][0]), but it didn't work.

CodePudding user response：

eval and ast.literal_eval won't work directly, because the bare name NaN means nothing in Python without context (of course it stands for np.nan, but the ast module isn't aware of that).

So you can temporarily replace NaN with None, apply ast.literal_eval or eval, and then convert the None values to np.nan:

import ast
import numpy as np

df['column2'] = df['column2'].str.replace('NaN', 'None').apply(ast.literal_eval).apply(lambda x: tuple(np.nan if val is None else val for val in x))

or, using eval:

df['column2'] = df['column2'].str.replace('NaN', 'None').apply(eval).apply(lambda x: tuple(np.nan if val is None else val for val in x))

A shorter version is to replace NaN with np.nan and give eval the NumPy module for context:

import numpy as np

df['column2']=df['column2'].str.replace('NaN', 'np.nan').apply(eval)

Use this version if you don't want to use the ast module. Either way, the elements are now real numbers:

In [98]: df['column2'][0][0]
Out[98]: 0.2

In [100]: type(df['column2'][0])
Out[100]: tuple
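
For reference, the None-substitution approach above could be wrapped in a small helper so the whole thing runs on its own. This is only a sketch: the name parse_tuple is made up for illustration, and it additionally casts every value to float (so 0 becomes 0.0), which the one-liners above do not do.

```python
import ast

import numpy as np
import pandas as pd


def parse_tuple(text):
    """Parse a string such as '(0.2, 0.02, NaN)' into a tuple of floats.

    Hypothetical helper, not part of pandas or the original answer.
    """
    # literal_eval cannot read the bare name NaN, so swap in None first
    values = ast.literal_eval(text.replace('NaN', 'None'))
    # Turn the None placeholders back into np.nan; cast the rest to float
    return tuple(np.nan if v is None else float(v) for v in values)


df = pd.DataFrame({
    'column1': [1, 2, 3],
    'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)'],
})
df['column2'] = df['column2'].apply(parse_tuple)

first_value = df['column2'][0][0]   # a float now, no longer the character '('
```

Because it relies on ast.literal_eval rather than eval, this variant only accepts Python literals and does not execute arbitrary expressions found in the file.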

CodePudding user response：

One option could be to split the values:

df2 = df['column2'].str.strip('()').str.split(r',\s*', expand=True).astype(float)

Output:

     0     1    2
0  0.2  0.02  NaN
1  0.0  0.03  0.0
2  0.1   NaN  1.0
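
If the split values should sit next to the original columns, something like the following might work. It is a sketch under assumptions: the column names a, b and c are invented here, since the question does not say what the three values represent.

```python
import pandas as pd

df = pd.DataFrame({
    'column1': [1, 2, 3],
    'column2': ['(0.2, 0.02, NaN)', '(0.0, 0.03, 0)', '(0.1, NaN, 1)'],
})

# Strip the parentheses, split on commas, and convert each piece to float;
# the string 'NaN' becomes a real NaN during the float conversion
parts = (df['column2']
         .str.strip('()')
         .str.split(r',\s*', expand=True)
         .astype(float))

# Hypothetical column names, chosen only for this example
parts.columns = ['a', 'b', 'c']

# Keep column1 and attach the three numeric columns by index
result = df[['column1']].join(parts)
```

Here result has one float column per tuple position, so a single value can be read with, for example, result.loc[0, 'a'].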