I have a Pandas DataFrame df with a column df['auc_all'] which contains a tuple with two values (e.g. (0.54, 0.044)).
When I run:
type(df['auc_all'][0])
>>> str
Yet, when I run:
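def convert_str_into_tuple(self, string):
    # Parse a string such as '(0.54, 0.044)' into a tuple of two floats
    splitted_tuple = string.split(',')
    value1 = float(splitted_tuple[0][1:])    # drop the leading '('
    value2 = float(splitted_tuple[1][1:-1])  # drop the leading space and the trailing ')'
    return (value1, value2)

# inside create_full():
df['auc_all'] = df['auc_all'].apply(self.convert_str_into_tuple)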
I get the following error:
df = full_df.create_full()
Traceback (most recent call last):
  File "<ipython-input-437-34fc05204bad>", line 18, in create_full
    df['auc_all'] = df['auc_all'].apply(self.convert_str_into_tuple)
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1043, in apply
    return self.apply_standard()
  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1099, in apply_standard
    mapped = lib.map_infer(
  File "pandas\_libs\lib.pyx", line 2859, in pandas._libs.lib.map_infer
  File "<ipython-input-437-34fc05204bad>", line 63, in convert_str_into_tuple
    splitted_tuple = string.split(',')
AttributeError: 'tuple' object has no attribute 'split'
This seems to indicate that the cell holds a tuple.
However:
df['auc'][0][0]
>>> '('
It seems as if the variable type changes based on where I use it. Is this actually happening?
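The question does not show how the column was built, so the following is only one possible explanation, stated as an assumption: a pandas object column can hold a different Python type in each row, and df['auc_all'][0] inspects only the first row. If some later rows were already tuples, row 0 would report str while apply would still fail when it reaches a tuple. The sketch below reproduces that situation with made-up data and uses map(type).value_counts() to list every type present in the column:

```python
import pandas as pd

# Made-up column mixing a string and an already-parsed tuple (an assumption
# about how the real data might look, not taken from the question).
df = pd.DataFrame({"auc_all": ["(0.54, 0.044)", (0.61, 0.05)]})

print(type(df["auc_all"][0]))  # <class 'str'>   (only row 0 is inspected)
print(type(df["auc_all"][1]))  # <class 'tuple'>

# Count the Python types stored in the object column, row by row.
type_counts = df["auc_all"].map(type).value_counts()
print(type_counts)

# apply succeeds on row 0 and then fails on the tuple in row 1.
error_message = None
try:
    df["auc_all"].apply(lambda s: s.split(","))
except AttributeError as exc:
    error_message = str(exc)
print(error_message)  # 'tuple' object has no attribute 'split'
```

If the real column shows more than one type, the error is coming from a row other than row 0.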
Answer:
If your column contains tuples stored as strings, use pd.eval:
df['auc_all'] = pd.eval(df['auc_all'])
Example:
>>> df = pd.DataFrame({'auc_all': ['(0.54, 0.044)']})
>>> df
         auc_all
0  (0.54, 0.044)
>>> type(df['auc_all'][0])
str
>>> df['auc_all'] = pd.eval(df['auc_all'])
>>> df
         auc_all
0  [0.54, 0.044]
>>> type(df['auc_all'][0])
list
The drawback is that each tuple is converted to a list. To keep tuples, use literal_eval from the ast module instead:
>>> import ast
>>> # starting again from the original column of strings
>>> df['auc_all'] = df['auc_all'].apply(ast.literal_eval)
>>> df
         auc_all
0  (0.54, 0.044)
>>> type(df['auc_all'][0])
tuple
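If the conversion may run more than once, or the column may already contain tuples (or lists produced by pd.eval), a converter that only parses cells that are still strings avoids the AttributeError from the question. A minimal sketch under that assumption; to_tuple is a hypothetical helper name, not part of pandas:

```python
import ast

import pandas as pd


def to_tuple(value):
    """Return value as a tuple, parsing it first if it is still a string.

    Hypothetical helper. Strings such as '(0.54, 0.044)' are parsed with
    ast.literal_eval, which only accepts Python literals; lists (e.g. from
    pd.eval) and tuples are passed to tuple(), so converted cells are kept.
    """
    if isinstance(value, str):
        value = ast.literal_eval(value)
    return tuple(value)


# Made-up data mixing a string, a list and a tuple.
df = pd.DataFrame({"auc_all": ["(0.54, 0.044)", [0.61, 0.05], (0.7, 0.01)]})
df["auc_all"] = df["auc_all"].apply(to_tuple)
# Running the conversion a second time is harmless: every cell is a tuple now.
df["auc_all"] = df["auc_all"].apply(to_tuple)
print(df["auc_all"].tolist())  # [(0.54, 0.044), (0.61, 0.05), (0.7, 0.01)]

# literal_eval does not depend on the spacing inside the string.
print(to_tuple("(0.54,1.5)"))  # (0.54, 1.5)
```

By contrast, the slicing in the question's converter assumes exactly one space after the comma: for '(0.54,1.5)' the second slice yields '.5', which silently becomes 0.5.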