Pandas DataFrame shows cells to be strings, but returns an error when I try to split cells

Time:12-01

I have a Pandas DataFrame df with a column df['auc_all'] whose cells each hold a tuple of two values, e.g. (0.54, 0.044).

When I run:

type(df['auc_all'][0])
>>> str

Yet, when I run:

def convert_str_into_tuple(self, string):
    splitted_tuple = string.split(',')
    value1 = float(splitted_tuple[0][1:])
    value2 = float(splitted_tuple[1][1:-1])
    return (value1, value2)

df['auc_all'] = df['auc_all'].apply(self.convert_str_into_tuple)

I get the following error:

df = full_df.create_full()
Traceback (most recent call last):
    
  File "<ipython-input-437-34fc05204bad>", line 18, in create_full
    df['auc_all'] = df['auc_all'].apply(self.convert_str_into_tuple)

  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\series.py", line 4357, in apply
    return SeriesApply(self, func, convert_dtype, args, kwargs).apply()

  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1043, in apply
    return self.apply_standard()

  File "C:\Users\20200016\Anaconda3\lib\site-packages\pandas\core\apply.py", line 1099, in apply_standard
    mapped = lib.map_infer(

  File "pandas\_libs\lib.pyx", line 2859, in pandas._libs.lib.map_infer

  File "<ipython-input-437-34fc05204bad>", line 63, in convert_str_into_tuple
    splitted_tuple = string.split(',')

AttributeError: 'tuple' object has no attribute 'split'

This seems to indicate that the cell holds a tuple.

However:

df['auc'][0][0]
>>> '('

It seems as if the variable type changes based on where I use it. Is this actually happening?

Answer:

If your column contains tuples stored as strings, use pd.eval:

df['auc_all'] = pd.eval(df['auc_all'])

Example:

# df = pd.DataFrame({'auc_all': ['(0.54, 0.044)']})
>>> df
         auc_all
0  (0.54, 0.044)

>>> type(df['auc_all'][0])
str


# df['auc_all'] = pd.eval(df['auc_all'])
>>> df
         auc_all
0  [0.54, 0.044]

>>> type(df['auc_all'][0])
list

The drawback is that each tuple is converted to a list. To keep tuples, use literal_eval from the ast module instead:

# import ast
# df['auc_all'] = df['auc_all'].apply(ast.literal_eval)
>>> df
         auc_all
0  (0.54, 0.044)

>>> type(df['auc_all'][0])
tuple
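
The traceback in the question reports that a tuple reached the conversion function, even though the first cell is a str. One possible explanation, which is an assumption and not something the question confirms, is that the column mixes strings with values that are already tuples. The sketch below uses made-up data and a hypothetical helper named to_tuple: it first counts the Python types stored in the column, then parses only the string cells and leaves the others untouched.

```python
import ast

import pandas as pd

# Hypothetical data: the first cell is a string, the second is already a tuple.
df = pd.DataFrame(
    {"auc_all": pd.Series(["(0.54, 0.044)", (0.61, 0.03)], dtype=object)}
)

# Count the Python types actually stored in the column.
type_counts = df["auc_all"].map(type).value_counts()
print(type_counts)


def to_tuple(value):
    """Parse string cells with ast.literal_eval; return other cells unchanged."""
    if isinstance(value, str):
        return ast.literal_eval(value)
    return value


df["auc_all"] = df["auc_all"].apply(to_tuple)
print(df["auc_all"].map(type).value_counts())
```

If the type count shows only str, the column is uniform and plain ast.literal_eval as shown above is enough. If it shows both str and tuple, checking only index 0 hides the cells that trigger the AttributeError.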