Pandas turn a column with multiple datatypes into a column with one datatype

Time:03-29

I have a problem with one column of my dataset. My "Tags" column has the object dtype in pandas, and the tags are stored in a list. I want to apply a lambda function to get the length of each list, but I get the following error message:

object of type 'float' has no len()

I analyzed the dataset and found that the column contains str, float, and None values. I handle the None values in my lambda function with an if clause. My problem now is that I don't know how to unify the remaining data types so that every value is a list.

I tried the .astype function, but there I get the following error message:

data type 'list' not understood

Maybe someone can provide me an answer :)

Edit:

video_df['tags'].apply(lambda x: 0 if x is None else len(x))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
d:\PythonTutorial\Analysis\analysis.ipynb Cell 54' in <cell line: 1>()
----> 1 video_df['tags'].apply(lambda x: 0 if x is None else len(x))

TypeError: object of type 'float' has no len()

Two sample values:

'[\'god of war 3\', \'gow\', \'santa monica studios\', \'sony\', \'msony computer entertainment\', \'ps3\',\'1080p\']'

['bauen', 'insel', 'instrumente']

CodePudding user response:

I see two main options.

  1. Use str.len, which works on any element that has a length (strings, lists, tuples...) and returns NaN for everything else
  2. Use a loop and check whether each value is an instance of list

import pandas as pd

df = pd.DataFrame({'col': [1,float('nan'),[],[1,2,3],(1,2),'a']})

# option 1
df['len1'] = df['col'].str.len()

# option 2
df['len2'] = [len(x) if isinstance(x, list) else pd.NA
              for x in df['col']]

Output:

         col  len1  len2
0          1   NaN  <NA>
1        NaN   NaN  <NA>
2         []   0.0     0
3  [1, 2, 3]   3.0     3
4     (1, 2)   2.0  <NA>
5          a   1.0  <NA>
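
One caveat for the data in the question: the first sample value is a string that only looks like a list, so str.len would count its characters rather than its tags. A minimal sketch of one way to turn every value into a real list first, assuming the strings are valid Python list literals. The helper name to_list and the sample Series are hypothetical and only for illustration:

```python
import ast

import pandas as pd


def to_list(value):
    """Return value as a list (hypothetical helper, not part of pandas)."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        try:
            # Safely evaluate a literal such as "['a', 'b']"
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return []  # assumption: unparsable strings count as "no tags"
        return list(parsed) if isinstance(parsed, (list, tuple)) else []
    # None, NaN (which is a float) and anything else become an empty list
    return []


tags = pd.Series([
    "['god of war 3', 'gow', 'ps3']",   # list stored as a string
    ['bauen', 'insel', 'instrumente'],  # real list
    float('nan'),                       # missing value
    None,
])

tags_as_lists = tags.apply(to_list)
tag_counts = tags_as_lists.apply(len)
print(tag_counts.tolist())
```

Treating unparsable or missing values as an empty list is a design choice. Returning pd.NA instead would keep "no data" distinct from "zero tags".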

CodePudding user response:

New Answer

@mozway pointed out that df['Tags'].str.len() gracefully handles objects with undefined length!
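
The original TypeError most likely comes from missing values: pandas stores them as NaN, which is a float, so a check for `x is None` does not catch them. A short sketch of how the str.len result could be turned into integer counts, assuming a missing value should count as 0 the way the lambda in the question treats None. The sample Series is made up for illustration:

```python
import pandas as pd

# Hypothetical sample: two lists and two kinds of missing value
tags = pd.Series([['a', 'b'], float('nan'), None, []])

# str.len returns NaN where there is no length, so the result is float;
# fill the gaps with 0 and cast back to int
counts = tags.str.len().fillna(0).astype(int)
print(counts.tolist())
```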

Old answer

One workaround is to define a custom function to handle the TypeError which arises from objects with no defined length. For example, the following function returns the length of each object in df['Tags'], or -1 if the object has no length:

def get_len(x):
    try:
        return len(x)
    except TypeError:
        return -1

df['Tags'].apply(get_len)