I have a problem with one column of my dataset. My "Tags" column has the object dtype in pandas, and the tags are stored in a list. I want to apply a lambda function to get the length of each list, but I get the following error message:
object of type 'float' has no len()
I analyzed the dataset and found that the column contains str, float and None values. I handle the None values in my lambda function with an if clause. My problem now is that I don't know how to unify the other data types so that every value is a list.
I tried the .astype function, but then I get the following error message:
data type 'list' not understood
Maybe someone can provide an answer :)
Edit:
video_df['tags'].apply(lambda x: 0 if x is None else len(x))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
d:\PythonTutorial\Analysis\analysis.ipynb Cell 54' in <cell line: 1>()
----> 1 video_df['tags'].apply(lambda x: 0 if x is None else len(x))
TypeError: object of type 'float' has no len()
Samples of single values:
'[\'god of war 3\', \'gow\', \'santa monica studios\', \'sony\', \'msony computer entertainment\', \'ps3\',\'1080p\']'
['bauen',
'insel',
'instrumente'
]
CodePudding user response:
I see two main options.
- Use str.len, which works on anything that has a length (strings, lists, tuples...)
- Use a loop and check whether you have instances of lists
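import pandas as pd

df = pd.DataFrame({'col': [1, float('nan'), [], [1, 2, 3], (1, 2), 'a']})

# option 1: vectorized length, NaN where the value has no length
df['len1'] = df['col'].str.len()

# option 2: only count real lists, pd.NA for everything else
df['len2'] = [len(x) if isinstance(x, list) else pd.NA
              for x in df['col']]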
Output:
         col  len1  len2
0          1   NaN  <NA>
1        NaN   NaN  <NA>
2         []   0.0     0
3  [1, 2, 3]   3.0     3
4     (1, 2)   2.0  <NA>
5          a   1.0  <NA>
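One caveat for the data in the question: the first sample value looks like a list stored as a string, so option 1 would return its character count and option 2 would return <NA> instead of the number of tags. The float values are probably NaN (missing values), which `x is None` does not catch. Below is a minimal sketch of one way to normalize the column first, assuming the strings are valid Python list literals and that missing values should count as empty lists. The helper name `to_list` and the sample data are made up for illustration.

```python
import ast

import pandas as pd


def to_list(value):
    """Coerce one cell to a list (hypothetical helper, not a pandas API)."""
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        try:
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a Python literal: treat the whole string as a single tag.
            return [value]
        return list(parsed) if isinstance(parsed, (list, tuple)) else [value]
    # None, float NaN and anything else are treated as "no tags".
    return []


# Made-up sample mixing the kinds of values described in the question.
video_df = pd.DataFrame({
    "tags": [
        "['god of war 3', 'gow', 'ps3']",     # list stored as a string
        ["bauen", "insel", "instrumente"],    # real list
        float("nan"),                         # missing value (float)
        None,                                 # missing value (None)
    ]
})

video_df["tags"] = video_df["tags"].apply(to_list)
video_df["n_tags"] = video_df["tags"].str.len()
```

After this step every cell is a real list, so either option above counts tags rather than characters.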
CodePudding user response:
New Answer
@mozway pointed out that df['Tags'].str.len() gracefully handles objects with undefined length: it returns NaN for them instead of raising an error.
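As a small sketch of what that looks like in practice, assuming you want missing values counted as 0 tags (the sample data below is made up):

```python
import pandas as pd

# Hypothetical sample: two lists, a float NaN and a None.
df = pd.DataFrame({"Tags": [["a", "b"], [], float("nan"), None]})

# NaN where the value has no length (float NaN, None).
lengths = df["Tags"].str.len()

# Assumption: a missing value means "no tags", so replace NaN with 0.
df["n_tags"] = lengths.fillna(0).astype(int)
```

If you would rather keep missing values distinguishable from empty lists, skip the fillna step and leave the NaN in place.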
Old answer
One workaround is to define a custom function to handle the TypeError that arises from objects with no defined length. For example, the following function returns the length of each object in df['Tags'], or -1 if the object has no length:
def get_len(x):
    try:
        return len(x)
    except TypeError:
        return -1

df['Tags'].apply(get_len)