I have a DataFrame df whose columns contain lists of strings:
                 A                B
0           ['-1']  ['0', '1', '2']
1  ['2', '4', '3']            ['2']
2       ['3', '8']           ['-1']
I want to replace each list with its length, except that lists equal to ['-1'] should become -1.
Expected output:
   A  B
0 -1  3
1  3  1
2  2 -1
I've tried:
df.apply(lambda x: x.str.len() if not x == ['-1'] else -1)
but got the error ValueError: ('Lengths must match to compare', (132,), (1,))
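A minimal sketch of why this fails, assuming a 3-row column built from the sample data: df.apply hands the lambda one whole column (a Series) at a time, so x == ['-1'] is not a test of one cell. pandas converts the list into a length-1 array and compares it element by element with the column, and it refuses when the lengths differ (3 vs 1 here, 132 vs 1 in the original frame).

```python
import pandas as pd

# One column of the sample frame, as df.apply would pass it to the lambda.
col = pd.Series([['-1'], ['2', '4', '3'], ['3', '8']], name="A")

error_args = None
try:
    # The list on the right becomes a length-1 array and is compared
    # element-wise with the 3-element column, so pandas raises.
    col == ['-1']
except ValueError as err:
    error_args = err.args

print(error_args)  # ('Lengths must match to compare', (3,), (1,))
```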
I have also tried:
data_copy[colBeliefs] = data_copy[colBeliefs].apply(lambda x: x.str.len() if '-1' not in x else -1)
but this produces the wrong output: ['-1'] becomes 1 rather than -1.
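The second attempt runs but gives the wrong answer for a different reason: the in operator on a Series tests the index labels, not the values, much like in on a dict tests its keys. With the default index 0, 1, 2, '-1' not in x is always True, so every list is replaced by its length. A small sketch, again assuming a 3-row column:

```python
import pandas as pd

col = pd.Series([['-1'], ['2', '4', '3'], ['3', '8']], name="A")

# `in` checks the index labels (0, 1, 2), not the cell values.
label_hit = '-1' in col   # False: '-1' is not an index label
index_hit = 0 in col      # True: 0 is an index label

# So the conditional always takes the x.str.len() branch.
lengths = col.str.len() if '-1' not in col else -1
print(lengths.tolist())  # [1, 3, 2]
```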
I'm not sure how to apply a function to a DataFrame based on whether an entry is equal to a list, or whether an item is in a list.
EDIT: output of df.head().to_dict():
{'A': {0: ['-1'],
       1: ['2', '4', '3'],
       2: ['3', '8']},
 'B': {0: ['0', '1', '2'],
       1: ['2'],
       2: ['-1']}}
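To reproduce the example, the dict can be passed straight to pd.DataFrame: the outer keys become the columns, the inner keys become the row index, and every cell stays an ordinary Python list, so both columns have object dtype. A short sketch using the data from the dict:

```python
import pandas as pd

data = {'A': {0: ['-1'], 1: ['2', '4', '3'], 2: ['3', '8']},
        'B': {0: ['0', '1', '2'], 1: ['2'], 2: ['-1']}}

# Outer keys -> columns, inner keys -> row index.
df = pd.DataFrame(data)

print(df.dtypes)             # both columns are object dtype
print(type(df.loc[0, 'A']))  # <class 'list'>
```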
Answer:
You could do:
df.applymap(lambda x: -1 if (ln:=len(x)) == 1 and x[0] == '-1' else ln)
   A  B
0 -1  3
1  3  1
2  2 -1
Edit: if you're using Python < 3.8 (no walrus operator :=), use the following:
df.applymap(lambda x: -1 if len(x) == 1 and x[0] == '-1' else len(x))
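Because applymap calls the function on one cell at a time, x in this answer is a plain Python list rather than a Series, so it can also be compared with ['-1'] directly; that avoids both the walrus operator and the separate length check. A sketch assuming pandas 2.1+, where applymap was deprecated in favour of DataFrame.map, with a fallback to applymap on older versions (the helper name list_len_or_flag is only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': [['-1'], ['2', '4', '3'], ['3', '8']],
                   'B': [['0', '1', '2'], ['2'], ['-1']]})

def list_len_or_flag(x):
    # x is a single cell (a Python list), so == compares whole lists.
    return -1 if x == ['-1'] else len(x)

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
if hasattr(pd.DataFrame, "map"):
    out = df.map(list_len_or_flag)
else:
    out = df.applymap(list_len_or_flag)

print(out)
```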
Answer:
Comparing a Series with a list doesn't work, but comparing it with a tuple does. So you could convert the lists to tuples to avoid the ValueError:
import pandas as pd

s = df.stack()
out = s.str.len().mask(pd.Series(map(tuple, s), index=s.index) == ('-1',), -1).unstack()
Output:
   A  B
0 -1  3
1  3  1
2  2 -1
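The tuple trick works because, in recent pandas versions, a list on the right-hand side of == is converted to an array and compared element by element (hence the length check), whereas a tuple is compared against each cell as a single value. A minimal sketch on one column of the sample data; the list comprehension is just an equivalent spelling of the answer's map(tuple, s):

```python
import pandas as pd

col = pd.Series([['-1'], ['2', '4', '3'], ['3', '8']], name="A")

# Convert each list to a tuple, keeping the original index.
as_tuples = pd.Series([tuple(x) for x in col], index=col.index)

# A tuple on the right is treated as one value, so this is a per-row
# equality test rather than an element-wise array comparison.
is_flag = as_tuples == ('-1',)
print(is_flag.tolist())  # [True, False, False]

# Replace the lengths of the flagged rows with -1, as in the answer.
result = col.str.len().mask(is_flag, -1)
print(result.tolist())  # [-1, 3, 2]
```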