I want to return the longest string in each list stored in a dataframe column. The original data looks like:
items
['pants','hat']
['clothes']
['mouse']
My desired output is:
items longest_str
['pants','hat'] ['pants']
['clothes'] ['clothes']
['mouse','xx'] ['mouse']
My code is:
def longest(i):
    # Return the longest string in the list i
    return max(i, key=len)

for i in range(len(df)):
    print(df['items'][i])
    print(longest(df['items'][i]))
but it does not print what I want. My ultimate goal is to find the longest string in the whole dataframe. Please suggest a better way to approach this.
CodePudding user response:
Just call your longest function within apply on the items column:
In [492]: df['longest_str'] = df['items'].apply(lambda x: [longest(x)])
In [493]: df
Out[493]:
items longest_str
0 [pants, hat] [pants]
1 [clothes] [clothes]
2 [mouse] [mouse]
CodePudding user response:
Try:
df = pd.DataFrame({'items':[['pants', 'hat'],['clothes'],['mouse', 'xx']]})
df['longest_str'] = [[max(x, key=len)] for x in df['items']]
df
Output:
items longest_str
0 [pants, hat] [pants]
1 [clothes] [clothes]
2 [mouse, xx] [mouse]
To answer the rest of your question (the longest string in the whole dataframe):
df = pd.DataFrame({'items':[['pants', 'hat'],['clothes'],['mouse', 'xx']]})
df['longest_str'] = [max(x, key=len) for x in df['items']]
max(df['longest_str'], key=len)
Output:
'clothes'
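If the per-row column is not needed, one possible shortcut is to flatten all the lists and take a single max. This is only a minimal sketch, assuming every cell in items is a list of strings; the names all_strings and longest_overall are made up for illustration.

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'items': [['pants', 'hat'], ['clothes'], ['mouse', 'xx']]})

# Lazily flatten every list in the column into one stream of strings.
all_strings = chain.from_iterable(df['items'])

# On ties, max returns the first string of maximal length it encounters.
longest_overall = max(all_strings, key=len)
print(longest_overall)
```

This avoids building the intermediate longest_str column when only the overall answer matters.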
Getting really cute:
df['items'].explode().loc[lambda x: x.str.len().max() == x.str.len()].to_numpy()[0]
Output:
'clothes'
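The one-liner above is easier to follow when split into steps. A rough sketch of the same idea, assuming every cell in items is a non-empty list of strings; the intermediate names flat, lengths and longest_overall are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'items': [['pants', 'hat'], ['clothes'], ['mouse', 'xx']]})

# explode gives one row per string; the original row index is repeated.
flat = df['items'].explode()

# Length of each individual string.
lengths = flat.str.len()

# Keep the strings whose length equals the maximum, then take the first one.
longest_overall = flat[lengths == lengths.max()].to_numpy()[0]
print(longest_overall)
```

Dropping the final [0] would keep every string that ties for the maximum length, which may be useful if ties matter.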
CodePudding user response:
Use apply; it may be simpler without your helper function. Note that this stores the string itself rather than a one-element list:
df['longest_str'] = df['items'].apply(lambda x: max(x, key=len))
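One caveat: max raises a ValueError on an empty list. If empty lists can occur in your data (an assumption, the question does not say), the default argument of max is one way to guard against that. A small sketch, where longest_or_none is a hypothetical helper name:

```python
import pandas as pd


def longest_or_none(strings):
    # default=None is returned instead of raising ValueError on an empty list.
    return max(strings, key=len, default=None)


# Hypothetical data with an empty list in the middle row.
df = pd.DataFrame({'items': [['pants', 'hat'], [], ['mouse', 'xx']]})
df['longest_str'] = df['items'].apply(longest_or_none)
print(df)
```

Rows with an empty list end up with a missing value in longest_str, so they can be filtered out afterwards if needed.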