Longest Strings in List in Dataframe

Time:11-15

I want to return the longest string from each list in a dataframe column. The original data looks like:

items
['pants','hat']
['clothes']
['mouse']

My desired output is:

items               longest_str
['pants','hat']     ['pants']
['clothes']         ['clothes']
['mouse','xx']      ['mouse']

My code is:

def longest(i):
    return max(i,key=len)

for i in range(0:2):
    print(df['items'][i])
    print(longest(df['items'][i]))

But it does not print what I want. My ultimate goal is to find the longest string in the whole dataframe. Please suggest a better way to approach this.

CodePudding user response:

Just call your longest function through apply on the items column:

In [492]: df['longest_str'] = df['items'].apply(lambda x: [longest(x)])

In [493]: df
Out[493]: 
          items longest_str
0  [pants, hat]     [pants]
1     [clothes]   [clothes]
2       [mouse]     [mouse]

CodePudding user response:

Try:

import pandas as pd

df = pd.DataFrame({'items':[['pants', 'hat'],['clothes'],['mouse', 'xx']]})

df['longest_str'] = [[max(x, key=len)] for x in df['items']]

df

Output:

          items longest_str
0  [pants, hat]     [pants]
1     [clothes]   [clothes]
2   [mouse, xx]     [mouse]

To answer the rest of your question (the longest string in the whole dataframe):

df = pd.DataFrame({'items':[['pants', 'hat'],['clothes'],['mouse', 'xx']]})

df['longest_str'] = [max(x, key=len) for x in df['items']]

max(df['longest_str'], key=len)

Output:

'clothes'
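
If the intermediate longest_str column is not needed, the same overall result can probably be reached by flattening the lists first. This is only a sketch built on the sample data above; the name overall_longest is made up for illustration.

```python
import pandas as pd
from itertools import chain

df = pd.DataFrame({'items': [['pants', 'hat'], ['clothes'], ['mouse', 'xx']]})

# Flatten every list in the column into one stream of strings,
# then take the longest one. On a tie, max returns the first it encounters.
overall_longest = max(chain.from_iterable(df['items']), key=len)
print(overall_longest)
```

This avoids adding a helper column, at the cost of not keeping the per-row result.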

Getting really cute:

df['items'].explode().loc[lambda x: x.str.len().max() == x.str.len()].to_numpy()[0]

Output:

'clothes'
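
The one-liner above is dense, so here is a rough step-by-step version of the same idea using the sample data. The intermediate names (flat, lengths, longest_all, longest_one) are hypothetical and only there for readability.

```python
import pandas as pd

df = pd.DataFrame({'items': [['pants', 'hat'], ['clothes'], ['mouse', 'xx']]})

# One row per string; the original row label is repeated for each list element.
flat = df['items'].explode()

# Length of each individual string.
lengths = flat.str.len()

# Keep every string whose length equals the maximum, so ties all survive here.
longest_all = flat[lengths == lengths.max()]

# Take the first match as a plain Python string.
longest_one = longest_all.to_numpy()[0]
print(longest_one)
```

Stopping at longest_all is useful if you want every string that shares the maximum length rather than just the first.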

CodePudding user response:

Use apply; it may be simpler without your helper function.

df['longest_str'] = df['items'].apply(lambda x: max(x, key=len))
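
One caveat: max raises a ValueError on an empty list. The sample data has no empty lists, so the sketch below assumes a hypothetical row with [] just to show how the default argument of max could guard against that.

```python
import pandas as pd

# Hypothetical data: the second row is an empty list, which the question does not include.
df = pd.DataFrame({'items': [['pants', 'hat'], [], ['mouse', 'xx']]})

# default='' is returned for an empty list instead of raising ValueError.
df['longest_str'] = df['items'].apply(lambda x: max(x, key=len, default=''))
print(df)
```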