I have a DataFrame like this:
import pandas as pd

df = pd.DataFrame({
    'Names': [['John', 'Stefan'], ['Stacy', 'Jennifer'], ['Paul', 'Sean', 'Alu']],
})
I would like to create a new column containing the longest word in each list from the "Names" column. If two or more words tie for the largest number of characters, I would like to return all of them.
So the output should look like this:
| Names | Output |
| ----------------- | ------------|
| [John, Stefan] | Stefan |
| [Stacy, Jennifer] | Jennifer |
| [Paul, Sean, Alu] | Paul, Sean |
I know that for a single list I could do something like this:
sorted_list = sorted(my_list, key=len)
largest_element = sorted_list[-1]
But how do I apply this to every list in a DataFrame column, and how do I extract more than one element when several words tie for the maximum length?
Does anybody know?
CodePudding user response:
Try:
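def get_max(x):
    # Length of the longest word in the list
    m = len(max(x, key=len))
    # Keep every word that ties for that length, in original order
    return ', '.join(w for w in x if len(w) == m)

df['Output'] = df['Names'].apply(get_max)
print(df)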
Prints:
               Names      Output
0     [John, Stefan]      Stefan
1  [Stacy, Jennifer]    Jennifer
2  [Paul, Sean, Alu]  Paul, Sean
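One caveat with get_max: max() raises a ValueError when it is given an empty list. The following is only a minimal sketch, assuming some rows may hold an empty list and that an empty string is an acceptable result for them (neither is stated in the question); get_max_safe is a hypothetical name.

```python
def get_max_safe(words):
    # default=0 keeps max() from raising ValueError on an empty list
    longest = max((len(w) for w in words), default=0)
    # Keep every word that ties for the maximum length, in original order
    return ', '.join(w for w in words if len(w) == longest)


print(get_max_safe(['Paul', 'Sean', 'Alu']))  # Paul, Sean
print(repr(get_max_safe([])))                 # ''
```

It can be used the same way as above: df['Output'] = df['Names'].apply(get_max_safe).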
CodePudding user response:
You can write a function and apply it to every row.
def get_largest(names_list):
    # Sort by length so the longest word ends up last
    sorted_list = sorted(names_list, key=len)
    largest_word = sorted_list[-1]
    longest_length = len(largest_word)
    # Collect every word that ties for the longest length
    largest_words = [word for word in names_list if len(word) == longest_length]
    return largest_words

df = pd.DataFrame({'Names': [['John', 'Stefan'], ['Stacy', 'Jennifer'], ['Paul', 'Sean', 'Alu']]})
# Output holds a list of words per row, not a string
df['Output'] = df['Names'].apply(get_largest)
CodePudding user response:
You can use the apply method on the DataFrame column, passing a function that sorts the list by length in descending order, takes the length of the first element as the maximum, and returns every word of that length.
One-liner:
df['Output'] = df['Names'].apply(lambda x: [i for i in sorted(x, key=len, reverse=True) if len(i) == len(sorted(x, key=len, reverse=True)[0])])
Long but readable:
def get_largest_words(words_list):
    # Longest words come first after a descending sort by length
    sorted_list = sorted(words_list, key=len, reverse=True)
    max_length = len(sorted_list[0])
    # Keep every word that ties for the maximum length
    largest_words = [word for word in sorted_list if len(word) == max_length]
    return largest_words

df['Output'] = df['Names'].apply(get_largest_words)
If you want a comma-separated string instead of a list, wrap the result in ', '.join(...), for example return ', '.join(largest_words).
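As an alternative to joining inside the function, the list column can be converted afterwards with the pandas string accessor. This is a sketch under the assumption that every row holds a non-empty list of strings, as in the question's data; it reuses the readable function from this answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Names': [['John', 'Stefan'], ['Stacy', 'Jennifer'], ['Paul', 'Sean', 'Alu']],
})


def get_largest_words(words_list):
    # Descending sort by length puts the longest words first
    sorted_list = sorted(words_list, key=len, reverse=True)
    max_length = len(sorted_list[0])
    return [word for word in sorted_list if len(word) == max_length]


# First build a column of lists, then join each list into one string
df['Output'] = df['Names'].apply(get_largest_words)
df['Output'] = df['Output'].str.join(', ')
print(df)
```

Keeping the two steps separate leaves the list version available if the individual words are needed later.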