I am trying to find the longest list in a Series and display the top 10 lengths together with the corresponding lists.
I tried:
df["listoflists"].value_counts()
But it only prints how often each list occurs, not the length of each list. I also tried
print(df["listoflists"].applymap(len).idxmax(axis=1))
But I get the error AttributeError: 'Series' object has no attribute 'applymap'
How can this be solved?
CodePudding user response:
Let's say you have the following DataFrame:
import pandas as pd
values = [['a','a'], ['a','b','b','d','e'],
          ['a','b','b','a'], ['a','b','c','a'],
          ['a','b','b'], ['a','b','b']]
df = pd.DataFrame({'listoflists': values})
For the longest list, you can try :
max(df.listoflists, key=len)
and for the top n lists, you can try (n = 3 in this example):
df['count'] = df.listoflists.map(len)
df.nlargest(3, ['count'])
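As for the AttributeError in the question: applymap is a DataFrame method, so a Series does not have it, while map and apply work element-wise on a Series. Putting the pieces together, here is a minimal end-to-end sketch. The names lengths, longest and top_n are just illustrative, and assign is used only to avoid modifying df in place.

```python
import pandas as pd

values = [['a', 'a'], ['a', 'b', 'b', 'd', 'e'],
          ['a', 'b', 'b', 'a'], ['a', 'b', 'c', 'a'],
          ['a', 'b', 'b'], ['a', 'b', 'b']]
df = pd.DataFrame({'listoflists': values})

# Series has no applymap; map(len) computes the length of each list.
lengths = df['listoflists'].map(len)

# idxmax returns the index label of the first maximum length.
longest = df.loc[lengths.idxmax(), 'listoflists']

# Top n rows by length, with the length in a temporary 'count' column.
n = 3
top_n = df.assign(count=lengths).nlargest(n, 'count')

print(longest)
print(top_n)
```

For the top 10 asked about in the question, set n = 10. If the DataFrame has fewer rows than n, nlargest simply returns all of them.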
CodePudding user response:
I'll show you how to do this on just some random data:
# easy random data
df = pd.DataFrame({'listoflists':[[1,2],[3,4,5],[6,7,8,9]]})
# assign back to DataFrame so we can manipulate
df['len'] = df['listoflists'].apply(len)
# then sort by length, longest first, and take the top N:
N = 1
df.sort_values('len', ascending=False).head(N)
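One thing to watch for is ties: if several lists share the N-th largest length, taking only the first N rows drops some of them. Below is a small sketch of one way to keep them, assuming that is what you want. The data here is made up for illustration (it adds a second list of length 3), and it uses .str.len(), which also works on list elements, as an alternative to apply(len).

```python
import pandas as pd

# Illustrative data, not from the question: two lists tie at length 3.
df = pd.DataFrame({'listoflists': [[1, 2], [3, 4, 5], [6, 7, 8, 9], [0, 0, 0]]})

# .str.len() gives the length of each element, including lists.
df['len'] = df['listoflists'].str.len()

N = 2
# keep='all' returns every row tied with the N-th largest length,
# so the result can contain more than N rows.
top = df.nlargest(N, 'len', keep='all')

print(top)
```

With the default keep='first', only N rows come back and ties are resolved by order of appearance.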