Pandas - Get max length of lists per day

Imagine you have the following data:

               date          items
0              2022-03-04    [6, 7]
1              2022-03-04    [4, 8]
1              2022-03-04    [4, 8]
1              2022-03-04    [4, 8]
1              2022-03-04    [4, 8]
...                   ...       ...
1082           2022-03-03         1
1082           2022-03-03         1
1082           2022-03-03         1
1083           2022-03-11         1
1083           2022-03-11         1

I'd like to end up with the maximum length of items per day:

                date         len
0               2022-03-04     2
2               2022-03-05     3
...                    ...   ...
1083            2022-03-11     1

and so on. I don't care about duplicates; I just want the largest list size for each day.

I've tried the following:

df["len"] = max(df.items, key=len)

but this seems to compute over the whole column rather than per row, as I get the following error:

*** ValueError: Length of values (3) does not match length of index (4002)

Appreciate any assistance you can give

CodePudding user response:

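Use Series.str.len and aggregate with max per date. Values that are not lists (for example integers) become missing values (NaN) in df['items'].str.len(). Select the column as df['items'] rather than df.items, because df.items is a DataFrame method:

df = df['items'].str.len().groupby(df['date']).max().reset_index(name='len')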
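
A minimal, self-contained sketch of this approach, using a small made-up DataFrame (the sample values are assumptions, not the asker's real data). It also shows that max(..., key=len) returns one longest list rather than one value per row, which would explain the length-mismatch error in the question:

```python
import pandas as pd

# Hypothetical sample data: every value in 'items' is a list.
df = pd.DataFrame({
    "date": ["2022-03-04", "2022-03-04", "2022-03-05", "2022-03-05"],
    "items": [[6, 7], [4], [4, 8, 4], [1, 2]],
})

# The built-in max with key=len picks a single longest list from the column.
longest = max(df["items"], key=len)

# Length of each list, one value per row.
lengths = df["items"].str.len()

# Group the per-row lengths by date and keep the largest value per date.
out = lengths.groupby(df["date"]).max().reset_index(name="len")
print(out)
```

Grouping a Series by another Series (df["date"]) works here because both share the same index.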

If the column also contains integers that should count as lengths themselves, replace the missing values with the original values using fillna:

df = (df['items'].str.len().fillna(df['items'])
                 .groupby(df['date'])
                 .max()
                 .reset_index(name='len'))
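
As an alternative sketch (not the method above, but a more explicit variant of the same idea), the size of each value can be computed with a small helper instead of str.len plus fillna. The helper name item_size and the sample data are hypothetical; integers are assumed to already represent a length, as in the fillna version:

```python
import pandas as pd

# Hypothetical sample data mixing lists and plain integers.
df = pd.DataFrame({
    "date": ["2022-03-04", "2022-03-04", "2022-03-03", "2022-03-11"],
    "items": [[6, 7], [4, 8, 9], 1, 1],
})

def item_size(value):
    """Return len(value) for lists; pass integers through unchanged."""
    return len(value) if isinstance(value, list) else value

# One integer size per row, without any intermediate missing values.
sizes = df["items"].map(item_size)

# Largest size per date.
out = sizes.groupby(df["date"]).max().reset_index(name="len")
print(out)
```

This keeps the result as an integer column, since no NaN values are introduced along the way.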

EDIT: If you need the row with the maximal length of items for each date:

print (df)
         date        items
0  2022-03-04        [6,7]
1  2022-03-04          [4]
1  2022-03-05  [4,8,4,4,7]
1  2022-03-05      [4,8,4]
1  2022-03-06    [4,8,5,7]


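# Make the index unique so that idxmax returns unambiguous row labels
df = df.reset_index(drop=True)

# Length of each list; integer values are kept as they are
df['len'] = df['items'].str.len().fillna(df['items'])
# Select, for each date, the first row with the largest length
df = df.loc[df.groupby('date')['len'].idxmax()]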
print (df)
         date            items  len
0  2022-03-04           [6, 7]    2
2  2022-03-05  [4, 8, 4, 4, 7]    5
4  2022-03-06     [4, 8, 5, 7]    4
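
A reproducible sketch of the row selection above. The data mirrors the example, plus one extra row (an assumption added for illustration) that ties for the longest list on 2022-03-06. idxmax keeps only the first row per date when there is a tie; comparing against transform('max') is one way to keep every tied row instead:

```python
import pandas as pd

# Sample data from the example, plus a hypothetical tied row at the end.
df = pd.DataFrame({
    "date": ["2022-03-04", "2022-03-04", "2022-03-05",
             "2022-03-05", "2022-03-06", "2022-03-06"],
    "items": [[6, 7], [4], [4, 8, 4, 4, 7],
              [4, 8, 4], [4, 8, 5, 7], [1, 2, 3, 4]],
})

df["len"] = df["items"].str.len()

# One row per date: the first row that reaches the maximal length.
first = df.loc[df.groupby("date")["len"].idxmax()]

# All rows per date whose length equals that date's maximum.
ties = df[df["len"] == df.groupby("date")["len"].transform("max")]

print(first)
print(ties)
```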