Pandas - Get max length of lists per day

Time:04-08

Imagine you have the following data:

               date          items
0              2022-03-04    [6, 7]
1              2022-03-04    [4, 8]
1              2022-03-04    [4, 8]
1              2022-03-04    [4, 8]
1              2022-03-04    [4, 8]
...                   ...       ...
1082           2022-03-03         1
1082           2022-03-03         1
1082           2022-03-03         1
1083           2022-03-11         1
1083           2022-03-11         1

I'd like to end up with the maximum length of items per day:

                date         len
0               2022-03-04     2
2               2022-03-05     3
...                    ...   ...
1083            2022-03-11     1

and so on. I don't care about duplicates; I just want the largest list size for each day.

I've tried the following:

df["len"] = max(df["items"], key=len)

but this seems to operate on the whole column rather than per row, as I get the following error:

*** ValueError: Length of values (3) does not match length of index (4002)

I'd appreciate any assistance you can give.

CodePudding user response:

Use Series.str.len and aggregate with max. For values that are not lists, df['items'].str.len() returns missing values:

# Select the column as df['items']: df.items is the DataFrame.items method, not the column
df = df['items'].str.len().groupby(df['date']).max().reset_index(name='len')
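
For reference, here is a minimal self-contained sketch of the line above. The sample rows are made up for illustration and are not the asker's real data. It also shows why the original attempt failed: max(..., key=len) returns a single element (the longest list), not one value per row, so pandas tries to assign that one list as an entire column.

```python
import pandas as pd

# Hypothetical sample data; the real frame has thousands of rows.
df = pd.DataFrame({
    "date": ["2022-03-04", "2022-03-04", "2022-03-05", "2022-03-05"],
    "items": [[6, 7], [4], [4, 8, 4], [1]],
})

# max() over the column picks ONE element: the longest list in the whole column.
longest = max(df["items"], key=len)

# Per-row lengths first, then the maximum within each date.
out = df["items"].str.len().groupby(df["date"]).max().reset_index(name="len")
print(out)
```

Grouping by the date Series keeps the one-liner short; the grouped result is sorted by date by default.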

If the column also contains integers that should count as lengths, replace the missing values with the original values using fillna:

df = (df['items'].str.len().fillna(df['items'])
                 .groupby(df['date'])
                 .max()
                 .reset_index(name='len'))
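
As an alternative sketch for the mixed case, the length rule can be written out explicitly instead of relying on missing values from .str.len(). The helper name item_len and the sample rows are assumptions made for illustration; the rule it encodes (a plain integer already is the count) is the same one the fillna version applies.

```python
import pandas as pd

# Hypothetical sample data mixing plain integers and lists.
df = pd.DataFrame({
    "date": ["2022-03-03", "2022-03-03", "2022-03-04", "2022-03-04"],
    "items": [1, 1, [6, 7], [4, 8, 9]],
})

def item_len(value):
    """Return len(value) for lists/tuples; treat any other value as the count itself."""
    return len(value) if isinstance(value, (list, tuple)) else value

out = (df["items"].map(item_len)
                  .groupby(df["date"])
                  .max()
                  .reset_index(name="len"))
print(out)
```

This may be slower than the vectorized accessor on large frames, but it makes the handling of non-list values explicit.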

EDIT: If you need the row with the maximal length of items for each date:

print(df)
         date        items
0  2022-03-04        [6,7]
1  2022-03-04          [4]
1  2022-03-05  [4,8,4,4,7]
1  2022-03-05      [4,8,4]
1  2022-03-06    [4,8,5,7]


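# Make the index unique so each label returned by idxmax identifies one row
df = df.reset_index(drop=True)

# Length of each list; non-list values fall back to the original value
df['len'] = df['items'].str.len().fillna(df['items'])
# Keep the row with the largest length for each date
df = df.loc[df.groupby('date')['len'].idxmax()]
print(df)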
         date            items  len
0  2022-03-04           [6, 7]    2
2  2022-03-05  [4, 8, 4, 4, 7]    5
4  2022-03-06     [4, 8, 5, 7]    4
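
A self-contained sketch of the EDIT, rebuilding the sample frame with the duplicated index labels shown above. It is meant to illustrate why reset_index(drop=True) matters: idxmax returns index labels, and .loc on a label that several rows share returns all of those rows. The variable names ambiguous, unique and best are only for illustration.

```python
import pandas as pd

# Sample rows from the answer, including the duplicated index labels.
df = pd.DataFrame(
    {
        "date": ["2022-03-04", "2022-03-04", "2022-03-05", "2022-03-05", "2022-03-06"],
        "items": [[6, 7], [4], [4, 8, 4, 4, 7], [4, 8, 4], [4, 8, 5, 7]],
    },
    index=[0, 1, 1, 1, 1],
)
# Every value is a list here, so no fillna is needed in this sketch.
df["len"] = df["items"].str.len()

# With duplicated labels, .loc returns every row sharing a selected label.
ambiguous = df.loc[df.groupby("date")["len"].idxmax()]

# After reset_index, each label is unique and identifies exactly one row.
unique = df.reset_index(drop=True)
best = unique.loc[unique.groupby("date")["len"].idxmax()]
print(best)
```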