How do I get the maximum value of a column in pandas, for a given value in another column? e.g what

Time:02-16

I have a CSV file with a date column and a category column, and I want to build a second table showing only the maximum value for each category on each day. I can get the maximum in a single line, but it is the maximum for the entire dataset, not per category per day, and I need a new pandas DataFrame to store all the results.

Example below:

import pandas as pd

dict1 = {'Category': ['A', 'A', 'A',
                    'B', 'B', 'B',
                    'B',
                    'A', 'A', 'A',
                    'B', 'B', 'B',
                    'B',],

         'Date': ['2018-01-02', '2018-01-02', '2018-01-02', '2018-01-02', '2018-01-02', '2018-01-02', '2018-01-02', '2018-01-03', '2018-01-03',
                  '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03', '2018-01-03'],

         'Ending Time': ['2018-01-02 20:51:54', '2018-01-02 20:58:54' , '2018-01-02 21:01:02', '2018-01-02 22:01:02', '2018-01-02 21:01:02', '2018-01-02 22:01:02', '2018-01-02 23:01:02',
                         '2018-01-03 12:01:02','2018-01-03 13:01:02','2018-01-03 15:22:02','2018-01-03 16:23:02',
                         '2018-01-03 17:01:02','2018-01-03 18:01:02','2018-01-03 19:01:02']}


df = pd.DataFrame(dict1)

df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['Ending Time'] = pd.to_datetime(df['Ending Time'], format='%Y-%m-%d %H:%M:%S')

print(df.head())
print(df[df['Ending Time'] == df['Ending Time'].max()])

CodePudding user response:

IIUC, you need to group by category and day, then take the maximum ending time in each group:

out = df.groupby(['Category', 'Date'])['Ending Time'].max().reset_index()
print(out)

# Output
  Category       Date         Ending Time
0        A 2018-01-02 2018-01-02 21:01:02
1        A 2018-01-03 2018-01-03 15:22:02
2        B 2018-01-02 2018-01-02 23:01:02
3        B 2018-01-03 2018-01-03 19:01:02
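If you also need the other columns of the row that holds each maximum, one possible approach is to look up the row labels with idxmax and select them with loc. This is only a sketch on a reduced copy of the question's data; the 'Job' column is a hypothetical extra column added purely for illustration.

```python
import pandas as pd

# Reduced copy of the question's data; 'Job' is a hypothetical extra column
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A', 'B'],
    'Date': ['2018-01-02', '2018-01-02', '2018-01-02',
             '2018-01-02', '2018-01-03', '2018-01-03'],
    'Ending Time': ['2018-01-02 20:51:54', '2018-01-02 21:01:02',
                    '2018-01-02 22:01:02', '2018-01-02 23:01:02',
                    '2018-01-03 15:22:02', '2018-01-03 19:01:02'],
    'Job': ['j1', 'j2', 'j3', 'j4', 'j5', 'j6'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['Ending Time'] = pd.to_datetime(df['Ending Time'], format='%Y-%m-%d %H:%M:%S')

# For each (Category, Date) group, get the index label of the latest Ending Time
idx = df.groupby(['Category', 'Date'])['Ending Time'].idxmax()

# Select those complete rows into a new dataframe
rows = df.loc[idx].reset_index(drop=True)
print(rows)
```

Unlike max(), this keeps every column of the selected row, but it returns only one row per group even when several rows share the same maximum.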

You can also broadcast each group's maximum ending time back onto the original dataframe with transform:

df['Max'] = df.groupby(['Category', 'Date'])['Ending Time'].transform('max')
print(df)

# Output
   Category       Date         Ending Time                 Max
0         A 2018-01-02 2018-01-02 20:51:54 2018-01-02 21:01:02
1         A 2018-01-02 2018-01-02 20:58:54 2018-01-02 21:01:02
2         A 2018-01-02 2018-01-02 21:01:02 2018-01-02 21:01:02
3         B 2018-01-02 2018-01-02 22:01:02 2018-01-02 23:01:02
4         B 2018-01-02 2018-01-02 21:01:02 2018-01-02 23:01:02
5         B 2018-01-02 2018-01-02 22:01:02 2018-01-02 23:01:02
6         B 2018-01-02 2018-01-02 23:01:02 2018-01-02 23:01:02
7         A 2018-01-03 2018-01-03 12:01:02 2018-01-03 15:22:02
8         A 2018-01-03 2018-01-03 13:01:02 2018-01-03 15:22:02
9         A 2018-01-03 2018-01-03 15:22:02 2018-01-03 15:22:02
10        B 2018-01-03 2018-01-03 16:23:02 2018-01-03 19:01:02
11        B 2018-01-03 2018-01-03 17:01:02 2018-01-03 19:01:02
12        B 2018-01-03 2018-01-03 18:01:02 2018-01-03 19:01:02
13        B 2018-01-03 2018-01-03 19:01:02 2018-01-03 19:01:02
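The broadcast result can also be used as a boolean mask, without storing it as a column, to keep only the rows whose ending time equals their group's maximum. A minimal sketch on a reduced copy of the question's data (the variable names mask and latest are illustrative):

```python
import pandas as pd

# Reduced copy of the question's data
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Date': ['2018-01-02', '2018-01-02', '2018-01-02', '2018-01-02'],
    'Ending Time': ['2018-01-02 20:51:54', '2018-01-02 21:01:02',
                    '2018-01-02 22:01:02', '2018-01-02 23:01:02'],
})
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['Ending Time'] = pd.to_datetime(df['Ending Time'], format='%Y-%m-%d %H:%M:%S')

# transform('max') returns a Series aligned with df, so it can be compared row by row
group_max = df.groupby(['Category', 'Date'])['Ending Time'].transform('max')
mask = df['Ending Time'] == group_max

# Keep only the rows that hold their group's maximum
latest = df[mask]
print(latest)
```

If several rows in a group share the same maximum, this mask keeps all of them.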