Trying to get the minimum date and getting TypeError: '<' not supported between instanc

Time:05-15

I'm reading from an Excel file:

GA = pd.read_excel("file.xlsx", sheet_name=0, engine="openpyxl")

The column data types are:

  • Email object
  • Date datetime64[ns]
  • Name object

I want to keep only the row with the earliest date for each email.

For example:

I'm trying to get only

I tried GA.groupby('email')['date'].min()

But I'm getting the TypeError: '<' not supported between instances of 'datetime.datetime' and 'int'

I tried changing the date column's type to object, adding reset_index(), using agg('min') instead of min(), and GA.sort_values('date').groupby('email').tail(1), but I keep getting this error. Please help.

CodePudding user response:

I believe your solution was only missing df['date'] = pd.to_datetime(df['date']) for it to work, so:

import pandas as pd

# Sample data with the dates stored as strings
data = {'email':  ['[email protected]', '[email protected]', '[email protected]'],
        'date': ['01/01/2022', '02/01/2022', '03/01/2022'],
        }
df = pd.DataFrame(data)
# Convert the strings to datetime64[ns] before aggregating
df['date'] = pd.to_datetime(df['date'])
# Earliest date per email
df.groupby('email')['date'].min()

Output is:

email
[email protected]   2022-01-01
[email protected]   2022-03-01
Name: date, dtype: datetime64[ns]
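
Note that min() returns only the earliest date per email, not the whole row. If the full row is wanted, one possible approach is idxmin() combined with .loc. This is a minimal sketch: the addresses, the name column values, and the month-first date format are all assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data; the addresses and names are placeholders.
df = pd.DataFrame({
    "email": ["a@example.com", "a@example.com", "b@example.com"],
    "date": ["01/01/2022", "02/01/2022", "03/01/2022"],
    "name": ["Ann", "Ann", "Bob"],
})
# Assumes month-first strings; adjust the format to match the real data.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")

# idxmin gives, for each email, the index label of its earliest date;
# .loc then selects the complete rows for those labels.
first_rows = df.loc[df.groupby("email")["date"].idxmin()]
print(first_rows)
```

If several rows share the same earliest date for an email, idxmin picks the first of them.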

CodePudding user response:

The problem was that the email column contained integers, not the date column. Thank you for your time.
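
For anyone hitting something similar, here is a rough sketch of how stray non-string values in the grouping column might be spotted. The data is made up, and dropping the offending rows is only one possible fix; converting them or correcting the spreadsheet may be more appropriate.

```python
import pandas as pd

# Hypothetical data: one email cell holds an integer, as can happen
# when a spreadsheet cell contains a bare number.
df = pd.DataFrame({
    "email": ["a@example.com", 12345, "a@example.com"],
    "date": pd.to_datetime(["2022-01-01", "2022-02-01", "2022-03-01"]),
})

# Count the Python types present in the grouping column.
type_counts = df["email"].map(type).value_counts()
print(type_counts)

# Boolean mask: True where the email value is a real string.
is_text = df["email"].map(lambda v: isinstance(v, str))

# Rows whose email is not a string, for inspection.
bad_rows = df[~is_text]
print(bad_rows)

# One possible fix (an assumption about what is wanted): drop those rows,
# then take the earliest date per email.
result = df[is_text].groupby("email")["date"].min()
print(result)
```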
