I'm reading from an Excel file:
GA = pd.read_excel("file.xlsx", sheet_name=0, engine="openpyxl")
The column dtypes are:
- Email object
- Date datetime64[ns]
- Name object
I want to keep only the row with the earliest date for each email.
For example:
- [email protected] 1/1/2022 a
- [email protected] 2/1/2022 b
- [email protected] 3/1/2022 c
I'm trying to get only
- [email protected] 1/1/2022 a
- [email protected] 3/1/2022 c
I tried GA.groupby('email')['date'].min()
But I'm getting the TypeError: '<' not supported between instances of 'datetime.datetime' and 'int'
I tried changing the date column to object, adding reset_index(), using agg('min') instead of min(), and GA.sort_values('date').groupby('email').tail(1),
but I keep getting this error. Please help.
CodePudding user response:
I believe your solution was only missing df['date'] = pd.to_datetime(df['date'])
for it to work, so:
import pandas as pd
data = {
    'email': ['[email protected]', '[email protected]', '[email protected]'],
    'date': ['01/01/2022', '02/01/2022', '03/01/2022'],
}
df = pd.DataFrame(data)
# Convert the string column to datetime64[ns] so min() compares real dates
df['date'] = pd.to_datetime(df['date'])
# Earliest date per email
df.groupby('email')['date'].min()
Output is:
email
[email protected] 2022-01-01
[email protected] 2022-03-01
Name: date, dtype: datetime64[ns]
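Note that the groupby above returns only the earliest date per email, while the question asks for the whole row. A minimal sketch of two common ways to keep the full row follows. The addresses, the ISO date strings and the name values are placeholders I made up for illustration, since the real data isn't shown.

```python
import pandas as pd

# Placeholder data: the addresses and names are hypothetical stand-ins
df = pd.DataFrame({
    "email": ["a@example.com", "a@example.com", "b@example.com"],
    "date": ["2022-01-01", "2022-01-02", "2022-01-03"],
    "name": ["a", "b", "c"],
})
df["date"] = pd.to_datetime(df["date"])

# Option 1: idxmin() gives the index label of the earliest date in each
# group, and .loc pulls the complete rows for those labels
first_rows = df.loc[df.groupby("email")["date"].idxmin()].reset_index(drop=True)

# Option 2: sort by date, then keep the first row seen for each email
first_rows_alt = (
    df.sort_values("date")
      .drop_duplicates("email", keep="first")
      .reset_index(drop=True)
)

print(first_rows)
```

Both options keep every column. Option 2 tends to be easier to read, while Option 1 makes the "minimum per group" intent explicit.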
CodePudding user response:
The problem was that the email column contained integers; the date column was fine. Thank you for your time.
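For anyone hitting a similar error, here is a rough sketch of how one might check a grouping column for mixed Python types before calling groupby. The sample frame, the stray value 12345 and the variable names are all made up for illustration; the real file may look different.

```python
import pandas as pd

# Hypothetical data: one email cell was read from Excel as an integer
df = pd.DataFrame({
    "email": ["a@example.com", 12345, "b@example.com"],
    "date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-03"]),
})

# Count the Python types present in the column; more than one type
# in an object column is a sign of dirty data
type_counts = df["email"].map(type).value_counts()
print(type_counts)

# Split the rows into string and non-string emails
is_str = df["email"].map(lambda v: isinstance(v, str))
bad_rows = df[~is_str]   # rows to inspect or fix in the source file
clean = df[is_str]

# With only string keys left, the groupby works as expected
earliest = clean.groupby("email")["date"].min()
print(earliest)
```

Whether to drop the offending rows, as done here, or convert them with astype(str) depends on what those values actually mean in the spreadsheet.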