Home > Back-end >  Python Pandas '>=' not supported between instances of 'str' and 'float
Python Pandas '>=' not supported between instances of 'str' and 'float

Time:07-13

Issue 1 - solved by using pd.to_datetime(df.Date, format='%Y-%m-%d'). Thanks to Michael

I am trying to find the latest date of each user using their ID

df['Latest Date'] = df.groupby(['ID'])['Date'].transform.('max')
df.drop_duplicates(subset='ID', keep='last',inplace=True)

But I am getting '>=' not supported between instances of 'str' and 'float'

I have used the same approach in the past and it worked fine.

When I did dytypes, I see 'ID' column is int64 and Date column as object because I converted the date column to df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%m-%d')


Issue 2 solved - See Michael's comment 'For the edit'

But the output does not look right

I am trying to find the latest date of each user using their ID and assign those dates to new columns using the category

Dataframe = df

Data looks like below,

 ID   CATEGORY NAME      DATE
 1        fruits      2017-08-07 00:00:00   
 2        veggies     2018-01-25 00:00:00   
 1        fruits      2015-08-07 00:00:00
 2        veggies     2022-01-01 00:00:00

My code is below

//Converting the date format 
    df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')

//transforming to identify the latest date
    df['Latest Date'] = df.groupby(['ID'])['Date'].transform.('max')

//keeping the last and dropping the duplicates
    df.drop_duplicates(subset='ID', keep='last',inplace=True)

//inserting new columns
    df['Fruits'] = ' '
    df['Veggies'] = ' '

//applying the latest dates to the newly created columns
    df.loc[((df['CATEGORY NAME'] == 'fruits')), 'Fruits'] = df['Latest Date']
    df.loc[((df['CATEGORY NAME'] == 'veggies')), 'Veggies'] = df['Latest Date']

I want the output like below

ID   CATEGORY NAME      DATE          Latest Date      Fruits      Veggies
 1        fruits      2017-08-07       2017-08-07    2017-08-07

 2        veggies     2022-01-01       2022-01-01                 2022-01-01

But my output looks odd. I don't have an error message but the output is not right

ID   CATEGORY NAME      DATE          Latest Date      Fruits      Veggies
 1        fruits      2017-08-07                    
 
 2        veggies     2022-01-01       2021-01-01                 2021-01-01 00:00:00

If you notice above

  1. It did not identify the latest correctly
  2. When applying the date values to the new column, its 00:00:00 time format also shows up
  3. It did not drop duplicates

Not sure what's wrong

CodePudding user response:

strftime converts a date to string. Did you want to keep it as a datetime object but change the format? Try this instead:

df.Date = pd.to_datetime(df.Date, format='%Y-%m-%d')

For the Edit

I'm not sure why you want the "Date" and "Latest Date" columns to be the same, but here is the code that will give you your desired table output:

# Recreate dataframe
ID = [1,2,1,2]
CATEGORY_NAME =   ["fruits", "veggies", "fruits", "veggies"]      
DATE = ["2017-08-07 00:00:00", "2018-01-25 00:00:00", "2015-08-07 00:00:00", "2022-01-01 00:00:00"]
df = pd.DataFrame({"ID":ID,"CATEGORY NAME":CATEGORY_NAME, "Date":DATE})

# Convert datetime format
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')

# Get the max date value and assign the group to a new dataframe
dfNew = df.groupby(['ID'], as_index=False).max()

# The new dataframes Date and Latest Date column are the same
dfNew['Latest Date'] = dfNew['Date']

# Fix latest Date formatting 
dfNew["Latest Date"] = dfNew["Latest Date"].dt.date

# Add fruit and veggie columns
dfNew['Fruits'] = ' '
dfNew['Veggies'] = ' '

# Place in the desired values
dfNew.loc[((dfNew['CATEGORY NAME'] == 'fruits')), 'Fruits'] = dfNew['Latest Date']
dfNew.loc[((dfNew['CATEGORY NAME'] == 'veggies')), 'Veggies'] = dfNew['Latest Date']
dfNew

Output:

    ID  CATEGORY NAME   Date       Latest Date     Fruits         Veggies
0   1   fruits          2017-08-07  2017-08-07     2017-08-07   
1   2   veggies         2022-01-01  2022-01-01                    2022-01-01

CodePudding user response:

Are you sure 'Date' is not mixed? It looks to me as if it still contains 'str' and 'float', and maybe 'datetime'. You sure it's clean, and all records converted?

  • Related