Home > OS >  python select the dataframe column with the oldest date for each user
python select the dataframe column with the oldest date for each user

Time:11-30

This question seems super easy but I just can't figure it out.

I have the following dataframe:

watched_df = pd.DataFrame([['Love Death Robots', '2016-01-29 14:04:22', 'Anna'], 
                   ['James Bond', '2016-02-29 14:04:22', 'Anna'],
                   ['Lord of the Rings', '2016-09-29 14:04:22', 'BoB'], 
                   ['The Hobbit', '2016-10-29 14:04:22', 'Anna'],
                   ['Beauty and the Beast', '2016-06-29 14:04:22', 'BoB']], 
                  columns=['Title', 'Date', 'Profile_Name'])
watched_df.head()

For each user I want to know what is the 'Title' of the first movie the 'Profile_Name' has watched and on what 'date' it was.

So the wanted result is:

wanted_df = pd.DataFrame([['Love Death Robots', '2016-01-29 14:04:22', 'Anna'], 
                   ['Beauty and the Beast', '2016-06-29 14:04:22', 'BoB']], 
                  columns=['Title', 'Date', 'Profile_Name'])
wanted_df.head()

I managed to get a dataframe that is indexed by the 'Profile_Name' and the respective oldest dates. But I cannot manage to add the 'Title' to the dataframe

CodePudding user response:

First you need to change the data type of "Date" to datetime:

watched_df['Date'] = pd.to_datetime(watched_df['Date'])

Then you just need to sort by date first and remove duplicates by keeping it the first row:

watched_df.sort_values('Date').drop_duplicates(['Profile_Name'], keep='first')

This should give the result you are looking for.

  • Related