Home > other >  How to correctly update pandas dataframe from a filtered dataframe that has datetime column
How to correctly update pandas dataframe from a filtered dataframe that has datetime column

Time:06-14

I'm currently working on a dataframe named df that requires first to filter it to a new dataframe named df1 based on date and num_posts column, next step is set the value of num_posts column to 10 and increment date column by one month of the filtered df1.

the logic to filter df to df1 is:

if date == today & num_posts == 4.

The last step once i update the selected columns for df1 mentioned above, is to update df from df1 with this line of code df.update(df1)

What is working. I'm able to filter df to df1 based on the logic and update date and num_posts columns within the filtered df1.

What is not working When i try to update df from df1, it only updates the date column, but cannot update num_posts values.

My code:

import pandas as pd
import datetime

df = pd.DataFrame({'num_posts': [5, 4, 4, 4, 1, 14],
                   'date': ['2022-06-10', '2022-06-14', 
                            '2022-06-14', '2020-09-12', 
                            '2020-09-29', '2020-10-15'],
                  'user': ['user4', 'user1', 'user1', 'user3', 'user4', 'user4']})

df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Logic one start
# Get current date
new_date = datetime.datetime.now()
current_date = new_date.strftime("%Y-%m-%d")

# filter posts that equal 4 and date equal today
df1 = df.loc[(df['num_posts'] == 4) & (df['date'] == current_date)].copy()


# # overwrite the num_posts column with 10
df1.loc[df1['num_posts'] == 4, 'num_posts'] = 10
# df1.replace({'num_posts': {4: 10}}, inplace=True)

# Increment date by one month
plus_month_period = 1
df1 = df1['date']   pd.DateOffset(months=plus_month_period)
df1
# updating the old dataframe
df.update(df1)
df

When i run my code i get the following output which is not the intended output.

    num_posts   date    user
0   5   2022-06-10  user4
1   4   2022-07-14  user1
2   4   2022-07-14  user1
3   4   2020-09-12  user3
4   1   2020-09-29  user4
5   14  2020-10-15  user4

What i Expect as output once i run my code above(have modified it manually).

    num_posts   date    user
0   5   2022-06-10  user4
1   10  2022-07-14  user1
2   10  2022-07-14  user1
3   4   2020-09-12  user3
4   1   2020-09-29  user4
5   14  2020-10-15  user4

What am i doing wrong?

CodePudding user response:

Assign column date:

df1['date'] = df1['date']   pd.DateOffset(months=plus_month_period)

Or:

df1['date']  = pd.DateOffset(months=plus_month_period)

df.update(df1)
print (df)
   num_posts       date   user
0        5.0 2022-06-10  user4
1       10.0 2022-07-14  user1
2       10.0 2022-07-14  user1
3        4.0 2020-09-12  user3
4        1.0 2020-09-29  user4
5       14.0 2020-10-15  user4

Alternative solution is assign to original DataFrame by mask and DataFrame.loc:

new_date = datetime.datetime.now()
current_date = new_date.strftime("%Y-%m-%d")
plus_month_period = 1

m = (df['num_posts'] == 4) & (df['date'] == current_date)
df.loc[m, 'num_posts'] = 10
df.loc[m, 'date']  = pd.DateOffset(months=plus_month_period)
print (df)
   num_posts       date   user
0          5 2022-06-10  user4
1         10 2022-07-14  user1
2         10 2022-07-14  user1
3          4 2020-09-12  user3
4          1 2020-09-29  user4
5         14 2020-10-15  user4
  • Related