I want to create some variables (to be used in other calculations) based on a dataframe conditional on a number of columns including date. But for the date condition I want to use a datetime variable I have already created.
Dataframe looks like this:
Date Product Sales
28/09/2022 apple 10.2
28/09/2022 orange 5.1
28/09/2022 pear 12.3
27/09/2022 apple 4.1
27/09/2022 orange 3.9
27/09/2022 pear 15.0
My date variable (which is 28/09/2022)
date = datetime.today()
I'd like to create a variable called 'last_apple_sales' based on the dataframe where Date = my date variable and where Product = apple. So last_apple_sales = 10.2 in this case. And so on..
Had a go at using np.where with no luck.
Thank you.
CodePudding user response:
Another possible solution:
mydate = datetime.now().date()
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.date
last_apple_sales = df.loc[(df['Date'] == mydate) & (
df['Product'] == 'apple'), 'Sales'].squeeze()
However, if your goal is to do that repeatedly, the best is to create a function:
def last_sales(df, mydate, product):
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.date
return df.loc[(df['Date'] == mydate) & (df['Product'] == product), 'Sales'].squeeze()
mydate = datetime.now().date()
last_sales(df, mydate, 'apple')
Complete code:
import pandas as pd
import numpy as np
from io import StringIO
from datetime import datetime
text = """
Date Product Sales
28/09/2022 apple 10.2
28/09/2022 orange 5.1
28/09/2022 pear 12.3
27/09/2022 apple 4.1
27/09/2022 orange 3.9
27/09/2022 pear 15.0
"""
df = pd.read_csv(StringIO(text), sep='\s ')
def last_sales(df, mydate, product):
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.date
return df.loc[(df['Date'] == mydate) & (df['Product'] == product), 'Sales'].squeeze()
mydate = datetime.now().date()
last_sales(df, mydate, 'apple')
CodePudding user response:
You can filter based on both conditions, select .max() and then .Sales
from datetime import datetime
df[(df.Date <= datetime(2022, 9, 28)) & (df.Product == 'apple')].max().Sales # 10.2
Edit:
import pandas as pd
from datetime import datetime
# given date variable
date = datetime.today()
# create example dataframe
df = pd.DataFrame(
{'Date': {0: '28/09/2022', 1: '28/09/2022', 2: '28/09/2022', 3: '27/09/2022', 4: '27/09/2022', 5: '27/09/2022'},
'Product': {0: 'apple', 1: 'orange', 2: 'pear', 3: 'apple', 4: 'orange', 5: 'pear'},
'Sales': {0: 10.2, 1: 5.1, 2: 12.3, 3: 4.1, 4: 3.9, 5: 15.0}}
)
# cast date strings to datetime objects
df.Date = pd.to_datetime(df.Date, format='%d/%m/%Y')
# create last_apple_sales variable and assign...
last_apple_sales = df[(df.Date <= date) & # filter dataframe to only include dates that are today or older (if you always use today to search, you can remove that line)
(df.Product == 'apple') # filter dataframe to only include apple as a product
].sort_values('Date', ascending=False).head(1).Sales # sort values based on Date and select Sales for the latest date