Home > other >  Set python variable based on multiple conditions within a dataframe
Set python variable based on multiple conditions within a dataframe

Time:09-29

I want to create some variables (to be used in other calculations) based on a dataframe conditional on a number of columns including date. But for the date condition I want to use a datetime variable I have already created.

Dataframe looks like this:

Date         Product  Sales
28/09/2022   apple    10.2
28/09/2022   orange   5.1
28/09/2022   pear     12.3 
27/09/2022   apple    4.1
27/09/2022   orange   3.9
27/09/2022   pear     15.0

My date variable (which is 28/09/2022) date = datetime.today()

I'd like to create a variable called 'last_apple_sales' based on the dataframe where Date = my date variable and where Product = apple. So last_apple_sales = 10.2 in this case. And so on..

Had a go at using np.where with no luck.

Thank you.

CodePudding user response:

Another possible solution:

mydate = datetime.now().date()
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.date
last_apple_sales = df.loc[(df['Date'] == mydate) & (
    df['Product'] == 'apple'), 'Sales'].squeeze()

However, if your goal is to do that repeatedly, the best is to create a function:

def last_sales(df, mydate, product):
    df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.date
    return df.loc[(df['Date'] == mydate) & (df['Product'] == product), 'Sales'].squeeze()

mydate = datetime.now().date()
last_sales(df, mydate, 'apple')

Complete code:

import pandas as pd
import numpy as np
from io import StringIO
from datetime import datetime

text = """
Date         Product  Sales
28/09/2022   apple    10.2
28/09/2022   orange   5.1
28/09/2022   pear     12.3 
27/09/2022   apple    4.1
27/09/2022   orange   3.9
27/09/2022   pear     15.0
"""

df = pd.read_csv(StringIO(text), sep='\s ')

def last_sales(df, mydate, product):
    df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y').dt.date
    return df.loc[(df['Date'] == mydate) & (df['Product'] == product), 'Sales'].squeeze()

mydate = datetime.now().date()
last_sales(df, mydate, 'apple')

CodePudding user response:

You can filter based on both conditions, select .max() and then .Sales

from datetime import datetime
df[(df.Date <= datetime(2022, 9, 28)) & (df.Product == 'apple')].max().Sales  # 10.2

Edit:

import pandas as pd
from datetime import datetime
# given date variable
date = datetime.today()
# create example dataframe
df = pd.DataFrame(
    {'Date': {0: '28/09/2022', 1: '28/09/2022', 2: '28/09/2022', 3: '27/09/2022', 4: '27/09/2022', 5: '27/09/2022'},
     'Product': {0: 'apple', 1: 'orange', 2: 'pear', 3: 'apple', 4: 'orange', 5: 'pear'},
     'Sales': {0: 10.2, 1: 5.1, 2: 12.3, 3: 4.1, 4: 3.9, 5: 15.0}}
    )
# cast date strings to datetime objects
df.Date = pd.to_datetime(df.Date, format='%d/%m/%Y')

# create last_apple_sales variable and assign...
last_apple_sales = df[(df.Date <= date) &  # filter dataframe to only include dates that are today or older (if you always use today to search, you can remove that line)
                      (df.Product == 'apple')  # filter dataframe to only include apple as a product
                      ].sort_values('Date', ascending=False).head(1).Sales  # sort values based on Date and select Sales for the latest date
  • Related