Dataframe check next months data and return specific column

Time:09-08

I want to return the Bill_Status for the next month. The lookup must match on two fields: Cus_Number and Cus_Account.

Here is my DataFrame:

import pandas as pd  
  
# assign data of lists.  
data = {'Date': ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30'],
        'Cus_Number': ['1487', '1487', '1487', '1487', '1278', '1278', '1278', '1278', '1278', '1278', '1278', '1278'],
        'Cus_Account': ['101', '101', '101', '101', '120', '120', '120', '120', '122', '122', '122', '122'],
        'Bill_Status': [0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]}  
  
# Create DataFrame  
df = pd.DataFrame(data)  
  
# Print the output.  
print(df)  

For my current attempt I used the following code:

df=df.reset_index()
df['Date']=pd.to_datetime(df['Date'],infer_datetime_format=True)
df['Next_Month']=df['Date'] + pd.DateOffset(months=1)
res=df.merge(df,right_on='Date',left_on='Next_Month',how='left')

The resulting output was posted as an image, which is not available here.

This is only a small sample. In the main data I have thousands of Cus_Number and Cus_Account values, and both must match before the Bill_Status for the upcoming month is returned. I am struggling to find a solution that validates both Cus_Number and Cus_Account and returns the Bill_Status for the next month. Any assistance will be appreciated.

Expected output: posted as an image, which is not available here.

CodePudding user response:

I looked at your code and, unless I am missing the intent, I do not believe you want Bill_Status_Next_Month to be taken from a different customer's Bill_Status. I addressed this with groupby() and shift(), so Bill_Status_Next_Month is shifted only within each customer/account grouping. If this is not what you intended, please let me know and I can update the code. I also filled the values that would otherwise be N/A with 0.

data = {'Date': ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30'],
        'Cus_Number': ['1487', '1487', '1487', '1487', '1278', '1278', '1278', '1278'],
        'Cus_Account': ['101', '101', '101', '101', '120', '120', '120', '120'],
        'Bill_Status': [0, 0, 0, 1, 0, 1, 0, 0]}  
  
# Create DataFrame  
df = pd.DataFrame(data)  
  
df['Date'] = pd.to_datetime(df['Date'])
df['Bill_Status_Next_Month'] = df.sort_values(['Date', 'Cus_Number', 'Cus_Account']).groupby(['Cus_Number', 'Cus_Account'])['Bill_Status'].shift(-1).fillna(0)
df
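
As a quick check, the same groupby() and shift() idea can be run on the full 12-row sample from the question, including the 1278/122 account that the shortened data above leaves out. This is only a sketch: it sorts by the two key columns and then by Date, so the row order differs from the original frame, and it assumes every customer/account pair has one row per consecutive month.

```python
import pandas as pd

# Full sample from the question: three customer/account pairs, four months each.
data = {
    'Date': ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30'] * 3,
    'Cus_Number': ['1487'] * 4 + ['1278'] * 8,
    'Cus_Account': ['101'] * 4 + ['120'] * 4 + ['122'] * 4,
    'Bill_Status': [0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0],
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])

keys = ['Cus_Number', 'Cus_Account']

# Sort so that, within each customer/account pair, rows run in date order.
df = df.sort_values(keys + ['Date']).reset_index(drop=True)

# shift(-1) pulls the following row's value, but only within the same group.
# The last month of each group has no following row, so it is filled with 0.
df['Bill_Status_Next_Month'] = (
    df.groupby(keys)['Bill_Status'].shift(-1).fillna(0).astype(int)
)

print(df)
```

Grouping on both columns is what keeps the two 1278 accounts (120 and 122) from leaking into each other.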

CodePudding user response:

Assuming your Date column is in chronological order, so that the next row for each Cus_Number and Cus_Account combination holds the next month, try this:

df['Bill_Status_Next_Month'] = df.groupby(['Cus_Number','Cus_Account'])\
    ['Bill_Status'].shift(-1)

print(df)

         Date Cus_Number Cus_Account  Bill_Status  Bill_Status_Next_Month
0  2022-01-31       1487         101            0                     0.0
1  2022-02-28       1487         101            0                     0.0
2  2022-03-31       1487         101            0                     1.0
3  2022-04-30       1487         101            1                     NaN
4  2022-01-31       1278         120            0                     1.0
5  2022-02-28       1278         120            1                     0.0
6  2022-03-31       1278         120            0                     0.0
7  2022-04-30       1278         120            0                     NaN

Chain .fillna(0) if you also want zeros for the last month of each group (though the next month's bill does not exist yet for those rows).
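
If the chronological-order assumption might not hold, for example when an account is missing a month, a merge closer to the question's original attempt can match on the actual date instead of the row position. The sketch below is an illustration under stated assumptions, not part of either answer: the small sample with a gap is hypothetical, dates are assumed to be month-ends, and each customer/account/date combination is assumed to appear only once. It uses pd.offsets.MonthEnd(1) rather than pd.DateOffset(months=1), because adding one calendar month to 2022-02-28 gives 2022-03-28, not 2022-03-31.

```python
import pandas as pd

# Hypothetical sample: account 1487/101 has no row for 2022-02-28.
df = pd.DataFrame({
    'Date': pd.to_datetime(['2022-01-31', '2022-03-31', '2022-04-30',
                            '2022-01-31', '2022-02-28']),
    'Cus_Number': ['1487', '1487', '1487', '1278', '1278'],
    'Cus_Account': ['101', '101', '101', '120', '120'],
    'Bill_Status': [0, 0, 1, 0, 1],
})

keys = ['Cus_Number', 'Cus_Account']

# Month-end of the following month (assumes Date is already a month-end).
df['Next_Month'] = df['Date'] + pd.offsets.MonthEnd(1)

# Lookup table: each row's own date and status, renamed so that it can be
# matched against another row's Next_Month.
lookup = df[keys + ['Date', 'Bill_Status']].rename(
    columns={'Date': 'Next_Month', 'Bill_Status': 'Bill_Status_Next_Month'}
)

# Match on customer, account AND the next month-end date.
res = df.merge(lookup, on=keys + ['Next_Month'], how='left')

print(res)
```

Rows whose following month is absent, including the January row of the account with the gap, come back as NaN instead of silently picking up the status from two months later.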
