I want to return the Bill_Status
for the next month. This information should be validated against two fields: Cus_Number
and Cus_Account
Here is my DataFrame:
import pandas as pd
# assign data of lists.
data = {'Date': ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30'],
'Cus_Number': ['1487', '1487', '1487', '1487', '1278', '1278', '1278', '1278', '1278', '1278', '1278', '1278'],
'Cus_Account': ['101', '101', '101', '101', '120', '120', '120', '120', '122', '122', '122', '122'],
'Bill_Status': [0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
For my current attempt I used the following code:
df=df.reset_index()
df['Date']=pd.to_datetime(df['Date'],infer_datetime_format=True)
df['Next_Month']=df['Date'] pd.DateOffset(months=1)
res=df.merge(df,right_on='Date',left_on='Next_Month',how='left')
This is only a small sample however in the main data I have thousands of Cus_Number
and Cus_Account
where both of them should be the same and return the Bill_Status
for the upcoming month. I am struggling to find a solution to validate both Cus_Number
and Cus_Account
and return the Bill_Status
for the next month. any assistance will be appreciated.
Expected Output:
CodePudding user response:
I looked at your code and unless I miss the intent I do not believe you would want to shift your Bill_Status_Next_Month based on a different customers Bill_Status. I addressed this by using a groupby()
and shift()
so your Bill_Status_Next_Month will only shift for the specified customer/account grouping. If this is not what your intent is please let me know and I can update the code. I also went ahead and filled what would be N/A with 0
data = {'Date': ['2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30', '2022-01-31', '2022-02-28', '2022-03-31', '2022-04-30'],
'Cus_Number': ['1487', '1487', '1487', '1487', '1278', '1278', '1278', '1278'],
'Cus_Account': ['101', '101', '101', '101', '120', '120', '120', '120'],
'Bill_Status': [0, 0, 0, 1, 0, 1, 0, 0]}
# Create DataFrame
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
df['Bill_Status_Next_Month'] = df.sort_values(['Date', 'Cus_Number', 'Cus_Account']).groupby(['Cus_Number', 'Cus_Account'])['Bill_Status'].shift(-1).fillna(0)
df
CodePudding user response:
Assuming your Date
column is in chronological order, so that each next row (per Cus_Number
& Cus_Account
combination) contains the next month, try this:
df['Bill_Status_Next_Month'] = df.groupby(['Cus_Number','Cus_Account'])\
['Bill_Status'].shift(-1)
print(df)
Date Cus_Number Cus_Account Bill_Status Bill_Status_Next_Month
0 2022-01-31 1487 101 0 0.0
1 2022-02-28 1487 101 0 0.0
2 2022-03-31 1487 101 0 1.0
3 2022-04-30 1487 101 1 NaN
4 2022-01-31 1278 120 0 1.0
5 2022-02-28 1278 120 1 0.0
6 2022-03-31 1278 120 0 0.0
7 2022-04-30 1278 120 0 NaN
Chain .fillna(0)
if you want zeros also for the last month for each group (though, the next month's bill of course does not yet exist for these).