Home > database >  Divide rows from yearly to monthly values pandas
Divide rows from yearly to monthly values pandas

Time:10-14

I am trying to divide line items with a start and end date into multiple rows based on months. Values should be calculated based on number of days in the specific months.

For instance, data of 1 line item:

id StartDate EndDate Annual
abc 12/12/2018 01/12/2019 120,450

expected output:

id Month Year Monthly volume
abc 12 2018 6,600
abc 1 2019 10,230
abc 2 2019 9,240
abc 3 2019 10,230
abc 4 2019 9,900
abc 5 2019 10,230
abc 6 2019 9,900
abc 7 2019 10,230
abc 8 2019 10,230
abc 9 2019 9,900
abc 10 2019 10,230
abc 11 2019 9,900

CodePudding user response:

Few things for next time you ask.

  • This is a case where there are existing answers, so always try google first to reduce duplication. Other post is referenced in code below.
  • You should also always include the code you already tried, SO doesn't like to do your homework, but we will help you with it.
  • You should include a more readily reproduced dataframe. I shouldn't have to copy paste to build it, as in below code.
  • you are clearly doing something to convert the Annual total to a monthly volume but you do not explain this, so do not expect it to be done for you.
  • Lastly, this code doesn't convert to separate month and year columns, but once you have the date, this should be trivial for you to do (or to google how to do).
import pandas as pd

df = pd.DataFrame(
    data = [['abc','12/12/2018','12/01/2019',120450]], 
    columns = ['id', 'startDate', 'EndDate', 'Annual'] 
)
df['startDate'] = pd.to_datetime(df['startDate'])
df['EndDate'] = pd.to_datetime(df['EndDate'])
# pd.bdate_range(start="2020/12/16", end="2020/12/26", freq="C", weekmask="Sat Sun")
# %%
df_start_end = df.melt(id_vars=['id', 'Annual'],value_name='date')
# credit to u/gen
# https://stackoverflow.com/questions/42151886/expanding-pandas-data-frame-with-date-range-in-columns
df = (
    df_start_end.groupby('id')
    .apply(lambda x: x.set_index('date')
    .resample('M').pad())
    .drop(columns=['id','variable'])
    .reset_index()
)
print(df)
  • Related