I seek to calculate a decade's worth of 12th-grade enrollment projections based on a record that acts as a starting point.
I have a dataframe of several school districts and total enrollment for grades 11 and 12 as of 2020-21. Here's an example of one record:
df = pd.DataFrame({"year": ['2020_21'],
"district_name" : ["School District A"] ,
"grade11" : [5000],
"grade12": [5200],
"grade11_change": [1.01],
"grade11_grade12_ratio": [0.9]})
df
I want to generate 10 years' worth of grade11 and grade12 estimates. Each year's grade11 value is based on a predicted change from the previous year's grade11 value. Each year's grade12 value is based on a predicted ratio of the previous year's grade11 value. So, in the example record, the grade12 value in 2021-222 would be 90% of the grade11 value in 2020-21.
I've gone through other posts and tried to craft a for loop that would do the desired calculations. But my loop overwrites previous years with the last year, and I get NaN's for the grade11 and grade12 values.
projection_years=['2021_22','2022_23','2023_24','2024_25','2025_26','2026_27','2027_28','2028_29','2029_30','2030_31']
change_11=df.iloc[0]['grade11_change']
ratio_11_12=df.iloc[0]['grade11_grade12_ratio']
district_data=[]
for school_year in projection_years:
print(school_year)
df['year']=school_year
df.loc[:,'grade11']=df['grade11'].shift(1)*change_11
df.loc[:,'grade12']=df['grade11'].shift(1)*ratio_11_12
district_data.append(df)
all_years_df=pd.concat(district_data)
all_years_df_final=all_years_df[['year','district_name','grade11','grade12']]
print ('Done with ' school_year)
print('')
print('all done')
Below is my desired result. The desired dataframe's first record would be the 2020-21 data and end with 2030_31.
result = pd.DataFrame({"year": ['2020_21','2021_22','2022_23','2023_24','2024_25','2025_26','2026_27','2027_28','2028_29','2029_30','2030_31'],
"district_name":["School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A"],
"grade11":[5000,5050,5100,5151,5203,5255,5307,5360,5414,5468,5523],
"grade12":[5200,4500,4545,4590,4636,4683,4730,4777,4825,4873,4922]})
result
Thank you for any assistance.
CodePudding user response:
Something like that should work?
projection_years=['2021_22','2022_23','2023_24','2024_25','2025_26','2026_27','2027_28','2028_29','2029_30','2030_31']
grade11_change = 1.01
grade11_grade12_ratio = 0.9
for year in projection_years:
lr = df.iloc[-1]
row = {}
row['year'] = year
row['district_name'] = 'School District A'
row['grade11'] = int(lr['grade11'] * grade11_change)
row['grade12'] = int(lr['grade11'] * grade11_grade12_ratio)
df = df.append([row])
>>> df[['year','district_name','grade11','grade12']]
year district_name grade11 grade12
0 2020_21 School District A 5000 5200
0 2021_22 School District A 5050 4500
0 2022_23 School District A 5100 4545
0 2023_24 School District A 5151 4590
0 2024_25 School District A 5203 4636
0 2025_26 School District A 5255 4683
0 2026_27 School District A 5308 4730
0 2027_28 School District A 5361 4777
0 2028_29 School District A 5415 4825
0 2029_30 School District A 5469 4874
0 2030_31 School District A 5524 4922