Home > database >  For loop calculate values and append to initial record in pandas
For loop calculate values and append to initial record in pandas

Time:06-04

I seek to calculate a decade's worth of 12th-grade enrollment projections based on a record that acts as a starting point.

I have a dataframe of several school districts and total enrollment for grades 11 and 12 as of 2020-21. Here's an example of one record:

df = pd.DataFrame({"year": ['2020_21'],
    "district_name" : ["School District A"] , 
    "grade11" : [5000],
    "grade12": [5200],
    "grade11_change": [1.01],
    "grade11_grade12_ratio": [0.9]})
df

I want to generate 10 years' worth of grade11 and grade12 estimates. Each year's grade11 value is based on a predicted change from the previous year's grade11 value. Each year's grade12 value is based on a predicted ratio of the previous year's grade11 value. So, in the example record, the grade12 value in 2021-222 would be 90% of the grade11 value in 2020-21.

I've gone through other posts and tried to craft a for loop that would do the desired calculations. But my loop overwrites previous years with the last year, and I get NaN's for the grade11 and grade12 values.

projection_years=['2021_22','2022_23','2023_24','2024_25','2025_26','2026_27','2027_28','2028_29','2029_30','2030_31']
change_11=df.iloc[0]['grade11_change']
ratio_11_12=df.iloc[0]['grade11_grade12_ratio']
district_data=[]
for school_year in projection_years:
    print(school_year)
    df['year']=school_year
    df.loc[:,'grade11']=df['grade11'].shift(1)*change_11
    df.loc[:,'grade12']=df['grade11'].shift(1)*ratio_11_12
    district_data.append(df)
    all_years_df=pd.concat(district_data)
    all_years_df_final=all_years_df[['year','district_name','grade11','grade12']]
    print ('Done with '   school_year)
print('')
print('all done')

Below is my desired result. The desired dataframe's first record would be the 2020-21 data and end with 2030_31.

result = pd.DataFrame({"year": ['2020_21','2021_22','2022_23','2023_24','2024_25','2025_26','2026_27','2027_28','2028_29','2029_30','2030_31'],
                      "district_name":["School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A","School District A"],
                      "grade11":[5000,5050,5100,5151,5203,5255,5307,5360,5414,5468,5523],
                       "grade12":[5200,4500,4545,4590,4636,4683,4730,4777,4825,4873,4922]})
result

Thank you for any assistance.

CodePudding user response:

Something like that should work?

projection_years=['2021_22','2022_23','2023_24','2024_25','2025_26','2026_27','2027_28','2028_29','2029_30','2030_31']
grade11_change = 1.01
grade11_grade12_ratio = 0.9

for year in projection_years:
    lr = df.iloc[-1]
    row = {}
    row['year'] = year
    row['district_name'] = 'School District A'
    row['grade11'] = int(lr['grade11'] * grade11_change)
    row['grade12'] = int(lr['grade11'] * grade11_grade12_ratio)
    
    df = df.append([row])

>>> df[['year','district_name','grade11','grade12']]

    year    district_name   grade11 grade12
0   2020_21 School District A   5000    5200
0   2021_22 School District A   5050    4500
0   2022_23 School District A   5100    4545
0   2023_24 School District A   5151    4590
0   2024_25 School District A   5203    4636
0   2025_26 School District A   5255    4683
0   2026_27 School District A   5308    4730
0   2027_28 School District A   5361    4777
0   2028_29 School District A   5415    4825
0   2029_30 School District A   5469    4874
0   2030_31 School District A   5524    4922
  • Related