For loop cumulating and selecting data per region and year

Time:10-15

I have data from 16 different regions. For each region, I would like to find the day of year on which 5% and 95% of the area has turned green (based on NDVI).

So far I have done this manually, but I would like to do it in a for loop over every region and every year. First, I extract the individual regions and years. Second, I find the points at or beyond 5% and 95% greening. Third, I take the minimum day found at 5% greening and at 95% greening. Fourth, I collect all of these into one dataframe per ecoregion, containing every year and the difference between the day at 95% greening and the day at 5% greening.
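To make the four steps concrete, here is a minimal sketch for a single region and a single year. The `toy` table and its values are made up for illustration; only the column names `doy` and `area` come from the script in the question.

```python
import pandas as pd

# Toy data for ONE region and ONE year (values are invented for illustration).
toy = pd.DataFrame({
    "doy":  [100, 110, 120, 130, 140],
    "area": [10.0, 20.0, 40.0, 20.0, 10.0],
})

# Accumulate area in day-of-year order.
toy = toy.sort_values("doy")
toy["cumulative_area"] = toy["area"].cumsum()
total = toy["area"].sum()

# Earliest day on which the cumulative area exceeds 5% / 95% of the total.
day_5 = toy.loc[toy["cumulative_area"] > 0.05 * total, "doy"].min()
day_95 = toy.loc[toy["cumulative_area"] > 0.95 * total, "doy"].min()

# Length of the window between the two thresholds, in days.
difference = day_95 - day_5
```

With these toy numbers the cumulative area passes 5% on day 100 and 95% on day 140, so `difference` is 40.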

This is done for each region (16) and each year (19), so it is a lot of manual work and the resulting script is long and heavy for the computer to run. Part of it is shown below:

import numpy as np
import pandas as pd

x = pd.read_csv('D:/data.csv')
x = x[x['means'] > 0]
x = x[x['diff'] > -1.5]
x = x[x['area'] > 9318]
x = x.sort_values(by = 'doy')

AKB = x[x['name'] == 'Region1'].drop_duplicates(subset=['ID', 'Year'], keep = 'first')
AKB['cumulative_area'] = AKB.groupby(['Year'])['area'].cumsum()
AKT = x[x['name'] == 'Region2'].drop_duplicates(subset=['ID', 'Year'], keep = 'first')
AKT['cumulative_area'] = AKT.groupby(['Year'])['area'].cumsum()

#Find 5% and 95 % of the burned area, the respective days and subtract them to see development in fire season

AKB01 = AKB[AKB['Year'] == 2001]
AKB01fifth = AKB01[AKB01['cumulative_area'] > AKB01['area'].sum() * 0.05]
AKB01ninefifth = AKB01[AKB01['cumulative_area'] > AKB01['area'].sum() * 0.95]
AKB01 = AKB01ninefifth.doy.min() - AKB01fifth.doy.min()
AKB02 = AKB[AKB['Year'] == 2002]
AKB02fifth = AKB02[AKB02['cumulative_area'] > AKB02['area'].sum() * 0.05]
AKB02ninefifth = AKB02[AKB02['cumulative_area'] > AKB02['area'].sum() * 0.95]
AKB02 = AKB02ninefifth.doy.min() - AKB02fifth.doy.min()
AKB03 = AKB[AKB['Year'] == 2003]
AKB03fifth = AKB03[AKB03['cumulative_area'] > AKB03['area'].sum() * 0.05]
AKB03ninefifth = AKB03[AKB03['cumulative_area'] > AKB03['area'].sum() * 0.95]
AKB03 = AKB03ninefifth.doy.min() - AKB03fifth.doy.min()
AKB04 = AKB[AKB['Year'] == 2004]
AKB04fifth = AKB04[AKB04['cumulative_area'] > AKB04['area'].sum() * 0.05]
AKB04ninefifth = AKB04[AKB04['cumulative_area'] > AKB04['area'].sum() * 0.95]
AKB04 = AKB04ninefifth.doy.min() - AKB04fifth.doy.min()
...
AKB18 = AKB[AKB['Year'] == 2018]
AKB18fifth = AKB18[AKB18['cumulative_area'] > AKB18['area'].sum() * 0.05]
AKB18ninefifth = AKB18[AKB18['cumulative_area'] > AKB18['area'].sum() * 0.95]
AKB18 = AKB18ninefifth.doy.min() - AKB18fifth.doy.min()
AKB19 = AKB[AKB['Year'] == 2019]
AKB19fifth = AKB19[AKB19['cumulative_area'] > AKB19['area'].sum() * 0.05]
AKB19ninefifth = AKB19[AKB19['cumulative_area'] > AKB19['area'].sum() * 0.95]
AKB19 = AKB19ninefifth.doy.min() - AKB19fifth.doy.min()
AKT01 = AKT[AKT['Year'] == 2001]
AKT01fifth = AKT01[AKT01['cumulative_area'] > AKT01['area'].sum() * 0.05]
AKT01ninefifth = AKT01[AKT01['cumulative_area'] > AKT01['area'].sum() * 0.95]
AKT01 = AKT01ninefifth.doy.min() - AKT01fifth.doy.min()
AKT02 = AKT[AKT['Year'] == 2002]
AKT02fifth = AKT02[AKT02['cumulative_area'] > AKT02['area'].sum() * 0.05]
AKT02ninefifth = AKT02[AKT02['cumulative_area'] > AKT02['area'].sum() * 0.95]
AKT02 = AKT02ninefifth.doy.min() - AKT02fifth.doy.min()


...

AKBign = pd.DataFrame()

AKBign['year'] = np.arange(2001,2020,1)

AKBign['difference'] = [AKB01,AKB02,AKB03,AKB04,AKB05,AKB06,AKB07,AKB08,AKB09,AKB10,AKB11,AKB12,AKB13,AKB14,AKB15,AKB16,AKB17,AKB18,AKB19]

I would like to make this into a for loop that does my abovementioned steps for each region, each year, and collect it into one large dataframe. How do I compute that in Python?

CodePudding user response:

I think you want something like this, which will give you a dictionary whose keys are regions and values are dictionaries with keys of the year and values of the difference:

from collections import defaultdict

regions = ['Region1', 'Region2'] # expand as required
years = range(2001,2020)

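result = defaultdict(dict)  # result[region][year] -> difference in days
for region in regions:
    # One row per ID and year within this region, then accumulate area per year
    xr = x[x['name'] == region].drop_duplicates(subset=['ID', 'Year'], keep = 'first')
    xr['cumulative_area'] = xr.groupby(['Year'])['area'].cumsum()
    for year in years:
        xry = xr[xr['Year'] == year]
        # Rows past the 5% and 95% thresholds of this year's total area
        xryfifth = xry[xry['cumulative_area'] > xry['area'].sum() * 0.05]
        xryninefifth = xry[xry['cumulative_area'] > xry['area'].sum() * 0.95]
        # Earliest day past 95% minus earliest day past 5%
        result[region][year] = xryninefifth.doy.min() - xryfifth.doy.min()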
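Since the question asks for one large dataframe, a possible variation is to let `groupby` walk over every region and year that actually occurs in the data, rather than listing them by hand. This is only a sketch: the helper `season_length`, the `summary` frame and the small stand-in table `x` are hypothetical names and made-up values, and the column names are assumed to match the question's CSV.

```python
import pandas as pd

def season_length(group: pd.DataFrame) -> float:
    """Days between first exceeding 5% and first exceeding 95% cumulative area."""
    group = group.sort_values("doy")
    cumulative = group["area"].cumsum()
    total = group["area"].sum()
    day_5 = group.loc[cumulative > 0.05 * total, "doy"].min()
    day_95 = group.loc[cumulative > 0.95 * total, "doy"].min()
    return day_95 - day_5

# Small invented stand-in for the filtered table `x` from the question.
x = pd.DataFrame({
    "name": ["Region1"] * 4 + ["Region2"] * 4,
    "Year": [2001, 2001, 2001, 2001, 2001, 2001, 2002, 2002],
    "ID":   [1, 2, 3, 4, 5, 6, 7, 8],
    "doy":  [100, 120, 140, 160, 90, 150, 110, 130],
    "area": [10.0, 40.0, 40.0, 10.0, 50.0, 50.0, 30.0, 70.0],
})

# Keep the earliest row per region, ID and year, as in the question.
deduplicated = (
    x.sort_values("doy")
     .drop_duplicates(subset=["name", "ID", "Year"], keep="first")
)

# One output row per (region, year) combination present in the data.
rows = []
for (region, year), group in deduplicated.groupby(["name", "Year"]):
    rows.append({"name": region, "Year": year,
                 "difference": season_length(group)})

summary = pd.DataFrame(rows)
```

Here `summary` has one row per region and year in long format; `summary.pivot(index="Year", columns="name", values="difference")` would give one column per region instead. If you prefer to keep the nested dictionary from the loop above, `pd.DataFrame(result)` should also produce a table with years as the index and regions as columns.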