How to apply for loop to append rows based on multiple calculation?-CodePudding

I have a very complex situation to append the rows with one agg. function of sum of population based on different columns. Find below:

Please make sure I have multiple rows in all column such as "year" in range (2019,2040) & "con" have mulitple countries.

import pandas as pd
d = { 'year': [2019,2020,2021,2020,2019,2021], 'age': [10,20,30,10,20,30], 'con': ['UK','UK','UK','US','US','US'],'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df

 year   age con population
 2019   10  UK      1
 2020   20  UK      2
 2021   30  UK      300
 2020   10  US      400
 2019   20  US      1000
 2021   30  US      2000

output required:

year    age   con     population
2019    10     UK          1
2020    20     UK          2
2021    30     UK          300
2020    10     US          400
2019    20     US          1000
2021    30     US          2000
2019    0-10   UK child    1     #addition of 1
2020    10-20  UK teen     3     #addition of 1 2
2021    20-30  UK working  302   #addition of 2 300

I am looking for a loop function so I apply on con col

I am trying, FAILED!!!

variable_list = ['UK', 'US']
ranges = [[0,10], [10,20], [20,30]]


categories = ["Child", "teen", "work"]

year = [x for x in range(2019,2022)]

q = df#df.loc[(df["Kategorie 1"].str.strip()==BASE)]
q["age2"] = pd.to_numeric(q["age"])

sums_years = {}


                   
for variable in variable_list:
  c = 0
  u = q.loc[q["cat2"]==variable]  
  for r in ranges:
    cat = "Germany: "   categories[c]
    for year in date:
      group = str(r[0]) '-' str(r[1])
      n = variable   "_"   group
      if n not in sums_years:
        sums_years[n] = {}

      s = u.loc[(u['year']==year) & (u["age"]>=r[0]) & (u["age"]<=r[1]), 'population'].sum()
     ```

and also like for one condition

df_uk = df[df.con=='UK'].reset_index(drop=True)
div =['child','teen','working']
c = [div[i] for i in range(len(df_uk))] #list to get element from div
y = [i 2018 for i in range(1,len(df_uk) 1)] #list of 2019,2020,2021
x = [[[0,10], [10,20], [20,30]] for i in range(1,len(df_uk) 1)]

d={'year':y, 'age':x, 'con':c, 'population': (df_uk['value']   #adds_something).values}

df_new = pd.DataFrame(data=d)

df = pd.concat([df, df_new], ignore_index=True)

sorry if its a mess.. I asked people but no help... I am sure there can be easy and better loop function. Please Help!!!!

CodePudding user response：

d = { 'year': [2019,2020,2021,2020,2019,2021], 
      'age': [10,20,30,10,20,30], 
      'con': ['UK','UK','UK','US','US','US'],
      'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df2 = df.copy()

criteria = [df2['age'].between(0, 10), 
            df2['age'].between(11, 20), 
            df2['age'].between(21, 30)]

values = ['child', 'teen', 'work']

df2['con'] = df2['con'] '_' np.select(criteria, values, 0)
df2['population'] = df.groupby(['con', 'age']).sum()\
                      .groupby(level=0).cumsum()\
                      .reset_index()['population']

final = pd.concat([df, df2])