I have a very complex situation to append the rows with one agg.
function of sum of population
based on different columns. Find below:
Please make sure I have multiple rows in all column
such as "year"
in range (2019,2040)
& "con"
have mulitple countries.
import pandas as pd
d = { 'year': [2019,2020,2021,2020,2019,2021], 'age': [10,20,30,10,20,30], 'con': ['UK','UK','UK','US','US','US'],'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df
year age con population
2019 10 UK 1
2020 20 UK 2
2021 30 UK 300
2020 10 US 400
2019 20 US 1000
2021 30 US 2000
output required:
year age con population
2019 10 UK 1
2020 20 UK 2
2021 30 UK 300
2020 10 US 400
2019 20 US 1000
2021 30 US 2000
2019 0-10 UK child 1 #addition of 1
2020 10-20 UK teen 3 #addition of 1 2
2021 20-30 UK working 302 #addition of 2 300
I am looking for a loop function so I apply on con
col
I am trying, FAILED!!!
variable_list = ['UK', 'US']
ranges = [[0,10], [10,20], [20,30]]
categories = ["Child", "teen", "work"]
year = [x for x in range(2019,2022)]
q = df#df.loc[(df["Kategorie 1"].str.strip()==BASE)]
q["age2"] = pd.to_numeric(q["age"])
sums_years = {}
for variable in variable_list:
c = 0
u = q.loc[q["cat2"]==variable]
for r in ranges:
cat = "Germany: " categories[c]
for year in date:
group = str(r[0]) '-' str(r[1])
n = variable "_" group
if n not in sums_years:
sums_years[n] = {}
s = u.loc[(u['year']==year) & (u["age"]>=r[0]) & (u["age"]<=r[1]), 'population'].sum()
```
and also like for one condition
df_uk = df[df.con=='UK'].reset_index(drop=True)
div =['child','teen','working']
c = [div[i] for i in range(len(df_uk))] #list to get element from div
y = [i 2018 for i in range(1,len(df_uk) 1)] #list of 2019,2020,2021
x = [[[0,10], [10,20], [20,30]] for i in range(1,len(df_uk) 1)]
d={'year':y, 'age':x, 'con':c, 'population': (df_uk['value'] #adds_something).values}
df_new = pd.DataFrame(data=d)
df = pd.concat([df, df_new], ignore_index=True)
sorry if its a mess.. I asked people but no help... I am sure there can be easy and better loop function. Please Help!!!!
CodePudding user response:
d = { 'year': [2019,2020,2021,2020,2019,2021],
'age': [10,20,30,10,20,30],
'con': ['UK','UK','UK','US','US','US'],
'population': [1,2,300,400,1000,2000]}
df = pd.DataFrame(data=d)
df2 = df.copy()
criteria = [df2['age'].between(0, 10),
df2['age'].between(11, 20),
df2['age'].between(21, 30)]
values = ['child', 'teen', 'work']
df2['con'] = df2['con'] '_' np.select(criteria, values, 0)
df2['population'] = df.groupby(['con', 'age']).sum()\
.groupby(level=0).cumsum()\
.reset_index()['population']
final = pd.concat([df, df2])