The request
My current solution is totally whacky and inefficient with nested loops. There must be a proper way to get the same results.
I wonder if all this can be achieved in just 1 step, or the first step with grouping is necessary?
My solution:
Original data:
year rank country continent gdp gdp_percent
0 1960 1 USA America 543300000000 0.468483
1 1960 2 UK Europe 73233967692 0.063149
2 1960 3 France Europe 62225478000 0.053656
3 1960 4 China Asia 59716467625 0.051493
4 1960 5 Japan Asia 44307342950 0.038206
The problem
I want to add a column that will have gdp percent of a continent. It's a simple operation:
1 / SUM(GDP_PERCENT)(grouped by year continent) * gdp_percent
Code:
## Get the sum
df_grouped = df.groupby(['year', 'continent'])['gdp_percent'].sum().to_frame()
## Nested loops ...
for grouped_df_index, grouped_df_row in df_grouped.iterrows():
for df_index, df_row in df.iterrows():
if df_row[0] == grouped_df_row[0] and df_row[3] == grouped_df_row[1]:
df.loc[df_index, 'gdp_percent_of_continent'] = grouped_df_row[2]
Final result (the last column):
year rank country continent gdp gdp_percent gdp_percent_of_continent
0 1960 1 USA America 543300000000 0.468483 0.848412
1 1960 2 UK Europe 73233967692 0.063149 0.257104
2 1960 3 France Europe 62225478000 0.053656 0.218456
3 1960 4 China Asia 59716467625 0.051493 0.333881
4 1960 5 Japan Asia 44307342950 0.038206 0.247727
CodePudding user response:
try:
df['gdp_percent_sum'] = df.groupby(['year', 'continent'])['gdp_percent'].transform(sum)
df['gdp_percent_of_continent'] = (1/df['gdp_percent_sum'])*df['gdp_percent']
df = df.drop(columns="gdp_percent_sum")
year rank country continent gdp gdp_percent gdp_percent_of_continent
0 1960 1 USA America 54330000000 0.468483 1.000000
1 1960 2 UK Europe 73233967692 0.063149 0.540636
2 1960 3 France Europe 62225478000 0.053656 0.459364
3 1960 4 China Asia 59716467625 0.051493 0.574064
4 1960 5 Japan Asia 44307342950 0.038206 0.425936