Home > Blockchain >  How to set a value of a column to the value of grouping with pandas?
How to set a value of a column to the value of grouping with pandas?

Time:09-20

The request

My current solution is totally whacky and inefficient with nested loops. There must be a proper way to get the same results.
I wonder if all this can be achieved in just 1 step, or the first step with grouping is necessary?

My solution:

Original data:

    year    rank    country continent gdp        gdp_percent
 0  1960    1   USA         America 543300000000    0.468483
 1  1960    2   UK          Europe  73233967692     0.063149
 2  1960    3   France      Europe  62225478000     0.053656
 3  1960    4   China       Asia    59716467625     0.051493
 4  1960    5   Japan       Asia    44307342950     0.038206

The problem

I want to add a column that will have gdp percent of a continent. It's a simple operation:

1 / SUM(GDP_PERCENT)(grouped by year continent) * gdp_percent

Code:

## Get the sum 
df_grouped = df.groupby(['year', 'continent'])['gdp_percent'].sum().to_frame()

## Nested loops ...
for grouped_df_index, grouped_df_row in df_grouped.iterrows():
    for df_index, df_row in df.iterrows():
        if df_row[0] == grouped_df_row[0] and df_row[3] == grouped_df_row[1]:
            df.loc[df_index, 'gdp_percent_of_continent'] = grouped_df_row[2]

Final result (the last column):

    year    rank country   continent gdp         gdp_percent    gdp_percent_of_continent
0   1960    1   USA        America  543300000000    0.468483    0.848412
1   1960    2   UK         Europe   73233967692     0.063149    0.257104
2   1960    3   France     Europe   62225478000     0.053656    0.218456
3   1960    4   China      Asia     59716467625     0.051493    0.333881
4   1960    5   Japan      Asia     44307342950     0.038206    0.247727

CodePudding user response:

try:

df['gdp_percent_sum'] = df.groupby(['year', 'continent'])['gdp_percent'].transform(sum)
df['gdp_percent_of_continent'] = (1/df['gdp_percent_sum'])*df['gdp_percent']
df = df.drop(columns="gdp_percent_sum")

    year    rank    country continent   gdp          gdp_percent gdp_percent_of_continent
0   1960    1       USA     America     54330000000  0.468483    1.000000
1   1960    2       UK      Europe      73233967692  0.063149    0.540636
2   1960    3       France  Europe      62225478000  0.053656    0.459364
3   1960    4       China   Asia        59716467625  0.051493    0.574064
4   1960    5       Japan   Asia        44307342950  0.038206    0.425936
  • Related