Create multiple new pandas columns based on other columns in a loop

Time:07-26

Assuming I have the following toy dataframe, df:

Country     Population    Region      HDI
China       100           Asia        High
Canada      15            NAmerica    V.High
Mexico      25            NAmerica    Medium
Ethiopia    30            Africa      Low

I would like to create new columns, in a loop, that repeat Ethiopia's population, region, and HDI on every row. I tried the following method, but it is time-consuming when a lot of columns are involved.

df['Population_2'] = df['Population'][df['Country'] == "Ethiopia"]
df['Region_2'] = df['Region'][df['Country'] == "Ethiopia"]
df['Population_2'].fillna(method='ffill')

My final DataFrame df should look like:

Country     Population    Region      HDI       Population_2    Region_2    HDI_2
China       100           Asia        High      30              Africa      Low
Canada      15            NAmerica    V.High    30              Africa      Low
Mexico      25            NAmerica    Medium    30              Africa      Low
Ethiopia    30            Africa      Low       30              Africa      Low

CodePudding user response:

How about this?

for col in ['Population', 'Region', 'HDI']:
    df[col + '_2'] = df.loc[df.Country=='Ethiopia', col].iat[0]

I don't quite understand the broader point of what you're trying to do, and if Ethiopia could have multiple values the solution might be different. But this works for the problem as you presented it.
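
If the filter could match zero rows or several, `.iat[0]` would either raise an IndexError or quietly take the first match. Here is a minimal sketch of a guarded version of the same loop, assuming the toy data from the question. The helper name `add_reference_columns` and its parameters are made up for illustration, and raising an error is just one possible way to treat the ambiguous case.

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({
    "Country": ["China", "Canada", "Mexico", "Ethiopia"],
    "Population": [100, 15, 25, 30],
    "Region": ["Asia", "NAmerica", "NAmerica", "Africa"],
    "HDI": ["High", "V.High", "Medium", "Low"],
})


def add_reference_columns(frame, country, cols, suffix="_2"):
    """Return a copy of frame where each column in cols is repeated
    from the single row belonging to country (hypothetical helper)."""
    matches = frame.loc[frame["Country"] == country, cols]
    # Refuse to guess when the country is missing or duplicated
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one row for {country!r}, found {len(matches)}"
        )
    out = frame.copy()
    for col in cols:
        # A scalar on the right-hand side is broadcast to every row
        out[col + suffix] = matches[col].iat[0]
    return out


result = add_reference_columns(df, "Ethiopia", ["Population", "Region", "HDI"])
print(result)
```

Returning a copy leaves the original df untouched; assign the result back to df if you want the columns in place.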

CodePudding user response:

You can use:

# select Ethiopia row and add suffix "_2" to the columns (except Country)
s = (df.drop(columns='Country')
       .loc[df['Country'].eq('Ethiopia')].add_suffix('_2').squeeze()
     )

# broadcast as new columns
df[s.index] = s

output:

    Country  Population    Region     HDI  Population_2 Region_2 HDI_2
0     China         100      Asia    High            30   Africa   Low
1    Canada          15  NAmerica  V.High            30   Africa   Low
2    Mexico          25  NAmerica  Medium            30   Africa   Low
3  Ethiopia          30    Africa     Low            30   Africa   Low
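
One thing to keep in mind: squeeze() only collapses the selection to a Series when exactly one row matches. The sketch below, built on the question's toy data, shows both cases. The helper `ethiopia_rows` and the duplicated frame `df_dup` are made up for the illustration.

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({
    "Country": ["China", "Canada", "Mexico", "Ethiopia"],
    "Population": [100, 15, 25, 30],
    "Region": ["Asia", "NAmerica", "NAmerica", "Africa"],
    "HDI": ["High", "V.High", "Medium", "Low"],
})


def ethiopia_rows(frame):
    """Select the Ethiopia row(s) without Country and suffix the column names."""
    return (frame.drop(columns="Country")
                 .loc[frame["Country"].eq("Ethiopia")]
                 .add_suffix("_2"))


# Exactly one matching row: squeeze() returns a Series indexed by the new names
single = ethiopia_rows(df).squeeze()
print(type(single).__name__, list(single.index))

# Two matching rows (hypothetical duplicate): squeeze() leaves a DataFrame
df_dup = pd.concat([df, df.tail(1)], ignore_index=True)
double = ethiopia_rows(df_dup).squeeze()
print(type(double).__name__, double.shape)
```

So if the country can appear more than once, pick a single row first (for example with .iloc[0]) before broadcasting.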

CodePudding user response:

You can use assign, assuming that you have only one row corresponding to Ethiopia:

d = dict(zip(df.columns.drop('Country').map('{}_2'.format), 
         df.set_index('Country').loc['Ethiopia']))

df = df.assign(**d)

print(df):

    Country  Population    Region     HDI  Population_2 Region_2 HDI_2
0     China         100      Asia    High            30   Africa   Low
1    Canada          15  NAmerica  V.High            30   Africa   Low
2    Mexico          25  NAmerica  Medium            30   Africa   Low
3  Ethiopia          30    Africa     Low            30   Africa   Low
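
A slightly more compact variant of the same idea, as a sketch under the same one-row assumption: rename the Ethiopia row's labels with add_suffix and unpack it as a dict into assign. The names `ref` and `out` are just illustrative.

```python
import pandas as pd

# Toy data from the question
df = pd.DataFrame({
    "Country": ["China", "Canada", "Mexico", "Ethiopia"],
    "Population": [100, 15, 25, 30],
    "Region": ["Asia", "NAmerica", "NAmerica", "Africa"],
    "HDI": ["High", "V.High", "Medium", "Low"],
})

# Ethiopia's row as a Series whose labels are the new column names
# (assumes Country is unique, so .loc returns a Series rather than a DataFrame)
ref = df.set_index("Country").loc["Ethiopia"].add_suffix("_2")

# Each dict entry becomes a new column; scalar values are broadcast to all rows
out = df.assign(**ref.to_dict())
print(out)
```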