Home > Software engineering >  Python Pandas - Dataframe - Add column depending on another column, which has a mathematical operati
Python Pandas - Dataframe - Add column depending on another column, which has a mathematical operati

Time:05-02

I have a Pandas dataframe that looks something like this:

timestamp Place Data A Data B Data C
16508 France 0.03 0.06 0.15
16510 England 0.05 0.07 0.11
16515 England 0.04 0.03 0.87

What I would like to do is the following:

  • Add a new column for every different value in the column "Place".
  • In this new column, add the division between Data A and Data B in percentage (Data A / Data B * 100).

The expected output would be:

timestamp Place Data A Data B Data C To France To England
16508 France 0.03 0.06 0.15 50 0
16510 England 0.05 0.07 0.11 0 71.42
16515 England 0.04 0.03 0.87 0 133.33

I tried the following:

for column in data['Place'].unique():
    column_name = f'To {Place}'
    data[column_name] = data[data['Place'] == column]['Data A'].div(['Data B'])*100
    data[column_name].fillna(method='ffill', inplace=True)
    data[column_name].fillna(value=0, inplace=True)

But it's not working. I get a "'list' object has no attribute 'div'" error. I have tried other different things but they are not working either.

Could somebody give me a hand with this?

Thanks in advance.

CodePudding user response:

IIUC, you can try with pivot:

df["Ratio"] = df["Data A"].div(df["Data B"])

output = df.drop("Ratio", axis=1).join(df.pivot(None, "Place", "Ratio").mul(100).fillna(0).add_prefix("To "))

>>> output
     Place  Data A  Data B  Data C  To England  To France
0   France    0.03    0.06    0.15    0.000000       50.0
1  England    0.05    0.07    0.11   71.428571        0.0
2  England    0.04    0.03    0.87  133.333333        0.0

CodePudding user response:

I'd do it like this:

df_ratio = ((df['Data A'].div(df['Data B'])*100).to_frame()
               .assign(col='To ' df['Place'])
               .set_index('col', append=True)[0]
               .unstack(fill_value=0))

pd.concat([df, df_ratio], axis=1))

Output:

   timestamp    Place  Data A  Data B  Data C  To England  To France
0      16508   France    0.03    0.06    0.15    0.000000       50.0
1      16510  England    0.05    0.07    0.11   71.428571        0.0
2      16515  England    0.04    0.03    0.87  133.333333        0.0
  • Related