Home > Back-end >  Column disappearing after .apply - Pandas (Python)
Column disappearing after .apply - Pandas (Python)

Time:11-29

I'm new to pandas and I'm trying to merge the following 2 dataframes into 1 :

                     nopat
0 2021-12-31  3.580000e 09
1 2020-12-31  6.250000e 08
2 2019-12-31 -1.367000e 09
3 2018-12-31  2.028000e 09
              capital_employed
0 2021-12-31      5.924000e 10
1 2020-12-31      6.062400e 10
2 2019-12-31      5.203500e 10
3 2018-12-31      5.441200e 10

When I try to apply a function to my new datframe, all columns disappear. Here is my code :

roce_by_year = pd.merge(nopat, capital_employed) \
        .rename(columns={"": "date"}) \
        .sort_values(by='date') \
        .apply(lambda row: compute_roce(row['nopat'], row['capital_employed']), axis=1) \
        .reset_index(name='roce')

Here is the result :

   index      roce
0      3  3.727119
1      2 -2.627078
2      1  1.030945
3      0  6.043214

I would like to have the following result :

   date       roce
0   2018  3.727119
1   2019 -2.627078
2   2020  1.030945
3   2021  6.043214

Do you have an explanation ?

CodePudding user response:

If you want a method-chained solution, you could use something like this:

import pandas as pd


roce_by_year = (
    pd.merge(nopat, capital_employed)
    .rename(columns={"": "date"})
    .assign(
        date=lambda xdf: pd.to_datetime(
            xdf["date"], errors="coerce"
        ).dt.year
    )
    .assign(
        roce=lambda xdf: xdf.apply(
            lambda row: compute roce(
                row["nopat"], row["capital_employed"]
            ), axis=1
        )
    )
    .sort_values("date", ascending=True)
)[["date", "roce"]]

CodePudding user response:

df1['date'] = pd.to_datetime(df1['date'])
df1
###
        date       nopat
0 2021-12-31  3580000000
1 2020-12-31   625000000
2 2019-12-31 -1367000000
3 2018-12-31  2028000000
df2['date'] = pd.to_datetime(df2['date'])
df2
###
        date  capital_employed
0 2021-12-31       59240000000
1 2020-12-31       60624000000
2 2019-12-31       52035000000
3 2018-12-31       54412000000
df3 = pd.merge(df1, df2, how='outer', left_on='date', right_on='date')\
    .pipe(lambda x: x.assign(roe = x['nopat']/x['capital_employed']))\
    .sort_values(by='date', ascending=True)\
    .pipe(lambda x: x[['date', 'roe']])\
    .pipe(lambda x: x.assign(date = x['date'].dt.strftime('%Y'))).reset_index(drop=True)
df3
###
   date       roe
0  2018  0.037271
1  2019 -0.026271
2  2020  0.010309
3  2021  0.060432

CodePudding user response:

Apply creates only the new column. You can try to create a new column on the existing dataframe like

nopat.rename(columns={"": "date"}, inplace=True)
nopat.sort_values(by='date', inplace=True)

nopat.set_index('date', inplace=True)
capital_employed.rename(columns={"": "date"}, inplace=True)
capital_employed.set_index('date', inplace=True)
capital_employed.sort_values(by='date', inplace=True)
df = nopat.join(capital_employed, on='date')
df['roce'] = df.apply(lambda row: compute_roce(row['nopat'], 
                      row['capital_employed']), axis=1)
  • Related