I'm new to pandas and I'm trying to merge the following 2 dataframes into 1 :
nopat
0 2021-12-31 3.580000e 09
1 2020-12-31 6.250000e 08
2 2019-12-31 -1.367000e 09
3 2018-12-31 2.028000e 09
capital_employed
0 2021-12-31 5.924000e 10
1 2020-12-31 6.062400e 10
2 2019-12-31 5.203500e 10
3 2018-12-31 5.441200e 10
When I try to apply a function to my new datframe, all columns disappear. Here is my code :
roce_by_year = pd.merge(nopat, capital_employed) \
.rename(columns={"": "date"}) \
.sort_values(by='date') \
.apply(lambda row: compute_roce(row['nopat'], row['capital_employed']), axis=1) \
.reset_index(name='roce')
Here is the result :
index roce
0 3 3.727119
1 2 -2.627078
2 1 1.030945
3 0 6.043214
I would like to have the following result :
date roce
0 2018 3.727119
1 2019 -2.627078
2 2020 1.030945
3 2021 6.043214
Do you have an explanation ?
CodePudding user response:
If you want a method-chained solution, you could use something like this:
import pandas as pd
roce_by_year = (
pd.merge(nopat, capital_employed)
.rename(columns={"": "date"})
.assign(
date=lambda xdf: pd.to_datetime(
xdf["date"], errors="coerce"
).dt.year
)
.assign(
roce=lambda xdf: xdf.apply(
lambda row: compute roce(
row["nopat"], row["capital_employed"]
), axis=1
)
)
.sort_values("date", ascending=True)
)[["date", "roce"]]
CodePudding user response:
df1['date'] = pd.to_datetime(df1['date'])
df1
###
date nopat
0 2021-12-31 3580000000
1 2020-12-31 625000000
2 2019-12-31 -1367000000
3 2018-12-31 2028000000
df2['date'] = pd.to_datetime(df2['date'])
df2
###
date capital_employed
0 2021-12-31 59240000000
1 2020-12-31 60624000000
2 2019-12-31 52035000000
3 2018-12-31 54412000000
df3 = pd.merge(df1, df2, how='outer', left_on='date', right_on='date')\
.pipe(lambda x: x.assign(roe = x['nopat']/x['capital_employed']))\
.sort_values(by='date', ascending=True)\
.pipe(lambda x: x[['date', 'roe']])\
.pipe(lambda x: x.assign(date = x['date'].dt.strftime('%Y'))).reset_index(drop=True)
df3
###
date roe
0 2018 0.037271
1 2019 -0.026271
2 2020 0.010309
3 2021 0.060432
CodePudding user response:
Apply creates only the new column. You can try to create a new column on the existing dataframe like
nopat.rename(columns={"": "date"}, inplace=True)
nopat.sort_values(by='date', inplace=True)
nopat.set_index('date', inplace=True)
capital_employed.rename(columns={"": "date"}, inplace=True)
capital_employed.set_index('date', inplace=True)
capital_employed.sort_values(by='date', inplace=True)
df = nopat.join(capital_employed, on='date')
df['roce'] = df.apply(lambda row: compute_roce(row['nopat'],
row['capital_employed']), axis=1)