I know the annual price for four different apples.
Unfortunately, there are years when some apples are missing in price.
I would like to know the unit price for the first year in which the unit price was entered except for the missing year.
The code is as below.
import pandas as pd
import numpy as np
a = {'Price_Y19':[np.nan,np.nan,np.nan,10],
'Price_Y20':[np.nan,np.nan,10,9],
'Price_Y21':[np.nan,10,9,8],
'Price_Y22':[10,9,8,7]}
index_name = ['yellow apple','red apple','white apple','gray apple']
df = pd.DataFrame(data = a,
index = index_name)
df
I would like to get below DataFrame
b = {'Price_Y19':[np.nan,np.nan,np.nan,10],
'Price_Y20':[np.nan,np.nan,10,9],
'Price_Y21':[np.nan,10,9,8],
'Price_Y22':[10,9,8,7],
'Price_Initial':[10,10,10,10],
'Price_Final':[10,9,8,7],
'Price_Gap':[0,1,2,3]}
df1 = pd.DataFrame(data = b,
index = index_name)
df1
I always wait your answer. Thanks
I don't have any ideas how to make code.
df['Price_Initial] = some mehode
CodePudding user response:
You can backward or forward filling missing values with select first or last column by position for Initial/Final
prices, last subtract new columns:
df = df.assign(Price_Initial = df.bfill(axis=1).iloc[:, 0],
Price_Final = df.ffill(axis=1).iloc[:, -1],
Price_Gap = lambda x: x['Price_Initial'].sub(x['Price_Final']))
print (df)
Price_Y19 Price_Y20 Price_Y21 Price_Y22 Price_Initial \
yellow apple NaN NaN NaN 10 10.0
red apple NaN NaN 10.0 9 10.0
white apple NaN 10.0 9.0 8 10.0
gray apple 10.0 9.0 8.0 7 10.0
Price_Final Price_Gap
yellow apple 10.0 0.0
red apple 9.0 1.0
white apple 8.0 2.0
gray apple 7.0 3.0
If multiple columns and need filter only Price_YY
columns:
df1 = df.filter(regex='Price_Y\d{2}')
df = df.assign(Price_Initial = df1.bfill(axis=1).iloc[:, 0],
Price_Final = df1.ffill(axis=1).iloc[:, -1],
Price_Gap = lambda x: x['Price_Initial'].sub(x['Price_Final']))
print (df)
Price_Y19 Price_Y20 Price_Y21 Price_Y22 Price_Initial \
yellow apple NaN NaN NaN 10 10.0
red apple NaN NaN 10.0 9 10.0
white apple NaN 10.0 9.0 8 10.0
gray apple 10.0 9.0 8.0 7 10.0
Price_Final Price_Gap
yellow apple 10.0 0.0
red apple 9.0 1.0
white apple 8.0 2.0
gray apple 7.0 3.0
CodePudding user response:
With pandas.Series.first_valid_index
and pandas.Series.last_valid_index
functions:
df['Price_Initial'], df['Price_Final'] = df.apply(lambda x: (x[x.first_valid_index()], x[x.last_valid_index()])).values
df['Price_Gap'] = df['Price_Initial'] - df['Price_Final']
print(df)
Price_Y19 Price_Y20 Price_Y21 Price_Y22 Price_Initial \
yellow apple NaN NaN NaN 10 10.0
red apple NaN NaN 10.0 9 10.0
white apple NaN 10.0 9.0 8 10.0
gray apple 10.0 9.0 8.0 7 10.0
Price_Final Price_Gap
yellow apple 10.0 0.0
red apple 9.0 1.0
white apple 8.0 2.0
gray apple 7.0 3.0