The question might be confusing but what I am trying to do is to extract the color from the most recent two years and also identify the year.
For example, I have a dataset below, for item1, the color for the most recent two years are red (2021) and black(2020), the desired outcome is to add 4 columns namely "this year color", "last year color", "this year", "last year" to show the corresponding information.
In the case of current year's color (2021) is missing, then the logic is to take the next year as current year, like item 3, this year's color is green (2020) and last year color is green (2018).
I wonder if there is a way to do it in python? Any advice or suggestion would be gladly appericated.
import pandas as pd
df1 = { 'item':['item1','item2','item3','item4'],
'2018':['red','green','',''],
'2019':['black','','green','red'],
'2020':['brown', 'blue','','blue'],
'2021':['','','green','black']}
df1=pd.DataFrame(df1)
df1
The desired output is like below:
CodePudding user response:
IIUC, you can use a custom function:
def getlast(s):
s = s[s.ne('')][:2]
return pd.Series(s.to_list() s.index.to_list(),
index=['this year color', 'last year color', 'this year', 'last year'])
df2 = df1.join(df1
.drop('item', axis=1)
.sort_index(axis=1, ascending=False)
.apply(getlast, axis=1)
)
output:
item 2018 2019 2020 2021 this year color last year color this year last year
0 item1 red black brown brown black 2020 2019
1 item2 green blue blue green 2020 2018
2 item3 green green green green 2021 2019
3 item4 red blue black black blue 2021 2020
or using melt
and merge
:
df2 = df1.merge(df1
.melt(id_vars='item')
.loc[lambda d: d['value'].ne('')]
.groupby('item')
.agg(**{'this year color': ('value', lambda g: g.iloc[-1]),
'last year color': ('value', lambda g: g.iloc[-2]),
'this year': ('variable', lambda g: g.iloc[-1]),
'last year': ('variable', lambda g: g.iloc[-2])}),
left_on='item', right_index=True
)