I am working with a data frame that is written in wide format. Each book has a number of sales, but some quarters have null values because the book was not released before that quarter.
import pandas as pd
data = {'Book Title': ['A Court of Thorns and Roses', 'Where the Crawdads Sing', 'Bad Blood', 'Atomic Habits'],
'Metric': ['Book Sales','Book Sales','Book Sales','Book Sales'],
'Q1 2022': [100000,0,0,0],
'Q2 2022': [50000,75000,0,35000],
'Q3 2022': [25000,150000,20000,45000],
'Q4 2022': [25000,20000,10000,65000]}
df1 = pd.DataFrame(data)
What I would like to do is create one field that identifies "ID of first available quarter" ("First Quarter ID"), and another that identifies "ID of quarter with maximum sales" ("Max Quarter ID"). Then I would like to show two fields with the sales in the first available quarter and the second available quarter.
Tips to go about this? Thank you!
CodePudding user response:
A possible solution:
df1 = df1.replace(0, np.nan)
a = df1.shape[1] - np.argmin(df1.notna().values[:, ::-1], axis=1)
a = df1.columns[np.where(a == df1.shape[1], 2, a)]
b = df1.iloc[:, 2:].columns[np.nanargmax(df1.iloc[:, 2:].values, axis=1)]
df1['First Quarter ID'] = a
df1['Max Quarter ID'] = b
Output:
Book Title Metric Q1 2022 Q2 2022 Q3 2022 \
0 A Court of Thorns and Roses Book Sales 100000.0 50000.0 25000
1 Where the Crawdads Sing Book Sales NaN 75000.0 150000
2 Bad Blood Book Sales NaN NaN 20000
3 Atomic Habits Book Sales NaN 35000.0 45000
Q4 2022 First Quarter ID Max Quarter ID
0 25000 Q1 2022 Q1 2022
1 20000 Q2 2022 Q3 2022
2 10000 Q3 2022 Q3 2022
3 65000 Q2 2022 Q4 2022
CodePudding user response:
CodePudding user response:
Maybe this is what you are looking for.
import pandas as pd
import numpy as np
data = {'Book Title': ['A Court of Thorns and Roses', 'Where the Crawdads Sing', 'Bad Blood', 'Atomic Habits'],
'Metric': ['Book Sales','Book Sales','Book Sales','Book Sales'],
'Q1 2022': [100000,0,0,0],
'Q2 2022': [50000,75000,0,35000],
'Q3 2022': [25000,150000,20000,45000],
'Q4 2022': [25000,20000,10000,65000]}
df1 = pd.DataFrame(data)
df1['First Quarter ID'] = [df1.iloc[idx, 2:].replace(0, np.nan).first_valid_index() for idx in df1.index]
df1['Max Quarter ID'] = df1.set_index(['Book Title', 'Metric']).iloc[:, :-1].idxmax(axis=1).to_list()