Amount of stage by day given stage change dates-CodePudding

I have the following dataframe which shows the date a person changes stages (a person can move from any stage to another):

Name	Stage	Date	Amount
Karen	One	07/25/22	$1
Karen	Two	07/27/22	$1
Jaren	One	07/29/22	$3
Laren	One	07/25/22	$2
Laren	Three	07/28/22	$2
Laren	Two	07/30/22	$2

I would like to get a representation of which stage everyone was on every day.

Stage	07/25	07/26	07/27	07/28	07/29	07/30
One	$3	$3	$2	$0	$3	$3
Two	$0	$0	$1	$1	$1	$3
Three	$0	$0	$0	$2	$2	$0

You can see that both Karen and Laren were on stage 1 on 7/25 and this continued on 7/26 until Karen moved to stage 2 on 7/27.

CodePudding user response：

Use:

#Data preparation
string = """Name    Stage   Date    Amount
Karen   One 07/25/22    $1
Karen   Two 07/27/22    $1
Jaren   One 07/29/22    $3
Laren   One 07/25/22    $2
Laren   Three   07/28/22    $2
Laren   Two 07/30/22    $2"""
data = [x.split('   ') for x in string.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])
df['Date'] = pd.to_datetime(df['Date'])

# Filling miss dates
idx = pd.date_range(df['Date'].min(), df['Date'].max())
temp = df.groupby('Name')['Date'].apply(lambda x: pd.DatetimeIndex(x).reindex(idx)[0].values).explode().to_frame().reset_index().merge(df, on=['Date', 'Name'], how='outer')

# Changing and pivoting

temp[['Stage', 'Amount']] = temp[['Stage', 'Amount']].ffill()
temp['Amount'] = temp['Amount'].str.replace('$', '').astype(float).ffill().fillna(0)
temp.pivot_table(index='Stage', columns='Date', values = 'Amount', aggfunc = 'sum', fill_value = 0, dropna = False)

Output:

Date    2022-07-25  2022-07-26  2022-07-27  2022-07-28  2022-07-29  2022-07-30
Stage                       
One 3   3   2   0   3   3
Three   0   0   0   2   2   0
Two 0   0   1   1   1   3

CodePudding user response：

assuming that you have int or float value in amount column

then you can use groupby and pivot to reshape dataframe

df.groupby(['stage', 'date'])['amount'].sum().reset_index().pivot(index='stage', columns='date', values='amount').reset_index().fillna(0)

df is your original dataframe