Home > Net >  Amount of stage by day given stage change dates
Amount of stage by day given stage change dates

Time:08-01

I have the following dataframe which shows the date a person changes stages (a person can move from any stage to another):

Name Stage Date Amount
Karen One 07/25/22 $1
Karen Two 07/27/22 $1
Jaren One 07/29/22 $3
Laren One 07/25/22 $2
Laren Three 07/28/22 $2
Laren Two 07/30/22 $2

I would like to get a representation of which stage everyone was on every day.

Stage 07/25 07/26 07/27 07/28 07/29 07/30
One $3 $3 $2 $0 $3 $3
Two $0 $0 $1 $1 $1 $3
Three $0 $0 $0 $2 $2 $0

You can see that both Karen and Laren were on stage 1 on 7/25 and this continued on 7/26 until Karen moved to stage 2 on 7/27.

CodePudding user response:

Use:

#Data preparation
string = """Name    Stage   Date    Amount
Karen   One 07/25/22    $1
Karen   Two 07/27/22    $1
Jaren   One 07/29/22    $3
Laren   One 07/25/22    $2
Laren   Three   07/28/22    $2
Laren   Two 07/30/22    $2"""
data = [x.split('   ') for x in string.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])
df['Date'] = pd.to_datetime(df['Date'])

# Filling miss dates
idx = pd.date_range(df['Date'].min(), df['Date'].max())
temp = df.groupby('Name')['Date'].apply(lambda x: pd.DatetimeIndex(x).reindex(idx)[0].values).explode().to_frame().reset_index().merge(df, on=['Date', 'Name'], how='outer')

# Changing and pivoting

temp[['Stage', 'Amount']] = temp[['Stage', 'Amount']].ffill()
temp['Amount'] = temp['Amount'].str.replace('$', '').astype(float).ffill().fillna(0)
temp.pivot_table(index='Stage', columns='Date', values = 'Amount', aggfunc = 'sum', fill_value = 0, dropna = False)

Output:

Date    2022-07-25  2022-07-26  2022-07-27  2022-07-28  2022-07-29  2022-07-30
Stage                       
One 3   3   2   0   3   3
Three   0   0   0   2   2   0
Two 0   0   1   1   1   3

CodePudding user response:

assuming that you have int or float value in amount column

then you can use groupby and pivot to reshape dataframe

df.groupby(['stage', 'date'])['amount'].sum().reset_index().pivot(index='stage', columns='date', values='amount').reset_index().fillna(0)

df is your original dataframe

  • Related