I have the following dataframe which shows the date a person changes stages (a person can move from any stage to another):
Name | Stage | Date | Amount |
---|---|---|---|
Karen | One | 07/25/22 | $1 |
Karen | Two | 07/27/22 | $1 |
Jaren | One | 07/29/22 | $3 |
Laren | One | 07/25/22 | $2 |
Laren | Three | 07/28/22 | $2 |
Laren | Two | 07/30/22 | $2 |
I would like to get a representation of which stage everyone was on every day.
Stage | 07/25 | 07/26 | 07/27 | 07/28 | 07/29 | 07/30 |
---|---|---|---|---|---|---|
One | $3 | $3 | $2 | $0 | $3 | $3 |
Two | $0 | $0 | $1 | $1 | $1 | $3 |
Three | $0 | $0 | $0 | $2 | $2 | $0 |
You can see that both Karen and Laren were on stage 1 on 7/25 and this continued on 7/26 until Karen moved to stage 2 on 7/27.
CodePudding user response:
Use:
#Data preparation
string = """Name Stage Date Amount
Karen One 07/25/22 $1
Karen Two 07/27/22 $1
Jaren One 07/29/22 $3
Laren One 07/25/22 $2
Laren Three 07/28/22 $2
Laren Two 07/30/22 $2"""
data = [x.split(' ') for x in string.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])
df['Date'] = pd.to_datetime(df['Date'])
# Filling miss dates
idx = pd.date_range(df['Date'].min(), df['Date'].max())
temp = df.groupby('Name')['Date'].apply(lambda x: pd.DatetimeIndex(x).reindex(idx)[0].values).explode().to_frame().reset_index().merge(df, on=['Date', 'Name'], how='outer')
# Changing and pivoting
temp[['Stage', 'Amount']] = temp[['Stage', 'Amount']].ffill()
temp['Amount'] = temp['Amount'].str.replace('$', '').astype(float).ffill().fillna(0)
temp.pivot_table(index='Stage', columns='Date', values = 'Amount', aggfunc = 'sum', fill_value = 0, dropna = False)
Output:
Date 2022-07-25 2022-07-26 2022-07-27 2022-07-28 2022-07-29 2022-07-30
Stage
One 3 3 2 0 3 3
Three 0 0 0 2 2 0
Two 0 0 1 1 1 3
CodePudding user response:
assuming that you have int or float value in amount column
then you can use groupby and pivot to reshape dataframe
df.groupby(['stage', 'date'])['amount'].sum().reset_index().pivot(index='stage', columns='date', values='amount').reset_index().fillna(0)
df is your original dataframe