I would like to create a new column every time I get 1 in the 'Signal' column that will cast the corresponding value from the 'Value' column (please see the expected output below).
Initial data:
Index | Value | Signal |
---|---|---|
0 | 3 | 0 |
1 | 8 | 0 |
2 | 8 | 0 |
3 | 7 | 1 |
4 | 9 | 0 |
5 | 10 | 0 |
6 | 14 | 1 |
7 | 10 | 0 |
8 | 10 | 0 |
9 | 4 | 1 |
10 | 10 | 0 |
11 | 10 | 0 |
Expected Output:
Index | Value | Signal | New_Col_1 | New_Col_2 | New_Col_3 |
---|---|---|---|---|---|
0 | 3 | 0 | 0 | 0 | 0 |
1 | 8 | 0 | 0 | 0 | 0 |
2 | 8 | 0 | 0 | 0 | 0 |
3 | 7 | 1 | 7 | 0 | 0 |
4 | 9 | 0 | 7 | 0 | 0 |
5 | 10 | 0 | 7 | 0 | 0 |
6 | 14 | 1 | 7 | 14 | 0 |
7 | 10 | 0 | 7 | 14 | 0 |
8 | 10 | 0 | 7 | 14 | 0 |
9 | 4 | 1 | 7 | 14 | 4 |
10 | 10 | 0 | 7 | 14 | 4 |
11 | 10 | 0 | 7 | 14 | 4 |
What would be a way to do it?
CodePudding user response:
You can use a pivot:
out = df.join(df
# keep only the values where signal is 1
# and get Signal's cumsum
.assign(val=df['Value'].where(df['Signal'].eq(1)),
col=df['Signal'].cumsum()
)
# pivot cumsumed Signal to columns
.pivot(index='Index', columns='col', values='val')
# ensure column 0 is absent (using loc to avoid KeyError)
.loc[:, 1:]
# forward fill the values
.ffill()
# rename columns
.add_prefix('New_Col_')
)
output:
Index Value Signal New_Col_1 New_Col_2 New_Col_3
0 0 3 0 NaN NaN NaN
1 1 8 0 NaN NaN NaN
2 2 8 0 NaN NaN NaN
3 3 7 1 7.0 NaN NaN
4 4 9 0 7.0 NaN NaN
5 5 10 0 7.0 NaN NaN
6 6 14 1 7.0 14.0 NaN
7 7 10 0 7.0 14.0 NaN
8 8 10 0 7.0 14.0 NaN
9 9 4 1 7.0 14.0 4.0
10 10 10 0 7.0 14.0 4.0
11 11 10 0 7.0 14.0 4.0
CodePudding user response:
#create new column by incrementing the rows that has signal
df['new_col']='new_col_' df['Signal'].cumsum().astype(str)
#rows having no signal, make them null
df['new_col'] = df['new_col'].mask(df['Signal']==0, '0')
#pivot table
df2=(df.pivot(index=['Index','Signal', 'Value'], columns='new_col', values='Value')
.reset_index()
.ffill().fillna(0) #forward fill and fillna with 0
.drop(columns=['0','Index'] ) #drop the extra columns
.rename_axis(columns={'new_col':'Index'}) # rename the axis
.astype(int)) # changes values to int, removing decimals
df2
Index Signal Value new_col_1 new_col_2 new_col_3
0 0 3 0 0 0
1 0 8 0 0 0
2 0 8 0 0 0
3 1 7 7 0 0
4 0 9 7 0 0
5 0 10 7 0 0
6 1 14 7 14 0
7 0 10 7 14 0
8 0 10 7 14 0
9 1 4 7 14 4
10 0 10 7 14 4
11 0 10 7 14 4