I have a dataframe with a column with either a 1 or 0 in it. This is the Signal column. I want to cycle through this dataframe until I get to the first 1 then take the value in the Open column and put that into another Dataframe Total, column Buy Then as it continues through the dataframe when it reaches the first 0 then take that value in the Open column and put that into the same Dataframe Total, column Sold.
I know I need a loop within a loop but I'm not getting very far!
Any pointers/help would be appreciated!
Total = DataFrame()
for i in range(len(df)) :
if i.Signal == 1 :
Total['Buy'] = i.Open
if i.Signal == 0:
Total['Sold'] = i.Open
I know the code is wrong!
Cheers
CodePudding user response:
df = pd.DataFrame({'Signal': [0,0,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,0,0], 'Open': np.random.rand(20)})
>>> df
| | Signal | Open |
|---:|---------:|----------:|
| 0 | 0 | 0.959061 |
| 1 | 0 | 0.820516 |
| 2 | 1 | 0.0562783 |
| 3 | 1 | 0.612508 |
| 4 | 1 | 0.288703 |
| 5 | 1 | 0.332118 |
| 6 | 0 | 0.949236 |
| 7 | 0 | 0.20909 |
| 8 | 1 | 0.574924 |
| 9 | 1 | 0.170012 |
| 10 | 1 | 0.0255655 |
| 11 | 1 | 0.788829 |
| 12 | 0 | 0.39462 |
| 13 | 0 | 0.493338 |
| 14 | 0 | 0.347471 |
| 15 | 1 | 0.574096 |
| 16 | 1 | 0.286367 |
| 17 | 1 | 0.131327 |
| 18 | 0 | 0.38943 |
| 19 | 0 | 0.592241 |
# get the position of the first 1
first_1 = (df['Signal']==1).idxmax()
# Create a mask with True in the position of the first 1
# and every time a different value appears (0 after a 1, or 1 after a 0)
mask = np.full(len(df), False)
mask[first_1] = True
for i in range (first_1 1, len(df)):
mask[i] = df['Signal'][i] != df['Signal'][i-1]
>>> df[mask]
| | Signal | Open |
|---:|---------:|----------:|
| 2 | 1 | 0.0562783 |
| 6 | 0 | 0.949236 |
| 8 | 1 | 0.574924 |
| 12 | 0 | 0.39462 |
| 15 | 1 | 0.574096 |
| 18 | 0 | 0.38943 |
# Create new DF with 'Buy' = odd values of masked df['Open']
# and 'Sold' = even values of masked df['Open']
open_values = df[mask]['Open'].to_numpy()
total = pd.DataFrame({'Buy': [open_values[i] for i in range(0, len(open_values), 2)], 'Sold': [open_values[i] for i in range(1, len(open_values), 2)]})
>>> total
| | Buy | Sold |
|---:|----------:|---------:|
| 0 | 0.0562783 | 0.949236 |
| 1 | 0.574924 | 0.39462 |
| 2 | 0.574096 | 0.38943 |
It works under the assumption that the original df
table ends with 0s and not with 1s, i.e. for each first 1 in a row, there must be at least one 0 afterwards.
The assumption makes sense since the objective is to take differences later.
If the last value is 1, it will produce ValueError: All arrays must be of the same length
.