Home > database >  Python Dataframe Checking for a value then adding values to another DataFrame
Python Dataframe Checking for a value then adding values to another DataFrame

Time:08-05

I have a dataframe with a column with either a 1 or 0 in it. This is the Signal column. I want to cycle through this dataframe until I get to the first 1 then take the value in the Open column and put that into another Dataframe Total, column Buy Then as it continues through the dataframe when it reaches the first 0 then take that value in the Open column and put that into the same Dataframe Total, column Sold.

I know I need a loop within a loop but I'm not getting very far!

Any pointers/help would be appreciated!

Total = DataFrame()

for i in range(len(df)) :

  if i.Signal == 1 :
    Total['Buy'] = i.Open
      if i.Signal == 0:
        Total['Sold'] = i.Open

I know the code is wrong!

Cheers

Example of DataFrame

CodePudding user response:

df = pd.DataFrame({'Signal': [0,0,1,1,1,1,0,0,1,1,1,1,0,0,0,1,1,1,0,0], 'Open': np.random.rand(20)})

>>> df

|    |   Signal |      Open |
|---:|---------:|----------:|
|  0 |        0 | 0.959061  |
|  1 |        0 | 0.820516  |
|  2 |        1 | 0.0562783 |
|  3 |        1 | 0.612508  |
|  4 |        1 | 0.288703  |
|  5 |        1 | 0.332118  |
|  6 |        0 | 0.949236  |
|  7 |        0 | 0.20909   |
|  8 |        1 | 0.574924  |
|  9 |        1 | 0.170012  |
| 10 |        1 | 0.0255655 |
| 11 |        1 | 0.788829  |
| 12 |        0 | 0.39462   |
| 13 |        0 | 0.493338  |
| 14 |        0 | 0.347471  |
| 15 |        1 | 0.574096  |
| 16 |        1 | 0.286367  |
| 17 |        1 | 0.131327  |
| 18 |        0 | 0.38943   |
| 19 |        0 | 0.592241  |
# get the position of the first 1
first_1 = (df['Signal']==1).idxmax()

# Create a mask with True in the position of the first 1
# and every time a different value appears (0 after a 1, or 1 after a 0)
mask = np.full(len(df), False)
mask[first_1] = True
for i in range (first_1   1, len(df)):
    mask[i] = df['Signal'][i] != df['Signal'][i-1]

>>> df[mask]

|    |   Signal |      Open |
|---:|---------:|----------:|
|  2 |        1 | 0.0562783 |
|  6 |        0 | 0.949236  |
|  8 |        1 | 0.574924  |
| 12 |        0 | 0.39462   |
| 15 |        1 | 0.574096  |
| 18 |        0 | 0.38943   |
# Create new DF with 'Buy' = odd values of masked df['Open']
# and 'Sold' = even values of masked df['Open']

open_values = df[mask]['Open'].to_numpy()
total = pd.DataFrame({'Buy': [open_values[i] for i in range(0, len(open_values), 2)], 'Sold': [open_values[i] for i in range(1, len(open_values), 2)]})

>>> total

|    |       Buy |     Sold |
|---:|----------:|---------:|
|  0 | 0.0562783 | 0.949236 |
|  1 | 0.574924  | 0.39462  |
|  2 | 0.574096  | 0.38943  |

It works under the assumption that the original df table ends with 0s and not with 1s, i.e. for each first 1 in a row, there must be at least one 0 afterwards.

The assumption makes sense since the objective is to take differences later.

If the last value is 1, it will produce ValueError: All arrays must be of the same length.

  • Related