Home > Software design >  Reformatting DataFrame so that alternating rows are in same row
Reformatting DataFrame so that alternating rows are in same row

Time:11-30

I have a dataframe that lists trading activity in subsequent rows. So row 1 is the information for the buy, and row 2 is the information for the sell, and so on.

Ideally I'd like to see the info for each buy/sell pair in the same row. Normally I'd use pivot_table or pivot to do something like this, but the problem is that the rows I want to pivot are alternating.

Here's an example for a dataframe that I have:

values = pd.DataFrame({
    'Date': ['2010-01-03', '2011-02-06', '2012-03-06', '2012-05-15', '2013-04-12', '2014-03-18'],
    'Signal': [1, -1, 1, -1, 1, -1],
    'Close': [100, 103, 106, 108, 111, 114]
})

Which looks like this:

enter image description here

However, here's an example of what I would like:

new_data = pd.DataFrame({
    'Trade': [1, 2, 3],
    'Begin Date': ['2010-01-03', '2012-03-06', '2013-04-12'],
    'End Date': ['2011-02-06', '2012-05-15', '2014-03-18'],
    'Trade Open': [100, 106, 111],
    'Trade Close': [103, 108, 114]
})

Which looks like this:

enter image description here

I know there's a manual way to do this by looping through each row, but am wondering if there's a more succinct, vectorized way to do it instead.

CodePudding user response:

Try:

  1. Assign an "idx" corresponding to the trade number for every 2 rows
  2. Assign the "action" based on the "Signal"
  3. pivot to get the required structure
  4. rename columns to desired values
values["idx"] = values.index // 2   1
values["action"] = np.where(values["Signal"].eq(1), "Open", "Close")
output = values.rename(columns={"Close": "Trade"}).pivot("idx", "action", ["Date", "Trade"])
output.columns = [" ".join(col) for col in output.columns.values]
output = output.reset_index().rename(columns={"idx": "Trade", 
                                              "Date Close": "End Date",
                                              "Date Open": "Begin Date"})

>>> output
   Trade    End Date  Begin Date Trade Close Trade Open
0      1  2011-02-06  2010-01-03         103        100
1      2  2012-05-15  2012-03-06         108        106
2      3  2014-03-18  2013-04-12         114        111

CodePudding user response:

values = values[values.Signal.eq(1)].reset_index(drop=True).join(values[values.Signal.eq(-1)].reset_index(drop=True),rsuffix="_").assign(Trade=lambda x: x.index 1)[['Trade','Date','Date_','Close','Close_']]
values.columns = ['Trade','Begin Date','End Date','Trade Open','Trade Close']
print(values)
   Trade  Begin Date    End Date  Trade Open  Trade Close
0      1  2010-01-03  2011-02-06         100          103
1      2  2012-03-06  2012-05-15         106          108
2      3  2013-04-12  2014-03-18         111          114
  • Related