I have a dataframe that lists trading activity in subsequent rows. So row 1 is the information for the buy, and row 2 is the information for the sell, and so on.
Ideally I'd like to see the info for each buy/sell pair in the same row. Normally I'd use pivot_table
or pivot
to do something like this, but the problem is that the rows I want to pivot are alternating.
Here's an example for a dataframe that I have:
values = pd.DataFrame({
'Date': ['2010-01-03', '2011-02-06', '2012-03-06', '2012-05-15', '2013-04-12', '2014-03-18'],
'Signal': [1, -1, 1, -1, 1, -1],
'Close': [100, 103, 106, 108, 111, 114]
})
Which looks like this:
However, here's an example of what I would like:
new_data = pd.DataFrame({
'Trade': [1, 2, 3],
'Begin Date': ['2010-01-03', '2012-03-06', '2013-04-12'],
'End Date': ['2011-02-06', '2012-05-15', '2014-03-18'],
'Trade Open': [100, 106, 111],
'Trade Close': [103, 108, 114]
})
Which looks like this:
I know there's a manual way to do this by looping through each row, but am wondering if there's a more succinct, vectorized way to do it instead.
CodePudding user response:
Try:
- Assign an "idx" corresponding to the trade number for every 2 rows
- Assign the "action" based on the "Signal"
pivot
to get the required structurerename
columns to desired values
values["idx"] = values.index // 2 1
values["action"] = np.where(values["Signal"].eq(1), "Open", "Close")
output = values.rename(columns={"Close": "Trade"}).pivot("idx", "action", ["Date", "Trade"])
output.columns = [" ".join(col) for col in output.columns.values]
output = output.reset_index().rename(columns={"idx": "Trade",
"Date Close": "End Date",
"Date Open": "Begin Date"})
>>> output
Trade End Date Begin Date Trade Close Trade Open
0 1 2011-02-06 2010-01-03 103 100
1 2 2012-05-15 2012-03-06 108 106
2 3 2014-03-18 2013-04-12 114 111
CodePudding user response:
values = values[values.Signal.eq(1)].reset_index(drop=True).join(values[values.Signal.eq(-1)].reset_index(drop=True),rsuffix="_").assign(Trade=lambda x: x.index 1)[['Trade','Date','Date_','Close','Close_']]
values.columns = ['Trade','Begin Date','End Date','Trade Open','Trade Close']
print(values)
Trade Begin Date End Date Trade Open Trade Close
0 1 2010-01-03 2011-02-06 100 103
1 2 2012-03-06 2012-05-15 106 108
2 3 2013-04-12 2014-03-18 111 114