I have a large df from the WRDS CRSP database on monthly basis, which I need to reshape/reformat to be able to work with it.
the df looks like this:
PERMNO | Date | Ticker | Prc |
---|---|---|---|
1000 | 19851231 | ABC | -4.32 |
1000 | 19850101 | ABC | -2.12 |
1000 | 19851231 | DEF | -1.32 |
1000 | 19850101 | DEF | -6.72 |
1000 | 19850102 | DEF | 0.32 |
I need the unique Tickers as the header and unqiue dates(dateformat) as the index with the PRC matching ticker and date. I tried to convert tickers and dates to a list and create a new df with that, but I fail to match the Prc with ticker and date. It has to look like this:
Date | ABC | DEF |
---|---|---|
1985/12/31 | -4.32 | -1.32 |
1985/01/01 | -2.12 | -6.72 |
1985/01/02 | .... | 0.32 |
1985/01/03 | .... | .... |
1985/01/04 | .... | .... |
I tried to reshape it in a way I could use pd.merge
, but nothing I tried did work.
Does anyone have any idea how to merge it properly or maybe an even better idea?
CodePudding user response:
You need pandas pivot
df.pivot(index='Date',columns='Ticker',values='Prc')
Using sample data created as below
df = pd.DataFrame([
[19850101,'abc',1.0],[19850102,'abc',1.1],[19850101,'xyz',2.0],[19850102,'xyz',1.9]
],columns=['Date','Ticker','Prc'])
initial dataframe
date ticker prc
0 19850101 abc 1.0
1 19850102 abc 1.1
2 19850101 xyz 2.0
3 19850102 xyz 1.9
pivot output
ticker abc xyz
date
19850101 1.0 2.0
19850102 1.1 1.9
If you need to aggregate, say they are multiple prices for a certain date/ticket combination, you can aggregate
df.pivot_table(index='Date',columns='Ticker',values='Prc',aggfunc='max')