Home > other >  Reformating a large dataset in python
Reformating a large dataset in python

Time:05-11

I have a large df from the WRDS CRSP database on monthly basis, which I need to reshape/reformat to be able to work with it.

the df looks like this:

PERMNO Date Ticker Prc
1000 19851231 ABC -4.32
1000 19850101 ABC -2.12
1000 19851231 DEF -1.32
1000 19850101 DEF -6.72
1000 19850102 DEF 0.32

I need the unique Tickers as the header and unqiue dates(dateformat) as the index with the PRC matching ticker and date. I tried to convert tickers and dates to a list and create a new df with that, but I fail to match the Prc with ticker and date. It has to look like this:

Date ABC DEF
1985/12/31 -4.32 -1.32
1985/01/01 -2.12 -6.72
1985/01/02 .... 0.32
1985/01/03 .... ....
1985/01/04 .... ....

I tried to reshape it in a way I could use pd.merge, but nothing I tried did work. Does anyone have any idea how to merge it properly or maybe an even better idea?

CodePudding user response:

You need pandas pivot

df.pivot(index='Date',columns='Ticker',values='Prc')

Using sample data created as below

df = pd.DataFrame([
[19850101,'abc',1.0],[19850102,'abc',1.1],[19850101,'xyz',2.0],[19850102,'xyz',1.9]
],columns=['Date','Ticker','Prc'])

initial dataframe

       date ticker  prc
0  19850101    abc  1.0
1  19850102    abc  1.1
2  19850101    xyz  2.0
3  19850102    xyz  1.9

pivot output

    ticker    abc  xyz
    date
    19850101  1.0  2.0
    19850102  1.1  1.9

Pandas Pivot Docs

If you need to aggregate, say they are multiple prices for a certain date/ticket combination, you can aggregate

df.pivot_table(index='Date',columns='Ticker',values='Prc',aggfunc='max')
  • Related