Reformating a large dataset in python-CodePudding

I have a large df from the WRDS CRSP database on monthly basis, which I need to reshape/reformat to be able to work with it.

the df looks like this:

PERMNO	Date	Ticker	Prc
1000	19851231	ABC	-4.32
1000	19850101	ABC	-2.12
1000	19851231	DEF	-1.32
1000	19850101	DEF	-6.72
1000	19850102	DEF	0.32

I need the unique Tickers as the header and unqiue dates(dateformat) as the index with the PRC matching ticker and date. I tried to convert tickers and dates to a list and create a new df with that, but I fail to match the Prc with ticker and date. It has to look like this:

Date	ABC	DEF
1985/12/31	-4.32	-1.32
1985/01/01	-2.12	-6.72
1985/01/02	....	0.32
1985/01/03	....	....
1985/01/04	....	....

I tried to reshape it in a way I could use pd.merge, but nothing I tried did work. Does anyone have any idea how to merge it properly or maybe an even better idea?

CodePudding user response：

You need pandas pivot

df.pivot(index='Date',columns='Ticker',values='Prc')

Using sample data created as below

df = pd.DataFrame([
[19850101,'abc',1.0],[19850102,'abc',1.1],[19850101,'xyz',2.0],[19850102,'xyz',1.9]
],columns=['Date','Ticker','Prc'])

initial dataframe

       date ticker  prc
0  19850101    abc  1.0
1  19850102    abc  1.1
2  19850101    xyz  2.0
3  19850102    xyz  1.9

pivot output

    ticker    abc  xyz
    date
    19850101  1.0  2.0
    19850102  1.1  1.9

Pandas Pivot Docs

If you need to aggregate, say they are multiple prices for a certain date/ticket combination, you can aggregate

df.pivot_table(index='Date',columns='Ticker',values='Prc',aggfunc='max')