How to transform combinations of values in columns into individual columns?-CodePudding

I have a dataset (df), that looks like this:

Date	ID	County Name	State	State Name	Product Name	Type of Transaction	QTY
202105	10001	Los Angeles	CA	California	Shoes	Entry	630
202012	10002	Houston	TX	Texas	Keyboard	Exit	5493
202001	11684	Chicago	IL	Illionis	Phone	Disposal	220
202107	12005	New York	NY	New York	Phone	Entry	302
...	...	...	...	...	...	...	...
202111	14990	Orlando	FL	Florida	Shoes	Exit	201

For every county, there are multiple entries for different Products, types of transactions, and at different dates, but not all counties have the same number of entries and they don't follow the same dates.

I want to recreate this dataset, such that: 1 - All counties have the same start and end dates, and for those dates where the county does not record entries, I want this entry to be recorded as NaN. 2 - The product names and their types are their own columns.

Essentially, this is how the dataset needs to look:

Date	ID	County Name	State	State Name	Shoes, Entry	Shoes, Exit	Shoes, Disposal	Phones, Entry	Phones, Exit	Phones, Disposal	Keyboard, Entry	Keyboard, Exit	Keyboard, Disposal
202105	10001	Los Angeles	CA	California	594	694	5660	33299	1110	5659	4559	3223	56889
202012	10002	Houston	TX	Texas	3420	4439	549	2110	5669	2245	39294	3345	556
202001	11684	Chicago	IL	Illionis	55432	4439	329	21190	4320	455	34059	44556	5677
202107	12005	New York	NY	New York	34556	2204	4329	11193	22345	43221	1544	3467	22450
...	...	...	...	...	...	...	...	...	...	...	...	...	...
202111	14990	Orlando	FL	Florida	54543	23059	3290	21394	34335	59660	NaN	NaN	NaN

Under the example, you can see how Florida does not record certain transactions. I would like to add the NaN such that the dataframe looks like this. I appreciate all the help!

CodePudding user response：

This is essentially a pivot, with flattening of the MultiIndex:

(df
 .pivot(index=['Date', 'ID', 'County Name', 'State', 'State Name'],
        columns=['Product Name', 'Type of Transaction'],
        values='QTY')
 .pipe(lambda d: d.set_axis(map(','.join, d. columns), axis=1))
 .reset_index()
 )

Output:

     Date     ID  County Name State  State Name  Shoes,Entry  Keyboard,Exit  \
0  202001  11684      Chicago    IL    Illionis          NaN            NaN   
1  202012  10002      Houston    TX       Texas          NaN         5493.0   
2  202105  10001  Los Angeles    CA  California        630.0            NaN   
3  202107  12005     New York    NY    New York          NaN            NaN   

   Phone,Disposal  Phone,Entry  
0           220.0          NaN  
1             NaN          NaN  
2             NaN          NaN  
3             NaN        302.0