Home > Back-end >  pivot wide dataframe with double row heading to deep dataframe
pivot wide dataframe with double row heading to deep dataframe

Time:05-29

I am confident I could muddle through this, and perhaps even land on something elegant, but I'm curious how others would approach this problem:

I would like to take a dataframe like this:

| a1   | b1   | c1  |
| ---- | ---- | --- |
| a2   | b2   | c2  |
| ---  | ---  | --- |
| v1   | v2   | v3  |
| ...  | ...  | ... |
| v100 | v101 | v102|

And convert to a dataframe like this, where x* are custom headers that I can provide a list for:

| x1  | x2  | x3   |
| --- | --- | ---- |
| a1  | a2  | v1   |
| ... | ... | ...  |
| a1  | a2  | v100 |
| b1  | b2  | v2   |
| ... | ... | ...  |
| b1  | b2  | v101 |
| c1  | c2  | v3   |
| ... | ... | ...  |
| c1  | c2  | v102 |

The context is a CSV import that needs to be pivoted, but has two rows of headers that will become x1 and x2 in this situation.

Any tips or suggestions much appreciated! Again, confident could do this with some elbow grease, but would like to improve my pivoting and index-ing intuition.

UPDATE: per comments, here are source and target dataframe examples:

# source dataframe
df1 = pd.DataFrame(columns=['a1','b1','c1'], data=[['a2','b2','c2'],['v1','v2','v3'],['v100','v101','v102']])

"""
In [14]: df1                                                                                                                                                    
Out[14]: 
     a1    b1    c1
0    a2    b2    c2
1    v1    v2    v3
2  v100  v101  v102
"""

# target dataframe (where "x*" headers will be provided)
df2 = pd.DataFrame(columns=['x1','x2','x3'], data=[['a1','a2','v1'],['a1','a2','v100'],['b1','b2','v2'],['b1','b2','v101'],['c1','c2','v3'],['c1','c2','v102']])

"""
In [16]: df2                                                                                                                                                    
Out[16]: 
   x1  x2    x3
0  a1  a2    v1
1  a1  a2  v100
2  b1  b2    v2
3  b1  b2  v101
4  c1  c2    v3
5  c1  c2  v102
"""

CodePudding user response:

If I understood your initial DataFrame correctly, I can create it this way:

import pandas as pd
n = 100
listA = [f'V{x}' for x in range(1,n 1)]
listA.insert(0,("a1"))
listA.insert(1,"a2")
listB = [f'V{x}' for x in range(n 1,2*n 1)]
listB.insert(0,("b1"))
listB.insert(1,"b2")
listC = [f'V{x}' for x in range(2*n 1,3*n 1)]
listC.insert(0,("c1"))
listC.insert(1,"c2")
data = [listA, listB, listC]
df = pd.DataFrame(data).T

Now you create an empty DataFrame with three columns: "x1", "x2", "x3" :

newDF = pd.DataFrame( columns=["x1", "x2", "x3"])

And you add the rows in a for loop choosing the location of elements in the original DataFrame:

for i in range(3):
    for j in range(n):
        row = [df.iloc[0][i],df.iloc[1][i], df.iloc[j 2][i]]
        newDF.loc[len(newDF.index)] = row

Hoping to have been useful to you.

UPDATE
According to your Update :
You create a row0 from the columns header :

row0 = pd.DataFrame(list(df1.columns)).T

You change columns name of row0 and df1 :

row0.columns = ["x1","x2","x3"]
df1.columns = ["x1","x2","x3"]

You concatenate row0 and df1 in a dnew DataFrame:

dnew = pd.concat([row0, df])

You create df2 :

df2 = pd.DataFrame(columns=["x1", "x2", "x3"])
for i in range(len(dnew.columns)):
    for j in range(len(dnew)-2):
        row = [dnew.iloc[0][i],dnew.iloc[1][i], dnew.iloc[j 2][i]]
        df2.loc[len(df2)] = row

CodePudding user response:

This is a multi-step reshaping:

(df1.T
 .set_index(0, append=True)
 .rename_axis(['x1', 'x2'])
 .stack()
 .droplevel(-1)
 .reset_index(name='x3')
)

Output:

   x1  x2    x3
0  a1  a2    v1
1  a1  a2  v100
2  b1  b2    v2
3  b1  b2  v101
4  c1  c2    v3
5  c1  c2  v102
  • Related