I am confident I could muddle through this, and perhaps even land on something elegant, but I'm curious how others would approach this problem:
I would like to take a dataframe like this:
| a1 | b1 | c1 |
| ---- | ---- | --- |
| a2 | b2 | c2 |
| --- | --- | --- |
| v1 | v2 | v3 |
| ... | ... | ... |
| v100 | v101 | v102|
And convert to a dataframe like this, where x*
are custom headers that I can provide a list for:
| x1 | x2 | x3 |
| --- | --- | ---- |
| a1 | a2 | v1 |
| ... | ... | ... |
| a1 | a2 | v100 |
| b1 | b2 | v2 |
| ... | ... | ... |
| b1 | b2 | v101 |
| c1 | c2 | v3 |
| ... | ... | ... |
| c1 | c2 | v102 |
The context is a CSV import that needs to be pivoted, but has two rows of headers that will become x1
and x2
in this situation.
Any tips or suggestions much appreciated! Again, confident could do this with some elbow grease, but would like to improve my pivoting and index-ing intuition.
UPDATE: per comments, here are source and target dataframe examples:
# source dataframe
df1 = pd.DataFrame(columns=['a1','b1','c1'], data=[['a2','b2','c2'],['v1','v2','v3'],['v100','v101','v102']])
"""
In [14]: df1
Out[14]:
a1 b1 c1
0 a2 b2 c2
1 v1 v2 v3
2 v100 v101 v102
"""
# target dataframe (where "x*" headers will be provided)
df2 = pd.DataFrame(columns=['x1','x2','x3'], data=[['a1','a2','v1'],['a1','a2','v100'],['b1','b2','v2'],['b1','b2','v101'],['c1','c2','v3'],['c1','c2','v102']])
"""
In [16]: df2
Out[16]:
x1 x2 x3
0 a1 a2 v1
1 a1 a2 v100
2 b1 b2 v2
3 b1 b2 v101
4 c1 c2 v3
5 c1 c2 v102
"""
CodePudding user response:
If I understood your initial DataFrame correctly, I can create it this way:
import pandas as pd
n = 100
listA = [f'V{x}' for x in range(1,n 1)]
listA.insert(0,("a1"))
listA.insert(1,"a2")
listB = [f'V{x}' for x in range(n 1,2*n 1)]
listB.insert(0,("b1"))
listB.insert(1,"b2")
listC = [f'V{x}' for x in range(2*n 1,3*n 1)]
listC.insert(0,("c1"))
listC.insert(1,"c2")
data = [listA, listB, listC]
df = pd.DataFrame(data).T
Now you create an empty DataFrame with three columns: "x1", "x2", "x3" :
newDF = pd.DataFrame( columns=["x1", "x2", "x3"])
And you add the rows in a for loop choosing the location of elements in the original DataFrame:
for i in range(3):
for j in range(n):
row = [df.iloc[0][i],df.iloc[1][i], df.iloc[j 2][i]]
newDF.loc[len(newDF.index)] = row
Hoping to have been useful to you.
UPDATE
According to your Update :
You create a row0 from the columns header :
row0 = pd.DataFrame(list(df1.columns)).T
You change columns name of row0 and df1 :
row0.columns = ["x1","x2","x3"]
df1.columns = ["x1","x2","x3"]
You concatenate row0 and df1 in a dnew DataFrame:
dnew = pd.concat([row0, df])
You create df2 :
df2 = pd.DataFrame(columns=["x1", "x2", "x3"])
for i in range(len(dnew.columns)):
for j in range(len(dnew)-2):
row = [dnew.iloc[0][i],dnew.iloc[1][i], dnew.iloc[j 2][i]]
df2.loc[len(df2)] = row
CodePudding user response:
This is a multi-step reshaping:
(df1.T
.set_index(0, append=True)
.rename_axis(['x1', 'x2'])
.stack()
.droplevel(-1)
.reset_index(name='x3')
)
Output:
x1 x2 x3
0 a1 a2 v1
1 a1 a2 v100
2 b1 b2 v2
3 b1 b2 v101
4 c1 c2 v3
5 c1 c2 v102