I am trying to stack the values of several dataframes into one single dataframe. They all have 65 columns but with different names. For instance, lets say that I have two df1 and df2
df1 = pd.DataFrame(
{"A": ["A0", "A1", "A2", "A3"],
"B": ["B0", "B1", "B2", "B3"],
"C": ["C0", "C1", "C2", "C3"],
"D": ["D0", "D1", "D2", "D3"], })
df2 = pd.DataFrame(
{"F": ["A0", "A1", "A2", "A3"],
"G": ["B0", "B1", "B2", "B3"],
"K": ["C0", "C1", "C2", "C3"],
"Z": ["D0", "D1", "D2", "D3"], })
What I want is to concatenate rows using the position of the columns as keys, and not the column names. So A has the first position in df1, and F has the first position in df2. I want to stack the values of columns A and F together given that they are both in the first position of their data frames.
I want to do the same for every column: stack B in df1 with G in df2 and so on. The result would be something like this:
A0 B0 C0 D0
A1 B1 C1 D1
A2 B2 C2 D2
A3 B3 C3 D3
A0 B0 C0 D0
A1 B1 C1 D1
A2 B2 C2 D2
A3 B3 C3 D3
And how could I pass this instruction so that multiple datasets follow the same instruction with some type of for loop? I have been looking at the documentation for the merge(), concat(), and join() functions but none seem to do what I want.
CodePudding user response:
You can concat
after changing the df2
column names to match the df1
columns by position
df1 = pd.DataFrame(
{"A": ["A00", "A01", "A02", "A03"],
"B": ["B00", "B01", "B02", "B03"],
"C": ["C00", "C01", "C02", "C03"],
"D": ["D00", "D01", "D02", "D03"], })
df2 = pd.DataFrame(
{"F": ["A10", "A11", "A12", "A13"],
"G": ["B10", "B11", "B12", "B13"],
"K": ["C10", "C11", "C12", "C13"],
"Z": ["D10", "D11", "D12", "D13"], })
df3 = pd.DataFrame(
{"Q1": ["A20", "A21", "A22", "A23"],
"Q2": ["B20", "B21", "B22", "B23"],
"Q3": ["C20", "C21", "C22", "C23"],
"Q4": ["D20", "D21", "D22", "D23"], })
pd.concat(
(
df1,
df2.set_axis(df1.columns, axis=1),
df3.set_axis(df1.columns, axis=1),
),
ignore_index=True,
)
Output
CodePudding user response:
You can use np.concatenate()
or np.r_
to combine multiple df's
df_list = [df1,df2]
np.concatenate(df_list)
or
np.r_[df1,df2]
Then pass either one into pd.DataFrame()
For example:
pd.DataFrame(np.concatenate(df_list),columns = df1.columns)
Output:
A B C D
0 A0 B0 C0 D0
1 A1 B1 C1 D1
2 A2 B2 C2 D2
3 A3 B3 C3 D3
4 A0 B0 C0 D0
5 A1 B1 C1 D1
6 A2 B2 C2 D2
7 A3 B3 C3 D3