Home > Blockchain >  How to stack observations from different dataframes by the positions of their columns and not the na
How to stack observations from different dataframes by the positions of their columns and not the na

Time:08-30

I am trying to stack the values of several dataframes into one single dataframe. They all have 65 columns but with different names. For instance, lets say that I have two df1 and df2

df1 = pd.DataFrame(
        {"A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"], })

df2 = pd.DataFrame(
        {"F": ["A0", "A1", "A2", "A3"],
        "G": ["B0", "B1", "B2", "B3"],
        "K": ["C0", "C1", "C2", "C3"],
        "Z": ["D0", "D1", "D2", "D3"], })

What I want is to concatenate rows using the position of the columns as keys, and not the column names. So A has the first position in df1, and F has the first position in df2. I want to stack the values of columns A and F together given that they are both in the first position of their data frames.

I want to do the same for every column: stack B in df1 with G in df2 and so on. The result would be something like this:


A0 B0 C0 D0
A1 B1 C1 D1
A2 B2 C2 D2
A3 B3 C3 D3
A0 B0 C0 D0
A1 B1 C1 D1
A2 B2 C2 D2
A3 B3 C3 D3

And how could I pass this instruction so that multiple datasets follow the same instruction with some type of for loop? I have been looking at the documentation for the merge(), concat(), and join() functions but none seem to do what I want.

CodePudding user response:

You can concat after changing the df2 column names to match the df1 columns by position

df1 = pd.DataFrame(
        {"A": ["A00", "A01", "A02", "A03"],
        "B": ["B00", "B01", "B02", "B03"],
        "C": ["C00", "C01", "C02", "C03"],
        "D": ["D00", "D01", "D02", "D03"], })

df2 = pd.DataFrame(
        {"F": ["A10", "A11", "A12", "A13"],
        "G": ["B10", "B11", "B12", "B13"],
        "K": ["C10", "C11", "C12", "C13"],
        "Z": ["D10", "D11", "D12", "D13"], })

df3 = pd.DataFrame(
        {"Q1": ["A20", "A21", "A22", "A23"],
        "Q2": ["B20", "B21", "B22", "B23"],
        "Q3": ["C20", "C21", "C22", "C23"],
        "Q4": ["D20", "D21", "D22", "D23"], })


pd.concat(
    (
        df1,
        df2.set_axis(df1.columns, axis=1),
        df3.set_axis(df1.columns, axis=1),
    ),
    ignore_index=True,
)

Output

enter image description here

CodePudding user response:

You can use np.concatenate() or np.r_ to combine multiple df's

df_list = [df1,df2]
np.concatenate(df_list)

or

np.r_[df1,df2]

Then pass either one into pd.DataFrame() For example:

pd.DataFrame(np.concatenate(df_list),columns = df1.columns)

Output:

    A   B   C   D
0  A0  B0  C0  D0
1  A1  B1  C1  D1
2  A2  B2  C2  D2
3  A3  B3  C3  D3
4  A0  B0  C0  D0
5  A1  B1  C1  D1
6  A2  B2  C2  D2
7  A3  B3  C3  D3
  • Related