Home > database >  Drop the rows till we reach certain string as column name and append with another dataframe after do
Drop the rows till we reach certain string as column name and append with another dataframe after do

Time:05-05

I have 2 dataframes

df1 = pd.DataFrame([["M","N","O"],["A","B","C"],["X","Y","Z"],[2,3,4],[1,2,3]])
0   1   2
M   N   O
A   B   C
X   Y   Z
2   3   4
1   2   3
df2 = pd.DataFrame([["P","Q","R","S"],["X","Z","W","Y"],[4,5,6,7],[7,8,9,3]])
0   1   2   3
P   Q   R   S
X   Z   W   Y
4   5   6   7
7   8   9   3

I want to read the 1st dataframe drop the rows till the row starts with X and make that row as column names, then read the 2nd dataframe again drop the rows till row starts with X then append it to the 1st dataframe. Repeat the process in loop because I have multiple such dataframes.

Expected Output:

df_out = pd.DataFrame([[2,3,4,0],[1,2,3,0],[4,7,5,6],[7,3,8,9]],columns=["X","Y","Z","W"])
X   Y   Z   W
2   3   4   0
1   2   3   0
4   7   5   6
7   3   8   9

How to do it?

CodePudding user response:

First test if value X exist in any row for all columns in shifted DataFrame for get all rows after match by DataFrame.cummax with DataFrame.any and set columns names by this row in DataFrame.set_axis, same solution use for another DataFrame, join by concat, replace missing values and for expected order add DataFrame.reindex with unon both columns names:

m1 = df1.shift().eq('X').cummax().any(axis=1)
cols1 = df1[df1.eq('X').any(axis=1)].to_numpy().tolist()

df11 = df1[m1].set_axis(cols1, axis=1)

m2 = df2.shift().eq('X').cummax().any(axis=1)
cols2 = df2[df2.eq('X').any(axis=1)].to_numpy().tolist()

df22 = df2[m2].set_axis(cols2, axis=1)

df = (pd.concat([df11, df22], ignore_index=True)
        .fillna(0)
        .reindex(df11.columns.union(df22.columns, sort=False), axis=1))
print (df)
   X  Y  Z  W
0  2  3  4  0
1  1  2  3  0
2  4  7  5  6
3  7  3  8  9

CodePudding user response:

This works,

shift = 0

for index in df.index:
  if df.iloc[index - 1, 0] == "X":
    X = df.iloc[index - 1, :].values
    break
  shift -= 1

df = df.shift(shift).dropna()
df.columns = X
df

Output -

X Y Z
0 2 3 4
1 1 2 3
  • Related