Home > OS >  Python: how to concat two dataframes by keeping only constant information?
Python: how to concat two dataframes by keeping only constant information?

Time:10-22

I have two dataframes

df1 
     A    B    C    id    val
0    ar    2    3     1    3.2
1    ar    3    3     1    5.6
3    ar    0    3     2    7.8 
4    ar    4    3     2    9.2
5    ar    5    3     2    3.4

df2
     id   val
0     1   3.3
1     2   6.4

I want to add a row to df1 with the information of df2 and keeping the same value for the columns where the values are constant while nan for columns that have different values.

At the end I would like to have

df3 
     A    B    C    id    val
0   ar    2    3     1    3.2
1   ar    3    3     1    5.6
3   ar    0    3     2    7.8 
4   ar    4    3     2    9.2
5   ar    5    3     2    3.4
6   ar   nan   3     1    3.3
7   ar   nan   3     2    6.4

CodePudding user response:

One way would be to create a list of the columns from df1 that have only 1 value using np.unique(), create those in df2, and pd.concat() df1 and df2:

eq_cols = []
[eq_cols.append(col) for col in df1.columns if len(np.unique(df1[col]))==1]
df2[eq_cols] = df1[eq_cols]

>>> pd.concat([df1,df2],axis=0)
    A    B  C  id  val
0  ar  2.0  3   1  3.2
1  ar  3.0  3   1  5.6
2  ar  0.0  3   2  7.8
3  ar  4.0  3   2  9.2
4  ar  5.0  3   2  3.4
0  ar  NaN  3   1  3.3
1  ar  NaN  3   2  6.4

CodePudding user response:

You can compare first row by all rows for constant columns, so after concat add them to df2:

c = df1.iloc[[0], df1.eq(df1.iloc[0]).all().to_numpy()].squeeze()
print (c)
A    1
C    3
Name: 0, dtype: int64


df = pd.concat([df1, df2.assign(**c)], ignore_index=True)
print (df)
   A    B  C  id  val
0  1  2.0  3   1  3.2
1  1  3.0  3   1  5.6
2  1  0.0  3   2  7.8
3  1  4.0  3   2  9.2
4  1  5.0  3   2  3.4
5  1  NaN  3   1  3.3
6  1  NaN  3   2  6.4
  • Related