Can't combine two df

Time:11-07

I have two DataFrames:

dataset2:

    0   c1  c2  c3  c4  c5  c6  c7  c8  c9  ... c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
0   s1  5   4   4   5   4   4   4   4   4   ... 4   4   3   3   4   3   4   4   3   3
1   s2  3   4   3   4   4   5   3   5   3   ... 5   3   3   2   3   3   3   5   5   1
2   s3  4   4   5   5   4   4   4   4   4   ... 5   4   4   1   3   2   3   3   4   3
3   s4  5   5   5   1   5   5   5   5   1   ... 4   5   5   1   5   4   5   4   5   5
4   s5  5   5   5   5   5   5   4   5   2   ... 4   4   5   1   2   2   5   5   5   3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
74  s75 4   4   4   4   5   5   5   5   5   ... 5   5   4   2   5   4   4   5   5   4
75  s76 5   3   4   5   5   5   4   5   4   ... 5   4   4   4   4   3   3   4   5   4
76  s77 5   3   3   5   2   3   3   3   3   ... 3   3   5   5   3   3   5   3   5   3
77  s78 4   5   4   2   2   4   4   4   5   ... 5   5   3   3   4   2   4   5   5   2
78  s79 5   4   5   5   5   5   4   5   5   ... 5   5   4   2   5   3   4   5   5   4

df_combinec:

    0    c1 c2  c3  c4  c5  c6  c7  c8  c9  ... c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
0   s80  5  5   5   6   4   3   4   3   2   ... 4   2   5   8   3   2   4   4   5   4
1   s81  5  4   4   5   3   4   5   4   3   ... 5   5   5   6   5   3   3   3   5   4
2   s82  4  4   4   6   5   4   4   5   6   ... 5   4   4   1   4   2   4   5   4   3
3   s83  5  4   4   5   5   5   2   4   4   ... 5   5   5   7   4   2   4   5   5   4
4   s84  3  2   5   4   5   5   4   5   5   ... 4   5   5   4   4   3   4   5   4   3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
116 s196 4  4   4   5   5   4   5   5   4   ... 5   4   4   3   3   4   4   3   5   5
117 s197 5  5   4   5   5   5   4   5   4   ... 5   5   4   2   5   3   5   5   5   3
118 s198 5  5   4   6   4   4   5   4   2   ... 5   5   4   0   5   1   4   4   5   4
119 s199 5  3   3   4   4   5   5   5   5   ... 5   4   5   2   4   3   5   5   5   5
120 s200 5  4   4   4   3   5   2   5   3   ... 4   4   5   4   2   1   4   5   5   4

I tried the code below to combine these DataFrames, but the result contains many NaN values.

dataset2.reset_index(drop=True)
df_combinec.reset_index(drop=True)
comb_data = pd.concat([dataset2,df_combinec], ignore_index=True)

df_combinec after reindex: [image not preserved]

comb_data: [image not preserved]

How can I solve this?

CodePudding user response:

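Your issue is likely caused by a single-level MultiIndex on the columns of the second DataFrame. Its labels are tuples such as ('A',), which do not match the plain labels of the first DataFrame, so concat treats them as different columns.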

Here is an example:

df = pd.DataFrame([[1, 2]], columns=['A', 'B'])
df2 = pd.DataFrame([[3, 4]], columns=pd.MultiIndex.from_arrays([['A', 'B']]))

pd.concat([df, df2])

#      A    B  (A,)  (B,)
# 0  1.0  2.0   NaN   NaN
# 0  NaN  NaN   3.0   4.0

You can solve the issue by flattening the MultiIndex into a regular Index:

df2.columns = df2.columns.get_level_values(0)
pd.concat([df, df2])

#    A  B
# 0  1  2
# 0  3  4
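If you are not sure whether a DataFrame has this problem, the check and the fix can be combined. The following is only a minimal sketch: the helper name flatten_single_level_columns is hypothetical (it is not a pandas function), and it only handles the single-level case described above.

```python
import pandas as pd


def flatten_single_level_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose single-level MultiIndex columns become a plain Index.

    Hypothetical helper for illustration; frames with regular columns
    (or with a genuine multi-level MultiIndex) are returned unchanged.
    """
    out = frame.copy()
    if isinstance(out.columns, pd.MultiIndex) and out.columns.nlevels == 1:
        out.columns = out.columns.get_level_values(0)
    return out


# Same toy frames as in the example above.
df = pd.DataFrame([[1, 2]], columns=["A", "B"])
df2 = pd.DataFrame([[3, 4]], columns=pd.MultiIndex.from_arrays([["A", "B"]]))

# After flattening, the labels match and concat aligns the columns by name.
combined = pd.concat([df, flatten_single_level_columns(df2)], ignore_index=True)
print(combined)
```

Because the helper works on a copy, the original df2 keeps its MultiIndex; assign the result back if you want the change to persist.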

CodePudding user response:

Possibly your column names do not match. Check the output of dataset2.columns and df_combinec.columns.
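One way to compare the two sets of labels is sketched below. The frames left and right are toy stand-ins for dataset2 and df_combinec (not the asker's data), and the trailing space in one label is an assumed example of a mismatch that is hard to see in printed output.

```python
import pandas as pd

# Toy stand-ins for dataset2 and df_combinec; the data is made up.
left = pd.DataFrame([["s1", 5, 4]], columns=["0", "c1", "c2"])
right = pd.DataFrame([["s80", 5, 5]], columns=["0", "c1 ", "c2"])  # note "c1 " with a trailing space

# The index type reveals a MultiIndex vs. a regular Index.
print(type(left.columns).__name__, type(right.columns).__name__)

# Labels present in one frame but missing from the other.
only_left = left.columns.difference(right.columns).tolist()
only_right = right.columns.difference(left.columns).tolist()

# Printing the lists shows the labels with quotes, so stray whitespace is visible.
print("only in left: ", only_left)
print("only in right:", only_right)
```

If both lists are empty and both index types are the same, the NaN values probably come from something other than the column labels.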

You can also try numpy.concatenate(), but make sure the column order is the same in both DataFrames, because the values are stacked by position rather than by label.

import numpy as np
# Stack the raw values row-wise, then reuse dataset2's column labels
comb_data = pd.DataFrame(np.concatenate((dataset2.values, df_combinec.values), axis=0), columns=dataset2.columns)

CodePudding user response:

Based on the comment, it seems your df_combinec doesn't have the same columns as your dataset2.

Putting df_combinec.columns = dataset2.columns before the concat can solve the problem.
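A minimal sketch of that approach follows, using made-up data in place of the asker's frames and assuming df_combinec carries a single-level MultiIndex as suggested in another answer. The column-count check is an added safeguard, not part of the original suggestion.

```python
import pandas as pd

# Toy stand-ins with made-up values; df_combinec gets mismatched column labels.
dataset2 = pd.DataFrame([["s1", 5, 4]], columns=["0", "c1", "c2"])
df_combinec = pd.DataFrame(
    [["s80", 5, 5]],
    columns=pd.MultiIndex.from_arrays([["0", "c1", "c2"]]),
)

# Labels are copied by position, so at least verify the column counts agree.
if df_combinec.shape[1] != dataset2.shape[1]:
    raise ValueError("column counts differ; cannot copy labels by position")

df_combinec.columns = dataset2.columns
comb_data = pd.concat([dataset2, df_combinec], ignore_index=True)
print(comb_data)
```

This relabels strictly by position, so it silently mislabels data if the two frames store their columns in a different order.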

However, I think it is better to check your input as well. If you read the DataFrames from CSV files, make sure the first (header) line is the same in each file, and perhaps check the encoding too (I am not sure about this).

Note: mozway's solution is better and safer if the columns of df_combinec are in a different order.
