I have two DataFrames:
dataset2:
0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ... c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
0 s1 5 4 4 5 4 4 4 4 4 ... 4 4 3 3 4 3 4 4 3 3
1 s2 3 4 3 4 4 5 3 5 3 ... 5 3 3 2 3 3 3 5 5 1
2 s3 4 4 5 5 4 4 4 4 4 ... 5 4 4 1 3 2 3 3 4 3
3 s4 5 5 5 1 5 5 5 5 1 ... 4 5 5 1 5 4 5 4 5 5
4 s5 5 5 5 5 5 5 4 5 2 ... 4 4 5 1 2 2 5 5 5 3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
74 s75 4 4 4 4 5 5 5 5 5 ... 5 5 4 2 5 4 4 5 5 4
75 s76 5 3 4 5 5 5 4 5 4 ... 5 4 4 4 4 3 3 4 5 4
76 s77 5 3 3 5 2 3 3 3 3 ... 3 3 5 5 3 3 5 3 5 3
77 s78 4 5 4 2 2 4 4 4 5 ... 5 5 3 3 4 2 4 5 5 2
78 s79 5 4 5 5 5 5 4 5 5 ... 5 5 4 2 5 3 4 5 5 4
df_combinec:
0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ... c11 c12 c13 c14 c15 c16 c17 c18 c19 c20
0 s80 5 5 5 6 4 3 4 3 2 ... 4 2 5 8 3 2 4 4 5 4
1 s81 5 4 4 5 3 4 5 4 3 ... 5 5 5 6 5 3 3 3 5 4
2 s82 4 4 4 6 5 4 4 5 6 ... 5 4 4 1 4 2 4 5 4 3
3 s83 5 4 4 5 5 5 2 4 4 ... 5 5 5 7 4 2 4 5 5 4
4 s84 3 2 5 4 5 5 4 5 5 ... 4 5 5 4 4 3 4 5 4 3
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
116 s196 4 4 4 5 5 4 5 5 4 ... 5 4 4 3 3 4 4 3 5 5
117 s197 5 5 4 5 5 5 4 5 4 ... 5 5 4 2 5 3 5 5 5 3
118 s198 5 5 4 6 4 4 5 4 2 ... 5 5 4 0 5 1 4 4 5 4
119 s199 5 3 3 4 4 5 5 5 5 ... 5 4 5 2 4 3 5 5 5 5
120 s200 5 4 4 4 3 5 2 5 3 ... 4 4 5 4 2 1 4 5 5 4
I tried the code below to combine these DataFrames, but the result contains many NaN values.
dataset2.reset_index(drop=True)
df_combinec.reset_index(drop=True)
comb_data = pd.concat([dataset2,df_combinec], ignore_index=True)
How can I solve this?
CodePudding user response:
Your issue is likely caused by a single-level MultiIndex on the columns of the second DataFrame.
Here is an example:
df = pd.DataFrame([[1, 2]], columns=['A', 'B'])
df2 = pd.DataFrame([[3, 4]], columns=pd.MultiIndex.from_arrays([['A', 'B']]))
pd.concat([df, df2])
# A B (A,) (B,)
# 0 1.0 2.0 NaN NaN
# 0 NaN NaN 3.0 4.0
You can solve the issue by flattening the MultiIndex into a normal Index:
df2.columns = df2.columns.get_level_values(0)
pd.concat([df, df2])
# A B
# 0 1 2
# 0 3 4
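If it is unclear which frame carries the single-level MultiIndex, both can be normalized before concatenating. The following is only a minimal sketch: `flatten_columns` is a hypothetical helper name (not part of pandas), and the toy frames stand in for dataset2 and df_combinec.

```python
import pandas as pd


def flatten_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose columns are a plain Index (hypothetical helper)."""
    out = frame.copy()
    if isinstance(out.columns, pd.MultiIndex) and out.columns.nlevels == 1:
        # A single-level MultiIndex holds 1-tuples such as ('A',); keep level 0.
        out.columns = out.columns.get_level_values(0)
    return out


# Toy stand-ins for dataset2 and df_combinec.
df = pd.DataFrame([[1, 2]], columns=["A", "B"])
df2 = pd.DataFrame([[3, 4]], columns=pd.MultiIndex.from_arrays([["A", "B"]]))

combined = pd.concat(
    [flatten_columns(df), flatten_columns(df2)], ignore_index=True
)
print(combined)
```

Frames that already have a plain Index pass through unchanged, so the helper can be applied to every input without checking first.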
CodePudding user response:
Possibly your column names do not match. Review the output of dataset2.columns and df_combinec.columns.
You can also try numpy.concatenate(), but make sure your column order is correct.
# Stack the raw values row-wise, then reuse dataset2's column labels.
comb_data = pd.DataFrame(np.concatenate((dataset2.values, df_combinec.values), axis=0),
                         columns=dataset2.columns)
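One way to review the two column indexes is to compare them directly. This is a rough sketch on toy data, not the asker's real frames: the trailing space in "c2 " is a deliberately planted mismatch, chosen only to illustrate the kind of difference that makes pd.concat produce NaN.

```python
import pandas as pd

# Toy stand-ins; the trailing space in "c2 " is a planted mismatch.
dataset2 = pd.DataFrame([["s1", 5, 4]], columns=["0", "c1", "c2"])
df_combinec = pd.DataFrame([["s80", 5, 5]], columns=["0", "c1", "c2 "])

# Index types: a MultiIndex on one side also prevents label alignment.
print(type(dataset2.columns).__name__, type(df_combinec.columns).__name__)

# Labels that appear in exactly one of the two frames.
mismatched = dataset2.columns.symmetric_difference(df_combinec.columns)
print(list(mismatched))
```

An empty result together with identical index types would suggest the labels already align, in which case the NaN values likely come from somewhere else.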
CodePudding user response:
Based on the comments, it seems df_combinec does not have the same columns as dataset2. Setting df_combinec.columns = dataset2.columns before the concat can solve the problem.
However, I think it is better to check your input as well. If you read the DataFrames from CSV, make sure the first line (the header) is always the same, and perhaps check the encoding too (I am not sure about this).
Note: mozway's solution is better and safer if the columns in df_combinec are in a different order.
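The relabeling suggested in this answer could look roughly like the sketch below. The toy frames and the stray whitespace in the second header are assumptions made for illustration; the real cause of the mismatch in the question is not known.

```python
import pandas as pd

dataset2 = pd.DataFrame([["s1", 5, 4]], columns=["0", "c1", "c2"])
# Same labels up to stray whitespace, as might come from a CSV header (assumed).
df_combinec = pd.DataFrame([["s80", 3, 2]], columns=[" 0", "c1 ", " c2"])

if len(df_combinec.columns) == len(dataset2.columns):
    # Positional relabel: only correct when both frames share the same column order.
    df_combinec.columns = dataset2.columns

comb_data = pd.concat([dataset2, df_combinec], ignore_index=True)
print(comb_data)
```

Because the assignment is purely positional, it silently mislabels data if the two frames list their columns in a different order, which is why checking the headers first is worthwhile.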