So I have 2 dataframes df1
,hhinc8,owncar,longitude,latitude,category,salary
0,5.0,1.0,-82.99194508,40.04649963,LRLC,34
1,6.0,2.0,-82.99584706,40.03738548,LRHC,23
2,5.0,1.0,-82.99697268,40.02674247,LRLC,34
3,6.0,2.0,-82.99160441,40.03612997,LRHC,21
4,6.0,2.0,-82.994716,40.04691778,LRHC,20
5,6.0,2.0,-82.99793728,40.04385713,LRHC,19
6,5.0,2.0,-82.98986012,40.03789279,LRHC,31
7,7.0,2.0,-82.99602685,40.03216463,LRHC,12
8,6.0,1.0,-82.99362168,40.0372023,LRLC,17
.
.
.
df2
longitude,latitude,category,FSA
-83.0347849999999,39.945993,SPM,81
-82.957851,40.060118,SPM,93
-82.994303,40.013227,SPM,87
-82.8265609999999,39.9207779999999,SPM,84
-82.9984709999999,39.887616,SPM,92
.
.
.
Now I want to create a 3rd dataframe by appending columns longitude,latitude,category
of both the dataframes
like
df3
longitude,latitude,category
-82.99194508,40.04649963,LRLC
-82.99584706,40.03738548,LRHC
-82.99697268,40.02674247,LRLC
-82.99160441,40.03612997,LRHC
-82.994716,40.04691778,LRHC
-82.99793728,40.04385713,LRHC
-82.98986012,40.03789279,LRHC
-82.99602685,40.03216463,LRHC
-82.99362168,40.0372023,LRLC
-83.0347849999999,39.945993,SPM
-82.957851,40.060118,SPM
-82.994303,40.013227,SPM
-82.8265609999999,39.9207779999999,SPM
-82.9984709999999,39.887616,SPM
.
.
.
I tried the following
df = household[['longitude','latitude','category']].append(market[['longitude','latitude','category']].reset_index(drop=True))
print(df)
which gave me
longitude latitude category
0 -82.991945 40.046500 NaN
1 -82.995847 40.037385 NaN
2 -82.996973 40.026742 NaN
3 -82.991604 40.036130 NaN
4 -82.994716 40.046918 NaN
.. ... ... ...
205 -83.047653 39.932568 CSPM
206 -83.090366 39.928228 CSPM
207 -83.045070 39.988848 CSPM
208 -82.916023 39.946893 CSPM
209 -82.863217 40.062583 CSPM
and
df = household[['longitude','latitude','category']].reset_index(drop=True).merge(market[['longitude','latitude','category']].reset_index(drop=True),on="category",how="left")
print(df)
longitude_x latitude_x category longitude_y latitude_y
0 -82.991945 40.046500 NaN NaN NaN
1 -82.995847 40.037385 NaN NaN NaN
2 -82.996973 40.026742 NaN NaN NaN
3 -82.991604 40.036130 NaN NaN NaN
4 -82.994716 40.046918 NaN NaN NaN
... ... ... ... ... ...
30015 -83.051109 39.990667 NaN NaN NaN
30016 -83.044124 39.990816 NaN NaN NaN
30017 -83.045784 39.991388 NaN NaN NaN
30018 -82.992414 40.049367 NaN NaN NaN
30019 -83.008978 40.017892 NaN NaN NaN
[30020 rows x 5 columns]
and
df = pd.concat([household[['longitude','latitude','category']].reset_index(drop=True),market[['longitude','latitude','category']].reset_index(drop=True)],axis=0)
print(df)
longitude latitude category longitude latitude category
0 -82.991945 40.046500 NaN -83.034785 39.945993 SPM
1 -82.995847 40.037385 NaN -82.957851 40.060118 SPM
2 -82.996973 40.026742 NaN -82.994303 40.013227 SPM
3 -82.991604 40.036130 NaN -82.826561 39.920778 SPM
4 -82.994716 40.046918 NaN -82.998471 39.887616 SPM
... ... ... ... ... ... ...
30015 -83.051109 39.990667 NaN NaN NaN NaN
30016 -83.044124 39.990816 NaN NaN NaN NaN
30017 -83.045784 39.991388 NaN NaN NaN NaN
30018 -82.992414 40.049367 NaN NaN NaN NaN
30019 -82.998328 40.070140 NaN NaN NaN NaN
[30020 rows x 6 columns]
What am I doing wrong?
CodePudding user response:
Wrong axis:
df = pd.concat([household[['longitude', 'latitude', 'category']].reset_index(drop=True), market[['longitude', 'latitude', 'category']].reset_index(drop=True)], axis=0)