Home > Blockchain >  Appending 2 dataframes is giving NAN in the common column
Appending 2 dataframes is giving NAN in the common column

Time:10-29

So I have 2 dataframes df1

,hhinc8,owncar,longitude,latitude,category,salary
0,5.0,1.0,-82.99194508,40.04649963,LRLC,34
1,6.0,2.0,-82.99584706,40.03738548,LRHC,23
2,5.0,1.0,-82.99697268,40.02674247,LRLC,34
3,6.0,2.0,-82.99160441,40.03612997,LRHC,21
4,6.0,2.0,-82.994716,40.04691778,LRHC,20
5,6.0,2.0,-82.99793728,40.04385713,LRHC,19
6,5.0,2.0,-82.98986012,40.03789279,LRHC,31
7,7.0,2.0,-82.99602685,40.03216463,LRHC,12
8,6.0,1.0,-82.99362168,40.0372023,LRLC,17
.
.
.

df2

longitude,latitude,category,FSA
-83.0347849999999,39.945993,SPM,81
-82.957851,40.060118,SPM,93
-82.994303,40.013227,SPM,87
-82.8265609999999,39.9207779999999,SPM,84
-82.9984709999999,39.887616,SPM,92
.
.
.

Now I want to create a 3rd dataframe by appending columns longitude,latitude,category of both the dataframes like df3

longitude,latitude,category
-82.99194508,40.04649963,LRLC
-82.99584706,40.03738548,LRHC
-82.99697268,40.02674247,LRLC
-82.99160441,40.03612997,LRHC
-82.994716,40.04691778,LRHC
-82.99793728,40.04385713,LRHC
-82.98986012,40.03789279,LRHC
-82.99602685,40.03216463,LRHC
-82.99362168,40.0372023,LRLC
-83.0347849999999,39.945993,SPM
-82.957851,40.060118,SPM
-82.994303,40.013227,SPM
-82.8265609999999,39.9207779999999,SPM
-82.9984709999999,39.887616,SPM
.
.
.

I tried the following

df = household[['longitude','latitude','category']].append(market[['longitude','latitude','category']].reset_index(drop=True))
print(df)

which gave me

     longitude   latitude category
0   -82.991945  40.046500      NaN
1   -82.995847  40.037385      NaN
2   -82.996973  40.026742      NaN
3   -82.991604  40.036130      NaN
4   -82.994716  40.046918      NaN
..         ...        ...      ...
205 -83.047653  39.932568     CSPM
206 -83.090366  39.928228     CSPM
207 -83.045070  39.988848     CSPM
208 -82.916023  39.946893     CSPM
209 -82.863217  40.062583     CSPM

and

df = household[['longitude','latitude','category']].reset_index(drop=True).merge(market[['longitude','latitude','category']].reset_index(drop=True),on="category",how="left")
print(df)
       longitude_x  latitude_x category  longitude_y  latitude_y
0       -82.991945   40.046500      NaN          NaN         NaN
1       -82.995847   40.037385      NaN          NaN         NaN
2       -82.996973   40.026742      NaN          NaN         NaN
3       -82.991604   40.036130      NaN          NaN         NaN
4       -82.994716   40.046918      NaN          NaN         NaN
...            ...         ...      ...          ...         ...
30015   -83.051109   39.990667      NaN          NaN         NaN
30016   -83.044124   39.990816      NaN          NaN         NaN
30017   -83.045784   39.991388      NaN          NaN         NaN
30018   -82.992414   40.049367      NaN          NaN         NaN
30019   -83.008978   40.017892      NaN          NaN         NaN

[30020 rows x 5 columns]

and

df = pd.concat([household[['longitude','latitude','category']].reset_index(drop=True),market[['longitude','latitude','category']].reset_index(drop=True)],axis=0)
print(df)
       longitude   latitude category  longitude   latitude category
0     -82.991945  40.046500      NaN -83.034785  39.945993      SPM
1     -82.995847  40.037385      NaN -82.957851  40.060118      SPM
2     -82.996973  40.026742      NaN -82.994303  40.013227      SPM
3     -82.991604  40.036130      NaN -82.826561  39.920778      SPM
4     -82.994716  40.046918      NaN -82.998471  39.887616      SPM
...          ...        ...      ...        ...        ...      ...
30015 -83.051109  39.990667      NaN        NaN        NaN      NaN
30016 -83.044124  39.990816      NaN        NaN        NaN      NaN
30017 -83.045784  39.991388      NaN        NaN        NaN      NaN
30018 -82.992414  40.049367      NaN        NaN        NaN      NaN
30019 -82.998328  40.070140      NaN        NaN        NaN      NaN

[30020 rows x 6 columns]

What am I doing wrong?

CodePudding user response:

Wrong axis:

df = pd.concat([household[['longitude', 'latitude', 'category']].reset_index(drop=True), market[['longitude', 'latitude', 'category']].reset_index(drop=True)], axis=0)
  • Related