Rows increase when I merge two DataFrames that have the same number of rows

Time: 10-08

We want to combine the two modified DataFrames into one using the merge method. Each DataFrame has shape 16598 rows × 6 columns, so the result was expected to be 16598 rows × 7 columns (the five shared key columns plus NA_Sales and EU_Sales). However, the merged result was 16602 rows × 7 columns, i.e. four extra rows. The code I used is as follows:

import pandas as pd
total_data = pd.merge(data_01, data_02, on=['Name', 'Platform', 'Year', 'Genre', 'Publisher'])
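A likely cause, though the excerpt alone does not confirm it, is that some combinations of the five key columns occur more than once. `pd.merge` performs a join, so every left row is paired with every right row that shares its key; a key that appears twice on each side yields 2 × 2 = 4 rows instead of 2. The toy frames below are hypothetical and only illustrate the mechanism:

```python
import pandas as pd

# Hypothetical toy frames: 3 rows each, but the key ("Game B", "PS2")
# appears twice in each frame.
left = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game B"],
    "Platform": ["DS", "PS2", "PS2"],
    "NA_Sales": [1.0, 2.0, 3.0],
})
right = pd.DataFrame({
    "Name": ["Game B", "Game A", "Game B"],
    "Platform": ["PS2", "DS", "PS2"],
    "EU_Sales": [0.5, 0.1, 0.7],
})

merged = pd.merge(left, right, on=["Name", "Platform"])
# "Game A" matches 1 x 1 = 1 row; "Game B" matches 2 x 2 = 4 rows.
print(merged.shape)  # (5, 4)
```

On the real data, a handful of such keys would be enough to explain four extra rows: for example, two keys that each occur twice on both sides add two rows apiece.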

To be more specific:

The column names of 'data_01' are 'Name', 'Platform', 'Year', 'Genre', 'Publisher', and 'NA_Sales'. (16598 rows × 6 columns)

The column names of 'data_02' are 'Name', 'Platform', 'Year', 'Genre', 'Publisher', and 'EU_Sales'. (16598 rows × 6 columns)

The two DataFrames differ only in their index labels and row order; the values of 'Name', 'Platform', 'Year', 'Genre', and 'Publisher' are the same.
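The claim that both frames contain exactly the same key values, each exactly once, can be checked directly. This is a minimal sketch; `KEYS`, `duplicated_keys`, and `same_keys` are hypothetical helper names, and the sample data only stands in for the real `data_01` and `data_02`:

```python
import pandas as pd

KEYS = ["Name", "Platform", "Year", "Genre", "Publisher"]

def duplicated_keys(df: pd.DataFrame, keys=KEYS) -> pd.DataFrame:
    """Return every row whose key combination occurs more than once."""
    return df[df.duplicated(subset=keys, keep=False)]

def same_keys(a: pd.DataFrame, b: pd.DataFrame, keys=KEYS) -> bool:
    """True if both frames hold the same key rows, ignoring index and order."""
    ka = a[keys].sort_values(keys).reset_index(drop=True)
    kb = b[keys].sort_values(keys).reset_index(drop=True)
    return ka.equals(kb)

# Hypothetical stand-in for data_01: the key ("X", "DS", 2008.0, ...) appears twice.
data_01 = pd.DataFrame({
    "Name": ["X", "X", "Y"],
    "Platform": ["DS", "DS", "PS"],
    "Year": [2008.0, 2008.0, 2010.0],
    "Genre": ["Action", "Action", "Misc"],
    "Publisher": ["P1", "P1", "P2"],
    "NA_Sales": [40.0, 10.0, 0.0],
})
# Hypothetical stand-in for data_02: same key rows, different order and index.
data_02 = (
    data_01[KEYS].iloc[[2, 0, 1]]
    .reset_index(drop=True)
    .assign(EU_Sales=[0.0, 5.0, 1.0])
)

print(len(duplicated_keys(data_01)))  # 2
print(same_keys(data_01, data_02))    # True
```

Note that `same_keys` can return True while `duplicated_keys` is non-empty; that is exactly the situation in which a merge on these keys grows.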

Only 'NA_Sales', 'EU_Sales', and 'Year' are numeric; the remaining columns have object dtype.
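A related detail, assuming some `Year` values are missing (the float dtype is consistent with that, though the excerpt does not show any): pandas' `merge` matches null keys against each other, unlike a SQL join, so rows with a missing Year still behave as duplicates of one another when their other keys agree. A hypothetical illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: same Name, Year missing (NaN) on both sides.
left = pd.DataFrame({"Name": ["Z", "Z"], "Year": [np.nan, np.nan], "NA_Sales": [1.0, 2.0]})
right = pd.DataFrame({"Name": ["Z", "Z"], "Year": [np.nan, np.nan], "EU_Sales": [3.0, 4.0]})

# NaN keys match each other, so the two rows on each side pair up 2 x 2.
merged = pd.merge(left, right, on=["Name", "Year"])
print(len(merged))  # 4

# Counting missing values per key column is a quick first check on real data.
print(left[["Name", "Year"]].isna().sum())
```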


What I want: a single DataFrame (16598 rows × 7 columns) that combines data_01 and data_02. However, the number of rows keeps increasing.
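One way to make the merge fail loudly instead of silently adding rows is the `validate` argument of `pd.merge`: with `validate="one_to_one"`, pandas raises `MergeError` if the merge keys are not unique on either side. The sketch below uses hypothetical data, and the `drop_duplicates` step is only a placeholder, since whether duplicates should be dropped, summed, or investigated depends on the data:

```python
import pandas as pd
from pandas.errors import MergeError

KEYS = ["Name", "Platform"]

# Hypothetical frames: ("B", "PS2") occurs twice on each side.
left = pd.DataFrame({"Name": ["A", "B", "B"], "Platform": ["DS", "PS2", "PS2"],
                     "NA_Sales": [1.0, 2.0, 3.0]})
right = pd.DataFrame({"Name": ["B", "A", "B"], "Platform": ["PS2", "DS", "PS2"],
                      "EU_Sales": [0.5, 0.1, 0.7]})

try:
    pd.merge(left, right, on=KEYS, validate="one_to_one")
    raised = False
except MergeError as err:
    raised = True
    print("Merge keys are not unique:", err)

# Placeholder fix: keep the first occurrence of each key (this discards data).
clean = pd.merge(
    left.drop_duplicates(subset=KEYS),
    right.drop_duplicates(subset=KEYS),
    on=KEYS,
    validate="one_to_one",
)
print(clean.shape)  # (2, 4)
```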


data_01 (16598 rows × 6 columns)

        Name       Platform  Year    Genre      Publisher     NA_Sales
1       Candace..  DS        2008.0  Action     Destineer     40.0
2       The Mun..  Wii       2009.0  Action     Namco..       170.0
3       Otome ..   PS        2010.0  Adventure  Alchemist     0.0
4       Deal..     DS        2010.0  Misc       Zoo Games     40.0
5       Ben 10..   PS3       2010.0  Platform   D3Publisher   120.0
...     ...        ...       ...     ...        ...           ...
16331   Midway..   PS2       2003.0  Misc       Midway Games  720000.0
16409   NASCAR..   PS2       2005.0  Racing     Electronic..  530000.0
16483   Super..    SAT       1998.0  Strategy   Banpresto     0.0
16493   Morta..    PSV       2012.0  Fighting   Warner Bros.  470000.0
16579   Gex:..     PS        1998.0  Platform   BMG...        320000.0

data_02 (16598 rows × 6 columns)

        Name       Platform  Year    Genre      Publisher     EU_Sales
1       Candace..  DS        2008.0  Action     Destineer     0.0
2       The..      Wii       2009.0  Action     Namco ...     0.0
3       Otome..    PSP       2010.0  Adventure  Alchemi..     0.0
4       Deal or..  DS        2010.0  Misc       Zoo Games     0.0
5       Ben 10..   PS3       2010.0  Platform   D3Publisher   90.0
...     ...        ...       ...     ...        ...           ...
16348   Aladdin..  Wii       2011.0  Racing     Big..         0.0
16375   Kill...    XB        2003.0  Shooter    Namco..       50000.0
16385   Tomb..     PS2       2009.0  Action     Eidos..       40000.0
16526   Planet..   GBA       2001.0  Action     Titus         0.0
16572   Koihime..  PS4       2016.0  Fighting   Yeti          0.0

Answer:

If I understand correctly, the values from Name through Publisher are identical in both tables, row by row according to the index.

If so, just merge all columns of one DataFrame with the single EU_Sales column of the other:

total_data = pd.merge(data_01, data_02.EU_Sales, left_index=True, right_index=True)
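This approach pairs rows by index label rather than by the key columns, so it is only correct if the same label refers to the same game in both frames. Since the question mentions that index and row order differ, that assumption is worth checking before relying on the result. A minimal sketch with hypothetical data, where `keys_match` is an illustrative check and not part of the answer:

```python
import pandas as pd

KEYS = ["Name", "Platform", "Year", "Genre", "Publisher"]

# Hypothetical frames: same index labels, rows stored in a different order.
data_01 = pd.DataFrame(
    {"Name": ["A", "B"], "Platform": ["DS", "Wii"], "Year": [2008.0, 2009.0],
     "Genre": ["Action", "Action"], "Publisher": ["P1", "P2"], "NA_Sales": [40.0, 170.0]},
    index=[1, 2],
)
data_02 = pd.DataFrame(
    {"Name": ["B", "A"], "Platform": ["Wii", "DS"], "Year": [2009.0, 2008.0],
     "Genre": ["Action", "Action"], "Publisher": ["P2", "P1"], "EU_Sales": [0.0, 0.0]},
    index=[2, 1],
)

# Index-based merge, as in the answer: rows are paired by index label, not position.
total_data = pd.merge(data_01, data_02.EU_Sales, left_index=True, right_index=True)

# Sanity check: once data_02 is aligned to data_01's index, the key columns must match.
keys_match = data_01[KEYS].equals(data_02[KEYS].reindex(data_01.index))
print(total_data.shape, keys_match)  # (2, 7) True
```

If `keys_match` is False on the real data, the index-based merge is pairing different games, and a key-based merge with duplicates handled is the safer route.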