I want to combine two modified data frames into one using the merge method. Each data frame has 16598 rows × 6 columns, so I expected the result to be 16598 rows × 7 columns. However, the combined result was 16602 rows × 7 columns: the number of rows increased by four. The code I used is as follows.
total_data = pd.merge(data_01, data_02, on=['Name', 'Platform', 'Year', 'Genre', 'Publisher'])
To be more specific:
The column names of 'data_01' are 'Name', 'Platform', 'Year', 'Genre', 'Publisher', and 'NA_Sales'. (16598 rows × 6 columns)
The column names of 'data_02' are 'Name', 'Platform', 'Year', 'Genre', 'Publisher', and 'EU_Sales'. (16598 rows × 6 columns)
The two data frames differ only in their index numbers and in the order of the rows; the values of 'Name', 'Platform', 'Year', 'Genre', and 'Publisher' are the same.
Only 'NA_Sales', 'EU_Sales', and 'Year' are numeric; the remaining columns have dtype object.
What I want is a DataFrame (16598 rows × 7 columns) that combines data_01 and data_02. However, the number of rows keeps increasing.
data_01 (16598 rows × 6 columns)
Name Platform Year Genre Publisher NA_Sales
1 Candace.. DS 2008.0 Action Destineer 40.0
2 The Mun.. Wii 2009.0 Action Namco.. 170.0
3 Otome .. PS 2010.0 Adventure Alchemist 0.0
4 Deal.. DS 2010.0 Misc Zoo Games 40.0
5 Ben 10.. PS3 2010.0 Platform D3Publisher 120.0
... ... ... ... ... ... ...
16331 Midway.. PS2 2003.0 Misc Midway Games 720000.0
16409 NASCAR.. PS2 2005.0 Racing Electronic.. 530000.0
16483 Super.. SAT 1998.0 Strategy Banpresto 0.0
16493 Morta.. PSV 2012.0 Fighting Warner Bros. 470000.0
16579 Gex:.. PS 1998.0 Platform BMG... 320000.0
data_02 (16598 rows × 6 columns)
Name Platform Year Genre Publisher EU_Sales
1 Candace.. DS 2008.0 Action Destineer 0.0
2 The.. Wii 2009.0 Action Namco ... 0.0
3 Otome.. PSP 2010.0 Adventure Alchemi.. 0.0
4 Deal or.. DS 2010.0 Misc Zoo Games 0.0
5 Ben 10.. PS3 2010.0 Platform D3Publisher 90.0
... ... ... ... ... ... ...
16348 Aladdin.. Wii 2011.0 Racing Big.. 0.0
16375 Kill... XB 2003.0 Shooter Namco.. 50000.0
16385 Tomb.. PS2 2009.0 Action Eidos.. 40000.0
16526 Planet.. GBA 2001.0 Action Titus 0.0
16572 Koihime.. PS4 2016.0 Fighting Yeti 0.0
CodePudding user response:
If I understand correctly, the data in the columns Name through Publisher is the same in both tables, index-wise.
So just merge everything from one dataframe and one column from the other.
total_data = pd.merge(data_01, data_02.EU_Sales, left_index=True, right_index=True)
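As for why the row count grows: a likely cause (an assumption, since the full data is not shown) is that some combinations of Name, Platform, Year, Genre and Publisher occur more than once. When a key appears n times on the left and m times on the right, merging on that key produces n × m rows. The sketch below uses small made-up frames, not the real data, to show the effect, how to find such rows with duplicated, and how the index-based merge avoids it.

```python
import pandas as pd

keys = ['Name', 'Platform', 'Year', 'Genre', 'Publisher']

# Toy stand-ins for data_01 / data_02 (hypothetical values).
# The rows with index 2 and 3 share the same key combination.
data_01 = pd.DataFrame(
    {
        'Name': ['A', 'B', 'B'],
        'Platform': ['DS', 'PS2', 'PS2'],
        'Year': [2008.0, 2003.0, 2003.0],
        'Genre': ['Action', 'Misc', 'Misc'],
        'Publisher': ['X', 'Y', 'Y'],
        'NA_Sales': [40.0, 10.0, 20.0],
    },
    index=[1, 2, 3],
)
data_02 = data_01[keys].copy()
data_02['EU_Sales'] = [0.0, 5.0, 6.0]

# Rows whose key combination occurs more than once.
dupes = data_01[data_01.duplicated(subset=keys, keep=False)]

# Key-based merge: the duplicated key pairs up 2 x 2 = 4 times,
# so 3 input rows become 1 + 4 = 5 output rows.
by_keys = pd.merge(data_01, data_02, on=keys)

# Index-based merge: each row is matched only with the row that has
# the same index label, so the row count stays at 3.
by_index = pd.merge(data_01, data_02[['EU_Sales']],
                    left_index=True, right_index=True)

print(len(dupes), by_keys.shape, by_index.shape)
```

Note that the index-based merge only gives the right answer if the same index label refers to the same game in both frames. If you want pandas to complain instead of silently multiplying rows, pd.merge also accepts validate='one_to_one', which raises an error when the keys are not unique.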