Home > front end >  Merge 2 dataset python (Closed)
Merge 2 dataset python (Closed)

Time:01-07

I have an issue merging 2 CSV into a new dataset. So far, I have merged them. Let's say I have csv1 and csv2. I created a 'df' from csv1 and I merged csv2 with a specific column. But here is my problem:

I need to create a new column in df (e.g. 'new_foo', because both CSV has the same column name, but different data, so it's not possible to merge the 2 foo columns) and relate the two columns from merged csv2 column X. This is because in csv1 the common column is X.

Current Input:

import pandas as pd

df = pd.read_csv(csv1)
csv2_df = pd.read_csv(csv2)
csv2_column_x=csv2_df['x']
new_foo=csv2_df['foo']
df['new_foo']
df = df.merge(csv2_column_x, how="outer").merge(new_foo, how="outer")

Current Output:

X Y foo new_foo
1 'bla1' 'qwer' 12
1 'bla2' 'aswd' 34
1 'bla3' 'asdf'
2 'bla4' 'azer'

Desired Output:

X Y foo new_foo
1 'bla1' 'qwer' 12
1 'bla2' 'aswd' 12
1 'bla3' 'asdf' 12
2 'bla4' 'azer' 34

This is my csv1:

X Y foo
1 'bla1' 'qwer'
1 'bla2' 'aswd'
1 'bla3' 'asdf'
2 'bla4' 'azer'

This is my csv2:

X foo
1 12
2 34

NOTE: The CSV files have different numbers of rows.

CodePudding user response:

You can use Pandas join method the dataframes on column X.

Code:

import pandas as pd

csv1_df = pd.read_csv("csv1.csv")
csv2_df = pd.read_csv("csv2.csv")
merged = csv1_df.join(csv2_df.set_index('X'), on='X', lsuffix='_caller', rsuffix='_other')
print(merged)

Output:

   X       Y foo_caller  foo_other
0  1  'bla1'     'qwer'         12
1  1  'bla2'     'aswd'         12
2  1  'bla3'     'asdf'         12
3  2  'bla4'     'azer'         34

csv1.csv:

X,Y,foo
1,'bla1','qwer'
1,'bla2','aswd'
1,'bla3','asdf'
2,'bla4','azer'

csv2.csv:

X,foo
1,12
2,34

References:

CodePudding user response:

I finally did what Arsho told me and it's working perfectly. Instead of using the .merge() method, I used the .join() method to concatenate the CSV as I desired.

  •  Tags:  
  • Related