Merge csv files by columns

Time:10-20

Hey, I'm using Python 3 and I want to merge two CSV files by columns. The merge itself works, but I want it to compare only the first two digits after the decimal point and merge on that.

For example, if I have these two CSV files:

df1:
X1     Y1     Z1
1.232  2.271  6
3.527  5.628  7

df2:
X2     Y2     Z2
1.231  2.275  6
3.526  5.624  7

The current solution won't merge on the X and Y columns because the third digit after the decimal point is different. I want the merge command to look only at the first two digits after the decimal point and merge on them.

Here is my current solution:

from pandas import merge    
df1 = merge(df1, df2, left_on=['X1', 'Y1'], right_on=['X2', 'Y2'])

With this solution the two data frames won't merge. I want them to be merged while ignoring the third digit after the decimal point.

CodePudding user response:

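Use np.floor to cut the values after 2 digits. np.floor has no decimals argument, so multiply by 100, floor, then divide by 100:

import numpy as np
import pandas as pd

df1 = pd.merge(df1.assign(x11 = np.floor(df1['X1'] * 100) / 100,
                          y11 = np.floor(df1['Y1'] * 100) / 100), 
               df2.assign(x22 = np.floor(df2['X2'] * 100) / 100,
                          y22 = np.floor(df2['Y2'] * 100) / 100), left_on=['x11', 'y11'], right_on=['x22', 'y22'])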
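
For reference, here is a minimal runnable sketch of the cut-to-two-digits idea on the sample data from the question. The helper name truncated_key and the temporary columns kx and ky are made up for this illustration. The keys are stored as integers (floor of value * 100) so that the comparison is exact.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"X1": [1.232, 3.527], "Y1": [2.271, 5.628], "Z1": [6, 7]})
df2 = pd.DataFrame({"X2": [1.231, 3.526], "Y2": [2.275, 5.624], "Z2": [6, 7]})


def truncated_key(series, decimals=2):
    """Hypothetical helper: keep `decimals` digits after the point as an integer key."""
    factor = 10 ** decimals
    return np.floor(series * factor).astype("int64")


# Attach temporary integer keys to both frames.
left = df1.assign(kx=truncated_key(df1["X1"]), ky=truncated_key(df1["Y1"]))
right = df2.assign(kx=truncated_key(df2["X2"]), ky=truncated_key(df2["Y2"]))

# Merge on the keys, then drop them so only the original columns remain.
merged = pd.merge(left, right, on=["kx", "ky"]).drop(columns=["kx", "ky"])
print(merged)
```

On the sample data both rows should pair up, since every X and Y value agrees in its first two decimals.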

Or use Series.round if possible (values are rounded, not cut):

df1 = pd.merge(df1.assign(x11 = df1['X1'].round(2),
                          y11 = df1['Y1'].round(2)), 
               df2.assign(x22 = df2['X2'].round(2),
                          y22 = df2['Y2'].round(2)), left_on=['x11', 'y11'], right_on=['x22', 'y22'])
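
Note that rounding and cutting can pair rows differently. A small sketch, using the Y values of the second row from the question (the variable names are only illustrative): 5.628 rounds up to 5.63 while 5.624 rounds down to 5.62, so the rounded keys differ even though both values start with 5.62.

```python
import pandas as pd

# Y values of the second row in df1 and df2 from the question.
y1 = pd.Series([5.628])
y2 = pd.Series([5.624])

rounded_1 = y1.round(2).iloc[0]
rounded_2 = y2.round(2).iloc[0]

# The two rounded keys are not equal, so a merge on them would skip this row.
print(rounded_1, rounded_2)
print(rounded_1 == rounded_2)
```

So if the goal is strictly "ignore the third digit", the cut variants are probably the closer fit for this data.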

Or use a trick: multiply by 100 and cast to integers to cut the values:

df1 = pd.merge(df1.assign(x11 = df1['X1'].mul(100).astype(int),
                          y11 = df1['Y1'].mul(100).astype(int)), 
               df2.assign(x22 = df2['X2'].mul(100).astype(int),
                          y22 = df2['Y2'].mul(100).astype(int)), left_on=['x11', 'y11'], right_on=['x22', 'y22'])
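
One caveat with multiplying by 100 and cutting: binary floating point cannot represent most decimals exactly, so a value like 0.29 times 100 comes out slightly below 29 and is cut to 28. The sketch below shows this and one possible workaround, assuming the inputs carry at most three decimals as in the question: scale by 1000, round to the nearest integer, then integer-divide by 10. The variable names are illustrative.

```python
import pandas as pd

values = pd.Series([0.29, 1.231])

# Naive cut: 0.29 * 100 is stored as 28.999999999999996, so the cast gives 28.
naive = values.mul(100).astype("int64")

# Possible workaround (assumes at most 3 decimals in the input):
# scale by 1000, round away the float noise, then drop the last digit.
safer = values.mul(1000).round().astype("int64") // 10

print(naive.tolist())
print(safer.tolist())
```

For the sample data in the question the naive version happens to work, because none of the values sits right on a two-decimal boundary.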