I want to compare 2 csv files that is file 1 and file 2 on basis of column. If column of file 1 matches with column of file 2 then remove that entire row of file 1.

Example of file 1

sr. no.,username,id
101,Berlin240,835070687
102,X_PSYCH_X,1271001789 
103,xenoo369,570078204
104,xarat581,1665916522
105,xandy88,639040049

Example of file 2:

sr. no.,username,id
101,Berlin240,835070687
103,xenoo369,570078204
105,xandy88,639040049

Now comparing file2 and removing all rows in file 1 that matches with the column of file 1.

Now the file1 looks like this:

sr. no.,username,id
102,X_PSYCH_X,1271001789
104,xarat581,1665916522

CodePudding user response：

The following code is a python solution. First install pandas. You can probably accomplish this with the command

>>> pip install pandas

The code is ....

import pandas as pd

file1_path = r"D:\Development\Test\file1.csv"
file2_path = r"D:\Development\Test\file2.csv"

df1 = pd.read_csv(file1_path)
df2 = pd.read_csv(file2_path)

new_df = df1[~df1['id'].isin(df2['id'])]

print(new_df)

new_df.to_csv(file1_path)  # saving difference in file1

To see the output open your file1. output:

   sr. no.    username          id
     102     X_PSYCH_X    1271001789
     104     xarat581     1665916522

CodePudding user response：

solution:

import pandas as pd

df1 = pd.read_csv("df1.csv")
df2 = pd.read_csv("df2.csv")

print(df1)
print(df2)

df_diff = pd.concat([df1,df2]).drop_duplicates(keep=False)

print(df_diff)

df1

     sr. no.  username      id
0      101  Berlin240   835070687
1      102  X_PSYCH_X  1271001789
2      103   xenoo369   570078204
3      104   xarat581  1665916522
4      105    xandy88   639040049

df2

     sr. no.  username     id
0      101  Berlin240  835070687
1      103   xenoo369  570078204
2      105    xandy88  639040049

df_diff

     sr. no.  username      id
1      102  X_PSYCH_X  1271001789
3      104   xarat581  1665916522