I want to compare 2 csv files that is file 1 and file 2 on basis of column. If column of file 1 matches with column of file 2 then remove that entire row of file 1.
Example of file 1
sr. no.,username,id
101,Berlin240,835070687
102,X_PSYCH_X,1271001789
103,xenoo369,570078204
104,xarat581,1665916522
105,xandy88,639040049
Example of file 2:
sr. no.,username,id
101,Berlin240,835070687
103,xenoo369,570078204
105,xandy88,639040049
Now comparing file2 and removing all rows in file 1 that matches with the column of file 1.
Now the file1 looks like this:
sr. no.,username,id
102,X_PSYCH_X,1271001789
104,xarat581,1665916522
CodePudding user response:
The following code is a python solution. First install pandas. You can probably accomplish this with the command
>>> pip install pandas
The code is ....
import pandas as pd
file1_path = r"D:\Development\Test\file1.csv"
file2_path = r"D:\Development\Test\file2.csv"
df1 = pd.read_csv(file1_path)
df2 = pd.read_csv(file2_path)
new_df = df1[~df1['id'].isin(df2['id'])]
print(new_df)
new_df.to_csv(file1_path) # saving difference in file1
To see the output open your file1. output:
sr. no. username id
102 X_PSYCH_X 1271001789
104 xarat581 1665916522
CodePudding user response:
solution:
import pandas as pd
df1 = pd.read_csv("df1.csv")
df2 = pd.read_csv("df2.csv")
print(df1)
print(df2)
df_diff = pd.concat([df1,df2]).drop_duplicates(keep=False)
print(df_diff)
df1
sr. no. username id
0 101 Berlin240 835070687
1 102 X_PSYCH_X 1271001789
2 103 xenoo369 570078204
3 104 xarat581 1665916522
4 105 xandy88 639040049
df2
sr. no. username id
0 101 Berlin240 835070687
1 103 xenoo369 570078204
2 105 xandy88 639040049
df_diff
sr. no. username id
1 102 X_PSYCH_X 1271001789
3 104 xarat581 1665916522