I have a file ua.csv with 2 data rows and another file pr.csv with 4 data rows. I would like to find the rows that are present in pr.csv but not in ua.csv, and the output should also include the count of those extra rows.
ua.csv:
```
Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Jane Lee|248 Another St.|Boston|US|02130
```
pr.csv:
```
Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Smoet|coffee shop|finland|Europe|3453335
Jane Lee|248 Another St.|Boston|US|02130
Jack|long street|malasiya|Asia|585858
```
Below is the expected output:
```
pr.csv has 2 rows extra
Name|Address|City|Country|Pincode
Smoet|coffee shop|finland|Europe|3453335
Jack|long street|malasiya|Asia|585858
```
Answer:
You could use Python's `set` data structure:
```python
ua_set = set()
pr_set = set()
# Code to populate the sets reading the csv files (use the `add` method of sets;
# rows must be hashable, e.g. tuples or the raw line strings)
...
# Find the difference
diff = pr_set.difference(ua_set)
print(f"pr.csv has {len(diff)} rows extra")
# It would be better not to hardcode the column names in the output,
# but getting them depends on the package you use to read csv files
print("Name|Address|City|Country|Pincode")
for row in diff:
    print(row)
```
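One way the population step could look, as a sketch assuming both files are pipe-delimited and read with Python's built-in `csv` module (the function name `extra_rows` is hypothetical, not part of the answer above). It reads the header from pr.csv instead of hardcoding it, and because sets are unordered it reports the extra rows in the order they appear in pr.csv.

```python
import csv


def extra_rows(ua_lines, pr_lines, delimiter="|"):
    """Return (header, extras): rows of pr_lines that do not occur in ua_lines.

    ua_lines and pr_lines are open text files (or any iterable of lines).
    """
    # Rows become tuples so they are hashable; ua's header row lands in the set too,
    # so the identical header in pr.csv is never counted as extra.
    ua_set = {tuple(row) for row in csv.reader(ua_lines, delimiter=delimiter)}

    pr_reader = csv.reader(pr_lines, delimiter=delimiter)
    header = next(pr_reader)  # read the column names instead of hardcoding them
    pr_rows = [tuple(row) for row in pr_reader]
    pr_set = set(pr_rows)

    diff = pr_set.difference(ua_set)
    # A set has no order, so report the extras in the order they appear in pr.csv
    extras = [row for row in pr_rows if row in diff]
    return header, extras
```

With the real files, open both with `newline=""`, call `header, extras = extra_rows(ua, pr)`, then print `len(extras)`, `"|".join(header)` and `"|".join(row)` for each extra row.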
A better solution uses the pandas module:
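```python
import pandas as pd

# The files are pipe-delimited; dtype=str keeps leading zeros such as "02134"
df_ua = pd.read_csv("ua.csv", sep="|", dtype=str)  # adjust the path to ua.csv
df_pr = pd.read_csv("pr.csv", sep="|", dtype=str)  # adjust the path to pr.csv
# indicator=True adds a "_merge" column; "left_only" marks rows found only in pr.csv.
# how="left" keeps pr.csv's row order (an outer merge sorts the keys).
df_diff = (df_pr.merge(df_ua, how="left", indicator=True)
           .loc[lambda x: x["_merge"] == "left_only"]
           .drop("_merge", axis=1))
print(f"pr.csv has {len(df_diff)} rows extra")
print(df_diff.to_csv(sep="|", index=False))
```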
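To see what `indicator=True` actually produces, here is a small self-contained sketch that loads the sample data from the question through `io.StringIO` instead of the real files (the `UA` and `PR` strings are just the question's data inlined for illustration). Every row of `df_pr` gets a `_merge` label: `"both"` when an identical row exists in `df_ua`, `"left_only"` otherwise.

```python
import io

import pandas as pd

UA = """Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Jane Lee|248 Another St.|Boston|US|02130
"""
PR = """Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Smoet|coffee shop|finland|Europe|3453335
Jane Lee|248 Another St.|Boston|US|02130
Jack|long street|malasiya|Asia|585858
"""

# dtype=str keeps leading zeros such as "02134"
df_ua = pd.read_csv(io.StringIO(UA), sep="|", dtype=str)
df_pr = pd.read_csv(io.StringIO(PR), sep="|", dtype=str)

# A left merge on all shared columns keeps df_pr's row order and labels each row
merged = df_pr.merge(df_ua, how="left", indicator=True)
print(merged["_merge"].tolist())

df_diff = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(f"pr.csv has {len(df_diff)} rows extra")
print(df_diff.to_csv(sep="|", index=False), end="")
```

Dropping `_merge` at the end leaves only the original columns, so `to_csv(sep="|", index=False)` prints the rows in the same pipe-delimited layout as the input files.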
Answer:
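```python
import csv

# The sample files are pipe-delimited, so the delimiter must be '|' (not ',')
with open('ua.csv', newline='') as ua:
    # Tuples are hashable, so every ua.csv row (header included) can go in a set
    ua_rows = {tuple(row) for row in csv.reader(ua, delimiter='|')}

output = []
with open('pr.csv', newline='') as pr:
    data = csv.reader(pr, delimiter='|')
    header = next(data)  # keep the header for printing
    for row in data:
        if tuple(row) not in ua_rows:
            output.append(row)

print(f"pr.csv has {len(output)} rows extra")
print('|'.join(header))
for row in output:
    print('|'.join(row))
```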
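Both answers compare rows by membership in a set (or dict keys), so a row that appears once in ua.csv but twice in pr.csv is never reported as extra. If duplicates matter, a `collections.Counter` subtraction can account for them. The following is a sketch under that assumption; `extra_rows_with_duplicates` is a hypothetical name, and the files are again assumed to be pipe-delimited.

```python
import csv
from collections import Counter


def extra_rows_with_duplicates(ua_lines, pr_lines, delimiter="|"):
    """Return rows of pr_lines that are not matched one-for-one by rows of ua_lines.

    ua_lines and pr_lines are open text files (or any iterable of lines).
    """
    ua_counts = Counter(tuple(row) for row in csv.reader(ua_lines, delimiter=delimiter))
    pr_rows = [tuple(row) for row in csv.reader(pr_lines, delimiter=delimiter)]
    # Counter subtraction keeps only positive counts: the surplus of each row in pr
    remaining = Counter(pr_rows) - ua_counts
    extras = []
    for row in pr_rows:  # walk pr in file order, emitting each surplus occurrence once
        if remaining[row] > 0:
            extras.append(row)
            remaining[row] -= 1
    return extras
```

On the question's sample data this gives the same two rows as the set-based answers; it only differs when pr.csv repeats a row more often than ua.csv does.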