We have two csv files - new.csv
and old.csv
.
old.csv
contains with four rows:
abc done
xyz done
pqr done
rst pending
The new.csv
contains four new rows:
abc pending
xyz not_done
pqr pending
rst done
I need to use count two things without using pandas
- count1 = number of entries changed from done to pending = 2 (abc, pqr)
- count2 = number of entries changed from done to not_done = 1 (xyz)
CodePudding user response:
CASE 1: CSV Files are in the same order
Firstly import the two files into python lists:
oldcsv = []
with open("old.csv") as f:
for line in f:
oldcsv.append(line.strip().split(","))
newcsv = []
with open("new.csv") as f:
for line in f:
newcsv.append(line.strip().split(","))
Now you would simply iterate through both lists simultaneously, using zip(). I am assuming that both CSV files list the entries in the same order.
count1 = 0
count2 = 0
for oldentry, newentry in zip(oldcsv, newcsv):
assert(oldentry[0] == newentry[0]) # Throw error if entry names do not match
if oldentry[1] == "done":
if newentry[1] == "pending":
count1 = 1
elif newentry[1] == "not_done":
count2 = 1
CASE 2: CSV Files are in arbitrary order
Here, given you are going to be needing to look up entries by their names, I would use a dictionary rather than a list to store the old.csv data, mapping the entry names to their values:
# Load old.csv data into a dictionary mapping entry_name: entry_value
old_values = {}
with open("old.csv") as f:
for line in f:
old_entry = line.strip().split(",")
entry_name, old_entry_value = old_entry[0], old_entry[1]
old_values[entry_name] = old_entry_value
count1 = 0
count2 = 0
with open("new.csv") as f:
for line in f:
# For each entry in new_entry, look up the corresponding old entry in old_entries, and compare their values.
new_entry = line.strip().split(",")
entry_name, new_entry_value = new_entry[0], new_entry[1]
old_entry_value = old_values.get(entry_name) # Get the old value for this entry (will be None if there is no old entry)
# Essentially same code as before:
print(f"{entry_name}: old entry status is {old_entry_value} and new entry status is {new_entry_value}")
if old_entry_value == "done":
if new_entry_value == "pending":
print("Incrementing count1")
count1 = 1
elif new_entry_value == "not_done":
print("Incrementing count2")
count2 = 1
print(count1)
print(count2)
This should work, as long as the input data is properly formatted. I am assuming each .csv file has one entry per line, and each line begins with the entry name (e.g. "abc"), then a comma, then the entry value (e.g. "done","not_done").
CodePudding user response:
Here is a pure python straightforward implementation:
import csv
with open("old.csv") as old_fl:
with open("new.csv") as new_fl:
old = csv.reader(old_fl)
new = csv.reader(new_fl)
old_rows = [row for row in old]
new_rows = [row for row in new]
# see if this is really needed
assert len(old_rows) == len(new_rows)
n = len(old_rows)
# assume that left key is identical,
# and in the same order in both files
assert all(old_rows[i][0] == new_rows[i][0] for i in range(n))
# once the data is guaranteed to align,
# just count what you want
done_to_pending = [
f"row[{i}]( {old_rows[i][0]} )"
for i in range(n)
if old_rows[i][1] == "done" and new_rows[i][1] == "pending"
]
done_to_notdone = [
f"row[{i}]( {old_rows[i][0]} )"
for i in range(n)
if old_rows[i][1] == "done" and new_rows[i][1] == "not_done"
]
It uses the python native csv reader so you don't need to parse the csv yourself. Note that there are various assumptions (assert
statements) throughout the code - you might need to adjust the code to handle more cases.