Forgive me as I'm pretty new to ruby, but I have two CSVs and I want to compare two CSV files and delete the row from CSV1 if something from CSV2 matches a key in CSV1
CSVs (with lots of redactions) for example:
# csv1
xyz_id,first_nm,last_nm
100001,Person,one
200002,Person,two
300003,Person,three
400001,Person,four
500001,Person,five
#csv2
id,name,first_name,last_name,address .....
1234,person,wun
1235,person,tuh who
1236,person,tree
100001,Person,one
If there is something matching in csv2, delete the entire row from CSV1.
Here is what I've tried:
I saw in another question I could leverage the usage of delete_if
but I'm slightly confused on the implementation of it. However, this is my code:
csv1 = CSV.table('./csv/csv1.csv', headers: true)
csv2 = CSV.table('./csv/csv2.csv', headers: true)
csv1.delete_if do |csv1row|
csv2.each do |csv2row|
csv1row[:xyz_id] == csv2[:id]
end
end
I'm also not sure if this is working. Do I have to write it back to csv1
? or create a whole new CSV for this? Is it possible to write it back to csv1 if things have been deleted?
CodePudding user response:
You're in the right path. The Array#delete_if
method expects the block to return a boolean. Take this example from the Ruby documentation:
scores = [ 97, 42, 75 ]
scores.delete_if {|score| score < 80 } #=> [97]
In your case, you just have to change that Array#each
call to Array#any?
. That will delete rows from csv1
that have a matching ID in csv2
.
csv1.delete_if do |csv1row|
csv2.any? { |csv2row| csv1row[:xyz_id] == csv2[:id] }
end
CodePudding user response:
With the below code first it'll load both csv files to the variables i.e. csv1
and csv2
with headers. Once file has been loaded we fetched common ids available to both columns xyz_id
from csv1
and column id
from csv2
. Then deleted recode from csv1
if xyz_id
available in the common_ids
array.
csv1 = CSV.table('csv_1.csv', headers: true)
csv2 = CSV.table('csv_2.csv', headers: true)
common_ids = csv1[:xyz_id] & csv2[:id]
csv1.delete_if {|row| common_ids.include?(row[:xyz_id])}