Home > Software design >  How do I compare two CSVs and remove a row if a key matches in ruby?
How do I compare two CSVs and remove a row if a key matches in ruby?

Time:10-07

Forgive me as I'm pretty new to ruby, but I have two CSVs and I want to compare two CSV files and delete the row from CSV1 if something from CSV2 matches a key in CSV1

CSVs (with lots of redactions) for example:

# csv1

xyz_id,first_nm,last_nm
100001,Person,one
200002,Person,two
300003,Person,three
400001,Person,four
500001,Person,five
#csv2

id,name,first_name,last_name,address .....
1234,person,wun
1235,person,tuh who
1236,person,tree
100001,Person,one

If there is something matching in csv2, delete the entire row from CSV1.

Here is what I've tried:

I saw in another question I could leverage the usage of delete_if but I'm slightly confused on the implementation of it. However, this is my code:

csv1 = CSV.table('./csv/csv1.csv', headers: true)
csv2 = CSV.table('./csv/csv2.csv', headers: true)

csv1.delete_if do |csv1row|
    csv2.each do |csv2row|
        csv1row[:xyz_id] == csv2[:id]
    end
end

I'm also not sure if this is working. Do I have to write it back to csv1? or create a whole new CSV for this? Is it possible to write it back to csv1 if things have been deleted?

CodePudding user response:

You're in the right path. The Array#delete_if method expects the block to return a boolean. Take this example from the Ruby documentation:

scores = [ 97, 42, 75 ]
scores.delete_if {|score| score < 80 }   #=> [97]

In your case, you just have to change that Array#each call to Array#any?. That will delete rows from csv1 that have a matching ID in csv2.

csv1.delete_if do |csv1row|
  csv2.any? { |csv2row| csv1row[:xyz_id] == csv2[:id] }
end

CodePudding user response:

With the below code first it'll load both csv files to the variables i.e. csv1 and csv2 with headers. Once file has been loaded we fetched common ids available to both columns xyz_id from csv1 and column id from csv2. Then deleted recode from csv1 if xyz_id available in the common_ids array.

csv1 = CSV.table('csv_1.csv', headers: true)
csv2 = CSV.table('csv_2.csv', headers: true)
common_ids = csv1[:xyz_id] & csv2[:id]
csv1.delete_if {|row| common_ids.include?(row[:xyz_id])}
  • Related