Writing data into a CSV file by two different CSV files

Time:12-08

So, I'm learning Ruby, I've been stuck on this for a long time, and I need some help.

I need to write to a CSV file using data from two different CSV files. I have code that does each part, but it is split across two different functions, and I need the data from both files together in one.

Here's the code:

require 'csv'

class Plantas <
   Struct.new( :code)
end

class Especies <
   Struct.new(:id, :type, :code, :name_es, :name_ca, :name_en, :latin_name, :customer_id )
end

def ecode

   f_inECODE = File.open("pflname.csv", "r")                  #get EPPOCODE
   f_out=CSV.open("plantas.csv", "w+", :headers => true) #outputfile

   f_inECODE.each_line do |line|

   fields = line.split(',')

   newPlant = Plantas.new

   newPlant.code = fields[2].tr_s('"', '').strip #eppocode

       plant = [newPlant.code] #linies a imprimir
       f_out <<  plant

   end
end

def data

   f_dataspices=File.open("spices.csv", "r")
   f_out=CSV.open("plantas.csv", "w+", :headers => true) #outputfile

   f_dataspices.each_line do |line|

       fields = line.split(',')
       newEspecies = Especies.new
       
       newEspecies.id = fields[0].tr_s('"', '').strip 
       newEspecies.type = fields[1].tr_s('"', '').strip 
       newEspecies.code = fields[2].tr_s('"', '').strip 
       newEspecies.name_es = fields[3].tr_s('"', '').strip 
       newEspecies.name_ca = fields[4].tr_s('"', '').strip 
       newEspecies.name_en = fields[5].tr_s('"', '').strip 
       newEspecies.latin_name = fields[6].tr_s('"', '').strip
       newEspecies.customer_id = fields[7].tr_s('"', '').strip 
       
           especia = [newEspecies.id,newEspecies.type,newEspecies.code,newEspecies.name_es,newEspecies.name_ca,newEspecies.name_en,newEspecies.latin_name,newEspecies.customer_id] 
           f_out <<  especia
   end
end

data 
ecode
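Both functions open plantas.csv for writing, and opening a file in a write mode truncates it, so one function's rows cannot simply be added to the other's. A minimal sketch of the effect, using a temporary directory so no real files are touched (the rows are hypothetical):

```ruby
require 'csv'
require 'tmpdir'

# Each CSV.open(path, 'w') truncates the file before writing,
# so the second writer discards what the first one wrote.
contents = Dir.mktmpdir do |dir|
  path = File.join(dir, 'plantas.csv')

  CSV.open(path, 'w') { |csv| csv << %w[7205 DunSpecies] } # plays the role of `data`
  CSV.open(path, 'w') { |csv| csv << %w[LEECO] }           # plays the role of `ecode`

  File.read(path)
end

puts contents # only the second write survives
```

Opening the output file once and writing each combined row (the data fields plus the matching ecode) inside that single block avoids the problem; both answers take that approach.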

The desired output (the species.csv fields plus the ecode from ecode.csv) would look like this:

"id","type","code","name_es","name_ca","name_en","latin_name","customer_id","ecode"
7205,"DunSpecies",NULL,"0","0","0","",11630,LEECO
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273,LEE3O
7204,"DunSpecies",NULL,"0","0","0","",11630,L4ECO

And the actual output is this:

"id","type","code","name_es","name_ca","name_en","latin_name","customer_id"
7205,"DunSpecies",NULL,"0","0","0","",11630
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273
7204,"DunSpecies",NULL,"0","0","0","",11630 

(without the ecode column)

On one side I have the ecode and on the other all the data; I just need to put them together.

I'd like to put everything together in the same file (plantas.csv). I wrote two different functions because I don't know how to combine them with a single foreach. I would like to put everything in one function, but I don't know how to do it. If someone could help me get this code into one function that writes the results to the same file, I would be very grateful.

Here is an example of the input file ecode.csv (from which I only want the ecode field):

"""identifier"",""datatype"",""code"",""lang"",""langno"",""preferred"",""status"",""creation"",""modification"",""country"",""fullname"",""authority"",""shortname"""
"""N1952"",""PFL"",""LEECO"",""la"",""1"",""0"",""N"",""06/06/2000"",""09/03/2010"","""",""Leea coccinea non"",""Planchon"",""Leea coccinea non"""
"""N2974"",""PFL"",""LEECO"",""en"",""1"",""0"",""N"",""06/06/2000"",""21/02/2011"","""",""west Indian holly"","""",""West Indian holly"""

Here is an example of the input file data.csv (from which I want all the fields):

"id","type","code","name_es","name_ca","name_en","latin_name","customer_id"
7205,"DunSpecies",NULL,"0","0","0","",11630
7437,"DunSpecies",NULL,"0","Xicoira","0","",5273

My idea for linking both files is to create a third file and write everything into it. At least that is my idea; I don't know if there is a simpler way to do it.

Thanks!

Answer:

Cleaning up ecode.csv made it more challenging, but here is what I came up with:
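A small sketch of why ecode.csv needs two parsing passes: each line is wrapped in an extra layer of quotes, so a single CSV parse returns the whole record as one field. The two sample lines below are a shortened, hypothetical excerpt in the same format.

```ruby
require 'csv'

# Shortened lines in the same style as ecode.csv (each record quoted once more).
raw = <<~'DATA'
  """identifier"",""datatype"",""code"""
  """N1952"",""PFL"",""LEECO"""
DATA

# First pass: every line comes back as a single field.
first_pass = CSV.parse(raw)
p first_pass.first.size # → 1

# Second pass: join the unwrapped lines and parse them as ordinary CSV.
inner = first_pass.map(&:first).join("\n")
table = CSV.parse(inner, headers: true)
p table['code'] # → ["LEECO"]
```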

If data.csv and ecode.csv are matched by row number:

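require 'csv'
require 'stringio'

# Read data.csv as an array of arrays; the first element is the header row.
data = CSV.read('data.csv', headers: true).to_a
headers = data.shift << 'eppocode'

# Each line of ecode.csv is wrapped in an extra layer of quotes, so the first
# parse returns one field per line; write those fields out and parse them again.
double_quoted_ecode = CSV.read('ecode.csv')
ecodeIO = StringIO.new
ecodeIO.puts double_quoted_ecode.to_a
ecodeIO.rewind
ecode = CSV.parse(ecodeIO, headers: true)

CSV.open('plantas.csv', 'w') do |plantas|
  plantas << headers
  data.each.with_index do |row, idx|
    # Append the code from the ecode row with the same index.
    planta = row + [ecode['code'][idx]]
    plantas << planta
  end
end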

Using your example files, this gives you the following plantas.csv:

id,type,code,name_es,name_ca,name_en,latin_name,customer_id,eppocode
7205,DunSpecies,NULL,0,0,0,"",11630,LEECO
7437,DunSpecies,NULL,0,Xicoira,0,"",5273,LEECO

If entries are matched by data.csv's id and ecode.csv's identifier:

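require 'csv'
require 'stringio'

# Keep data.csv as a CSV::Table so fields can be read by header.
data = CSV.read('data.csv', headers: true)
headers = data.headers << 'eppocode'

# Same two-pass parse of the double-quoted ecode.csv as in the first snippet.
double_quoted_ecode = CSV.read('ecode.csv')
ecodeIO = StringIO.new
ecodeIO.puts double_quoted_ecode.to_a
ecodeIO.rewind
ecode = CSV.parse(ecodeIO, headers: true)

CSV.open('plantas.csv', 'w') do |plantas|
  plantas << headers
  data.each do |row|
    id = row['id']
    # First ecode entry whose identifier equals this id, or an empty hash if none.
    ecode_row = ecode.find { |entry| entry['identifier'] == id } || {}
    planta = row << ecode_row['code']
    plantas << planta
  end
end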
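Calling find for every data row rescans the whole ecode table each time. For larger files, one alternative (a sketch, not the answer's code) is to build a Hash from identifier to code once and look each id up in it. The sample rows are hypothetical, and the ecode table is assumed to be already unquoted.

```ruby
require 'csv'

# Hypothetical samples: data's id is matched against ecode's identifier.
data = CSV.parse(<<~ROWS, headers: true)
  id,type,code
  N1952,DunSpecies,NULL
  N9999,DunSpecies,NULL
ROWS

ecode = CSV.parse(<<~ROWS, headers: true)
  identifier,code
  N1952,LEECO
  N2974,LEEC1
ROWS

# One pass over ecode: identifier => code, keeping the first match as find would.
code_by_id = {}
ecode.each do |entry|
  code_by_id[entry['identifier']] = entry['code'] unless code_by_id.key?(entry['identifier'])
end

out = CSV.generate do |csv|
  csv << data.headers + ['eppocode']
  data.each do |row|
    csv << row.fields + [code_by_id[row['id']]] # nil is written as an empty field
  end
end

puts out
```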

I hope you find this helpful.

Answer:

Data

Let's begin by creating the two CSV files. To make the results easier to follow I have arbitrarily removed some of the fields in each file, and changed one field value.

ecode.csv

ecode = '"""identifier"",""datatype"",""code"",""lang"",""langno"",""preferred"",""status"",""creation"",""modification"",""country"",""fullname"",""authority"",""shortname"""
"""N1952"",""PFL"",""LEECO"",""la"",""1"",""0"",""N"",""06/06/2000"",""09/03/2010"","""",""Leea coccinea non"",""Planchon"",""Leea coccinea non"""
"""N2974"",""PFL"",""LEEC1"",""en"",""1"",""0"",""N"",""06/06/2000"",""21/02/2011"","""",""west Indian holly"","""",""West Indian holly"""'

File.write('ecode.csv', ecode)
  #=> 452

data.csv

data = '"id","type","code","customer_id"
7205,"DunSpecies",NULL,11630
7437,"DunSpecies",NULL,,5273'

File.write('data.csv', data)
  #=> 90

Code

CSV.open('plantas.csv', 'w') do |csv_out|
  converter = ->(s) { s.delete('"') }

  epposcode = CSV.foreach('ecode.csv',
    headers:true,
    header_converters: [converter],
    converters: [converter]
  ).map { |csv| csv["code"] }

  headers = CSV.open('data.csv', &:readline) << 'epposcode'
  csv_out << headers

  CSV.foreach('data.csv', headers:true) do |row|
    csv_out << (row << epposcode.shift)
  end
end
  #=> 90
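The line `headers = CSV.open('data.csv', &:readline) << 'epposcode'` relies on CSV.open returning the value of its block: `&:readline` reads only the first row (the headers, as an array) and the file is closed afterwards. A small sketch with a hypothetical temporary file:

```ruby
require 'csv'
require 'tmpdir'

header = Dir.mktmpdir do |dir|
  path = File.join(dir, 'data.csv')
  File.write(path, %("id","type","code","customer_id"\n7205,"DunSpecies",NULL,11630\n))

  # Same as CSV.open(path) { |csv| csv.readline }: read one row, then close the file.
  CSV.open(path, &:readline)
end

p header                # → ["id", "type", "code", "customer_id"]
p header << 'epposcode' # the extra header appended in the answer
```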

Result

Let's see what was written.

puts File.read('plantas.csv')

id,type,code,customer_id,epposcode
7205,DunSpecies,NULL,11630,LEECO
7437,DunSpecies,NULL,,5273,LEEC1

Explanation

The structure we want is the following.

CSV.open('plantas.csv', 'w') do |csv_out|
  epposcode = <array of 'code' field values from 'ecode.csv'>
  headers = <headers from 'data.csv' to which 'epposcode' is appended>
  csv_out << headers
  CSV.foreach('data.csv', headers:true) do |row|
    csv_out << <row of 'data.csv' to which an element of epposcode is appended>
  end
end
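The step `row << epposcode.shift` uses CSV::Row#<<: given a single value (not a two-element array or a hash), it appends a field with a nil header, while Array#shift takes the next code off the front of the array. A minimal sketch with hypothetical values:

```ruby
require 'csv'

codes = %w[LEECO LEEC1]

# A row like the ones CSV.foreach yields when headers: true is set.
row = CSV::Row.new(%w[id type code], %w[7205 DunSpecies NULL])
row << codes.shift # appends "LEECO" under a nil header

p row.fields  # → ["7205", "DunSpecies", "NULL", "LEECO"]
p row.headers # → ["id", "type", "code", nil]
p codes       # → ["LEEC1"]
print row.to_csv
```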

CSV::open is the main CSV method for writing files, and CSV::foreach is generally my method of choice for reading CSV files. I could instead have written the following.

csv_out = CSV.open('plantas.csv', 'w')

epposcode = <array of 'code' field values from 'ecode.csv'>
headers = <headers from 'data.csv' to which 'epposcode' is appended>
csv_out << headers
CSV.foreach('data.csv', headers:true) do |row|
  csv_out << <row of 'data.csv' to which an element of epposcode is appended>
end

csv_out.close

but using a block is convenient because the file is closed automatically when the block returns, so no explicit close is needed.
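A quick check of that behaviour, as a sketch with a temporary file: the CSV object yielded by CSV.open reports `closed?` as true once the block has returned, and the row it received is already on disk.

```ruby
require 'csv'
require 'tmpdir'

closed, written = Dir.mktmpdir do |dir|
  path = File.join(dir, 'plantas.csv')
  handle = nil

  CSV.open(path, 'w') do |csv_out|
    handle = csv_out # keep a reference to inspect after the block
    csv_out << %w[id epposcode]
  end

  [handle.closed?, File.read(path)]
end

p closed  # → true
p written # → "id,epposcode\n"
```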


It is convenient to use a converter for both the header fields and the row fields:

converter = ->(s) { s.delete('"') }

This is a proc (I've defined a lambda) that removes double quotes from strings. It is passed as two of foreach's optional arguments:

  epposcode = CSV.foreach('ecode.csv',
    headers:true,
    header_converters: [converter],
    converters: [converter]
  )

Search for "Data Converters" in the CSV doc.
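A minimal illustration of the converter on a single hypothetical line: after normal CSV unquoting, a field written as `"""LEECO"""` still contains literal quote characters, and the converter strips them.

```ruby
require 'csv'

converter = ->(s) { s.delete('"') }
line = 'N1952,"""LEECO"""'

plain     = CSV.parse_line(line)                          # no converter
converted = CSV.parse_line(line, converters: [converter]) # quotes removed

p plain     # → ["N1952", "\"LEECO\""]
p converted # → ["N1952", "LEECO"]
```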


We invoke foreach without a block to return an enumerator, so it can be chained to map:

epposcode = CSV.foreach('ecode.csv',
  headers:true,
  header_converters: [converter],
  converters: [converter]
).map { |csv| csv["code"] }

For the example,

epposcode
  #=> ["LEECO", "LEEC1"]