I'm trying to split multiple values in a CSV cell. I can do it right if the multiple values in a cell is found in a single column only, but I'm having difficulty doing it if the multiple values are found in multiple columns. Any guidance will be appreciated.
Here's the sample of the data I'm trying to split:
| Column A | Column B |
|Value1, Value2, Value3 | Value3, Value4, Value5 |
|Value6 | Value7, Value8 |
I'm aiming to have a result like this:
| Column A | Column B |
| Value1 | Value3 |
| Value2 | Value4 |
| Value3 | Value5 |
| Value6 | Value7 |
| Value6 | Value8 |
Here's my code:
require 'csv'
split_a = []
split_b = []
def split_values(value)
value = value.to_s
value = value.gsub('/', ',').gsub('|', ',').gsub(' ', ',')
return value.split(',').map(&:strip)
end
source_csv = kendo_shipment = CSV.read('source_file.csv', headers: true, header_converters: :symbol, liberal_parsing: true).map(&:to_h)
source_csv.each do |source_csv|
column_a = source_csv[:column_a]
column_b = source_csv[:column_b]
column_a = split_values(column_a)
column_a.each do |column_a|
next if column_a.nil? || column_a.empty?
split_a << [
column_a: column_a,
column_b: column_b
]
end
end
split_a.each do |key, split_a|
column_a = key[:column_a]
column_b = key[:column_b]
column_b = split_values(column_b)
column_b.each do |column_b|
next if column_b.nil? || column_b.empty?
split_b << [
column_a,
column_b
]
end
end
CodePudding user response:
There is a special option to define a column separator col_sep: '|'
it simplifies the code.
require 'csv'
source_csv = CSV.read('tmp/source_file.csv', col_sep: '|', headers: true, header_converters: :symbol, liberal_parsing: true)
split_a = []
split_b = []
# I just assign values to separate arrays, because I am not sure what kind of data structure you want to get at the end.
source_csv.each do |row|
split_a = row[:column_a].split(',').map(&:strip)
split_b = row[:column_b].split(',').map(&:strip)
end
# The result
split_a
# => ["Value1", "Value2", "Value3", "Value6"]
split_b
# => ["Value3", "Value4", "Value5", "Value7", "Value8"]
CodePudding user response:
Here is the code:
require 'csv'
source_csv = CSV.read('source_file.csv', col_sep: '|', headers: true, header_converters: :symbol, liberal_parsing: true)
# Column values:
def column_values(data_rows, column)
data_rows&.map{|row| row[column]&.split(',')&.map(&:strip) }&.flatten(1) || []
end
col_a = column_values(source_csv, :column_a)
col_b = column_values(source_csv, :column_b)
# repeat the last value in case the number of values in the columns differs:
if col_b.length > col_a.length
col_a.fill(col_a.last, (col_a.length - 1)..(col_b.length - 1))
elsif col_1.length > col_2.length
col_b.fill(col_b.last, (col_b.length - 1)..(col_a.length - 1))
end
# result:
split_b = col_a.zip(col_b)
# => [
# ["Value1", "Value3"],
# ["Value2", "Value4"],
# ["Value3", "Value5"],
# ["Value6", "Value7"],
# ["Value6", "Value8"]
# ]