Home > Software design >  Split multiple CSV cell values from multiple columns using Ruby
Split multiple CSV cell values from multiple columns using Ruby

Time:04-11

I'm trying to split multiple values in a CSV cell. I can do it right if the multiple values in a cell is found in a single column only, but I'm having difficulty doing it if the multiple values are found in multiple columns. Any guidance will be appreciated.

Here's the sample of the data I'm trying to split:

| Column A              | Column B               |
|Value1, Value2, Value3 | Value3, Value4, Value5 |
|Value6                 | Value7, Value8         |

I'm aiming to have a result like this:

| Column A | Column B |
| Value1   | Value3   |
| Value2   | Value4   |
| Value3   | Value5   |
| Value6   | Value7   |
| Value6   | Value8   |

Here's my code:

require 'csv'

split_a = []
split_b = []

def split_values(value)
  value = value.to_s
  value = value.gsub('/', ',').gsub('|', ',').gsub(' ', ',')
  return value.split(',').map(&:strip)
end

source_csv = kendo_shipment = CSV.read('source_file.csv', headers: true, header_converters: :symbol, liberal_parsing: true).map(&:to_h)

source_csv.each do |source_csv|
  column_a = source_csv[:column_a]
  column_b = source_csv[:column_b]
  column_a = split_values(column_a)
  column_a.each do |column_a|
    next if column_a.nil? || column_a.empty?

    split_a << [
      column_a: column_a,
      column_b: column_b
    ]
  end
end

split_a.each do |key, split_a|
  column_a = key[:column_a]
  column_b = key[:column_b]
  column_b = split_values(column_b)
  column_b.each do |column_b|
    next if column_b.nil? || column_b.empty?

    split_b << [
      column_a,
      column_b
    ]
  end
end

CodePudding user response:

There is a special option to define a column separator col_sep: '|' it simplifies the code.

require 'csv'

source_csv = CSV.read('tmp/source_file.csv', col_sep: '|', headers: true, header_converters: :symbol, liberal_parsing: true)

split_a = []
split_b = []

# I just assign values to separate arrays, because I am not sure what kind of data structure you want to get at the end.
source_csv.each do |row|
  split_a  = row[:column_a].split(',').map(&:strip)
  split_b  = row[:column_b].split(',').map(&:strip)
end

# The result
split_a
# => ["Value1", "Value2", "Value3", "Value6"]

split_b
# => ["Value3", "Value4", "Value5", "Value7", "Value8"]

CodePudding user response:

Here is the code:

require 'csv'

source_csv = CSV.read('source_file.csv', col_sep: '|', headers: true, header_converters: :symbol, liberal_parsing: true)

# Column values:
def column_values(data_rows, column)
  data_rows&.map{|row| row[column]&.split(',')&.map(&:strip) }&.flatten(1) || []
end
col_a = column_values(source_csv, :column_a)
col_b = column_values(source_csv, :column_b)

# repeat the last value in case the number of values in the columns differs:
if col_b.length > col_a.length
  col_a.fill(col_a.last, (col_a.length - 1)..(col_b.length - 1))
elsif col_1.length > col_2.length
  col_b.fill(col_b.last, (col_b.length - 1)..(col_a.length - 1))
end

# result:
split_b = col_a.zip(col_b)
# => [
#  ["Value1", "Value3"],
#  ["Value2", "Value4"],
#  ["Value3", "Value5"],
#  ["Value6", "Value7"],
#  ["Value6", "Value8"]
# ]
  •  Tags:  
  • ruby
  • Related