Reversible CSV parsing

Time:01-24

Prolog newbie here. In SWI-Prolog, I'm trying to figure out how to parse a simple line of CSV reversibly, but I'm stuck. Here's what I've got:

csvstring1(S, L) :-
  split_string(S, ',', ',', T),
  maplist(atom_number, T, L).
   
csvstring2(S, L) :-
  atomic_list_concat(T, ',', S),
  maplist(atom_number, T, L).

% This one is the same except that maplist comes first. 
csvstring3(S, L) :-
  maplist(atom_number, T, L),
  atomic_list_concat(T, ',', S).

Now csvstring1 and csvstring2 work in a "forward" manner:

?- csvstring1('1,2,3,4', L).
L = [1, 2, 3, 4].

?- csvstring2('1,2,3,4', L).
L = [1, 2, 3, 4].

But not csvstring3:

?- csvstring3('1,2,3,4', L).
ERROR: Arguments are not sufficiently instantiated

Moreover, csvstring3 works in reverse, but the other two predicates do not:

?- csvstring3(L, [1,2,3,4]).
L = '1,2,3,4'.

?- csvstring1(L, [1,2,3,4]).
ERROR: Arguments are not sufficiently instantiated

?- csvstring2(L, [1,2,3,4]).
ERROR: Arguments are not sufficiently instantiated

How can I combine these into a single predicate?

Answer:

There is no particularly newbie-friendly way to do it that doesn't compromise somewhere. This is the easiest:

csvString_list(String, List) :-
    ground(String),
    atomic_list_concat(Temp, ',', String),
    maplist(atom_number, Temp, List).

csvString_list(String, List) :-
    ground(List),
    maplist(atom_number, Temp, List),
    atomic_list_concat(Temp, ',', String).

but it makes and leaves spurious choicepoints.
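
To make the leftover choicepoint visible, here is a small sketch. The definition is repeated so the snippet loads on its own, and count_solutions/1 is a hypothetical helper, not part of the answer. When only String is bound, the second clause is still tried on backtracking and fails at ground(List). When both arguments are bound, both clauses succeed, so the same answer is reported twice:

```prolog
:- use_module(library(aggregate)).   % aggregate_all/3
:- use_module(library(apply)).       % maplist/3

% The two-clause definition from above, repeated so this loads alone.
csvString_list(String, List) :-
    ground(String),
    atomic_list_concat(Temp, ',', String),
    maplist(atom_number, Temp, List).
csvString_list(String, List) :-
    ground(List),
    maplist(atom_number, Temp, List),
    atomic_list_concat(Temp, ',', String).

% Hypothetical helper: count the solutions when both arguments are bound.
count_solutions(Count) :-
    aggregate_all(count, csvString_list('1,2,3', [1, 2, 3]), Count).
```

?- count_solutions(N).
N = 2.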

This version cuts the choicepoints, which is nice when using it, but it is a poor practice to get into without being aware of what the cut means:

csvString_list(String, List) :-
    ground(String),
    atomic_list_concat(Temp, ',', String),
    maplist(atom_number, Temp, List),
    !.

csvString_list(String, List) :-
    ground(List),
    maplist(atom_number, Temp, List),
    atomic_list_concat(Temp, ',', String).

This uses if-then-else, which is less code:

csvString_list(String, List) :-
    (   ground(String)
    ->  atomic_list_concat(Temp, ',', String),
        maplist(atom_number, Temp, List)
    ;   maplist(atom_number, Temp, List),
        atomic_list_concat(Temp, ',', String)
    ).

but it is logically bad. The cleaner approach is to reify the branching with if_/3 (available from the add-on library(reif)), which isn't built into SWI-Prolog and is less simple to use.

Or you could write a grammar with a DCG, which is not newbie territory:


:- set_prolog_flag(double_quotes, chars).
:- use_module(library(dcg/basics)).

csvTail([N|Ns]) --> [','], number(N), csvTail(Ns). 
csvTail([])     --> [].

csv([N|Ns]) --> number(N), csvTail(Ns).

e.g.

?- phrase(csv(Ns), "11,22,33,44,55").
Ns = [11, 22, 33, 44, 55]


?- phrase(csv([11, 22, 33, 44, 55]), String).
String = [49, 49, ',', 50, 50, ',', 51, 51, ',', 52, 52, ',', 53, 53]

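but now you're back to it leaving spurious choicepoints while parsing, and you have to deal with the historic split between strings, atoms, and character codes in SWI-Prolog. In the generated list, number//1 emits the digits as character codes (49 is the code for '1') while the commas are the chars written in the grammar. The result represents 11,22,33,44,55 even though it doesn't look like it, but as a mixed list of codes and chars it will not unify directly with the chars list "11,22,33,44,55".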

Answer:

split_string/4 is not reversible. You can use a DCG instead; here is a simple multi-line DCG parser for CSV:

% Nicer formatting
% https://www.swi-prolog.org/pldoc/man?section=flags
:- set_prolog_flag(answer_write_options, [quoted(true), portray(true), spacing(next_argument), max_depth(100), attributes(portray)]).

% Show lists of codes as text (if 3 chars or longer)
:- portray_text(true).

csv_lines([]) --> [].
% Newline after every line
csv_lines([H|T]) --> csv_fields(H), [10], csv_lines(T).

csv_fields([H|T]) --> csv_field(H), csv_field_end(T).

csv_field_end([]) --> [].
% Comma between fields
csv_field_end(T) --> [44], csv_fields(T).

csv_field([]) --> [].
csv_field([H|T]) -->
    [H],
    % Fields cannot contain comma, newline or carriage return
    { maplist(dif(H), [44, 10, 13]) },
    csv_field(T).

To demonstrate reversibility:

% Note: z is char 122
?- phrase(csv_lines([[`def`, `cool`], [`abc`, [122]]]), Lines).
Lines = `def,cool\nabc,z\n` ;
false.

?- phrase(csv_lines(Fields), `def,cool\nabc,z\n`).
Fields = [[`def`, `cool`], [`abc`, [122]]] ;
false.

To convert the field contents to another type and maintain reversibility, you can use e.g. atom_codes/2.
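
As a rough sketch of that last step, the helpers below map atom_codes/2 over the nested lists that csv_lines//1 works with. The names codes_atom/2 and rows_atoms/2 are made up for this illustration, and the sketch assumes that one of the two arguments is fully bound:

```prolog
:- use_module(library(apply)).   % maplist/3

% Hypothetical helper: atom_codes/2 with the code list as first argument,
% so it lines up with the code-list fields used by csv_lines//1.
codes_atom(Codes, Atom) :-
    atom_codes(Atom, Codes).

% Hypothetical helper: relate rows of code-list fields to rows of atoms.
rows_atoms(CodeRows, AtomRows) :-
    maplist(maplist(codes_atom), CodeRows, AtomRows).
```

?- rows_atoms([[`def`, `cool`], [`abc`, `z`]], Rows).
Rows = [[def, cool], [abc, z]].

Called with the first argument unbound and a list of atom rows as the second, the same predicate produces the code lists that can be handed to phrase(csv_lines(...), Codes).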

Answer:

Others have given some advice and a lot of code. With SWI-Prolog, parsing comma-separated integers is trivial if you use library(dcg/basics) and library(dcg/high_order):

?- use_module(library(dcg/basics)), use_module(library(dcg/high_order)).
true.

?- phrase(sequence(integer, ",", Ns), `1,2,3,4`).
Ns = [1, 2, 3, 4].

?- phrase(sequence(integer, ",", [7,6,42]), S), format("~s", [S]).
7,6,42
S = [55, 44, 54, 44, 52, 50].
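
To get back to the single predicate over atoms that the question asks for, one possible wrapper around the two queries above is sketched here. The name csv_atom_integers/2 is made up for this example, and the sketch assumes that at least one of the two arguments is bound when it is called:

```prolog
:- use_module(library(dcg/basics)).      % integer//1
:- use_module(library(dcg/high_order)).  % sequence//3

% Hypothetical wrapper: relate an atom such as '1,2,3,4' to a list of integers.
% Assumption: at least one of Atom and Ns is bound on entry.
csv_atom_integers(Atom, Ns) :-
    (   nonvar(Atom)
    ->  % Parsing: turn the atom into codes, then run the grammar.
        atom_codes(Atom, Codes),
        phrase(sequence(integer, ",", Ns), Codes)
    ;   % Generating: run the grammar first, then build the atom.
        phrase(sequence(integer, ",", Ns), Codes),
        atom_codes(Atom, Codes)
    ).
```

?- csv_atom_integers('1,2,3,4', Ns).
Ns = [1, 2, 3, 4].

?- csv_atom_integers(Atom, [7,6,42]).
Atom = '7,6,42'.

The branch on nonvar/1 has the same logical weakness as the if-then-else version in the first answer; it only picks the order in which the two conversion steps run.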