I have some text file data that I am parsing with SED, AWK and Perl.
product {
    name { thing1 }
    customers {
        mary { }
        freddy { }
        bob {
            spouse betty
        }
    }
}
From the "customers" section, I am trying to get output similar to:
mary{ }
freddy{ }
bob{spouse betty}
Using: sed -n -e "/customers {/,/}/{/customers {/d;/}/d;p;}" $file
This is the output:
mary { }
freddy { }
bob {
spouse betty
}
How can I join the "bob" customer onto one line and remove the extra spaces? I need this specific output because I am writing a script that grabs the "customers" fields and other fields from the text file and writes them to a CSV file, which will look something like the example below. I know this would probably be easier in another language, but bash is what I know.
output.csv
product,customers,another_column
thing1,mary{ } freddy{ } bob{spouse betty},something_else
CodePudding user response:
The data happens to be valid Tcl list syntax:
set f [open "input.file"]
set data [dict create {*}[read $f]]
close $f
set name [string trim [dict get $data product name]]
dict for {key val} [dict get $data product customers] {
    lappend customers [format "%s{%s}" $key [string trim $val]]
}
set f [open "output.csv" w]
puts $f "product,customers,another_column"
puts $f [join [list $name [join $customers] "something_else"] ,]
close $f
creates output.csv with
product,customers,another_column
thing1,mary{} freddy{} bob{spouse betty},something_else
CodePudding user response:
With your shown samples only, you could try the following GNU awk code. This can be done in a single GNU awk program; there is no need to pass your sed command's output to any other tool. Just pass your Input_file to the awk program(s).
1st solution: To get the output between the customers section and its closing bracket }, with values that have no leading spaces, try the following GNU awk solution.
awk -v RS='\n[[:space:]]+customers {[[:space:]]*.*\n[[:space:]]+}' '
RT{
  sub(/^\n[[:space:]]+[^ ]* {[[:space:]]*\n/,"",RT)
  sub(/\n[[:space:]]+}/,"",RT)
  match(RT,/(.*{)[[:space:]]*([^\n]*)(.*)/,arr)
  sub(/^[[:space:]]+/,"",arr[1])
  sub(/\n/,"",arr[2])
  gsub(/\n|^[[:space:]]+/,"",arr[3])
  gsub(/\n[[:space:]]+/,"\n",arr[1])
  gsub(/ {/,"{",arr[1])
  print arr[1] arr[2] arr[3]
}
' Input_file
Output will be as follows:
mary{ }
freddy{ }
bob{spouse betty}
2nd solution: To keep the leading spaces before the values, try the following code.
awk -v RS='\n[[:space:]]+customers {[[:space:]]*.*\n[[:space:]]+}' '
RT{
  sub(/^\n[[:space:]]+[^ ]* {[[:space:]]*\n/,"",RT)
  sub(/\n[[:space:]]+}/,"",RT)
  match(RT,/(.*{)[[:space:]]*([^\n]*)(.*)/,arr)
  sub(/\n/,"",arr[2])
  gsub(/\n|^[[:space:]]+/,"",arr[3])
  print arr[1] arr[2] arr[3]
}
' Input_file
Output will be as follows:
mary { }
freddy { }
bob {spouse betty}
Explanation: In GNU awk, RS (the record separator) is set to the regex \n[[:space:]]+customers {[[:space:]]*.*\n[[:space:]]+} so that only the customers section is matched; the matched text is then available in RT. The main block of the awk program uses sub (the substitute function) to remove the parts of RT that are not needed. It then calls the match function with the regex (.*{)[[:space:]]*([^\n]*)(.*), whose 3 capturing groups are stored in an array named arr. Finally, the remaining newlines and spaces are removed from the array elements and the values are printed.
CodePudding user response:
Maybe ed
ed -s file.txt <<-'EOF'
%s/^[[:space:]]*//
?{?;/^}/j
%s/^\([^\{]*\) \(.*\)$/\1\2 /
/^customers/+1;/^}/-1j
s/^/thing1,/
s/ *$/,something_else/
p
Q
EOF
With a temp file, it is a bit easier to write to a new file.
ed -s file.txt <<-'EOF'
%s/^[[:space:]]*//
/customers {/+1;/^[[:space:]]*}/w out.txt
%d
r out.txt
?{?;/^}/j
%s/^\([^\{]*\) \(.*\)$/\1\2 /
%j
s/^/thing1,/
s/ *$/,something_else/
0a
product,customers,another_column
.
w output.csv
,p
Q
EOF
- The latter creates two files, out.txt and output.csv
- Remove the ,p if stdout output is not required.
CodePudding user response:
Edit: See the end for producing the complete output.
Here is a regex approach, probably usable in just about any language, run on the whole file held in one string. As it stands, it assumes that there can be only one level of nesting under a customer; in other words, bob cannot have { pets { dog } } or some such.
Extract the content of the customers section
/customers\s*{\s* ( (?: [^{]+ {[^}]*} )+ )/x;
then collapse each newline and the spaces after it into a single space
s/\n\s+/ /g;
then trim spaces from strings like bob { spouse }, but not from mary { }
s/{\s+ ([^}]+) \s+}/{$1}/gx;
If bob and the crew can really be only word characters, then instead of [^{}] we can use the far nicer \w.
Altogether, in a Perl command-line program ("one"-liner) as seems to be desired
perl -wE'die"file?\n" if not @ARGV;
    $d = do { local $/; <> };
    ($c) = $d =~ /customers\s*{\s* ( (?: [^{]+ {[^}]*} )+ )/x;
    $c =~ s/\n\s+/ /g;
    $c =~ s/{\s+ ([^}]+) \s+}/{$1}/gx;
    say $c
' data.txt
Prints, for data given in the question
mary { } freddy { } bob {spouse betty}
To print each customer on a separate line, one can use, for example,
say for split /(?<=\})\s+/, $c;
(as the last line of the code, in place of say $c)
I now realize that there is more to capture and print, as described in the last paragraph of the question. Adding to the beginning of the regex to capture the name, and adding the required printing:
perl -wE'die"file?\n" if not @ARGV;
    $d = do { local $/; <> };
    ($n, $c) = $d =~ /name\s*{\s* ([^}]+) \s*} .*? customers\s*{\s* ( (?: [^{]+ {[^}]*} )+ )/sx;
    $n =~ s/^\s+|\s+$//g;
    $c =~ s/\n\s+/ /g;
    $c =~ s/{\s+ ([^}]+) \s+}/{$1}/gx;
    say "product,customers,another_column";
    say "$n,$c,something_else"
' data.txt > output.csv
Prints as shown in the question.
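For comparison, the same three regex steps (extract the customers block, collapse newlines, trim inside non-empty braces) can be sketched in Python. This is only an illustration, not part of the Perl answer: the function names customers_line and split_customers and the embedded SAMPLE are made up here, and the same single-level-of-nesting assumption applies.

```python
import re

# Sample input from the question (indentation assumed).
SAMPLE = """\
product {
    name { thing1 }
    customers {
        mary { }
        freddy { }
        bob {
            spouse betty
        }
    }
}
"""


def customers_line(text):
    """Return the customers block collapsed onto one line ('' if not found)."""
    # Step 1: capture one or more "name { ... }" entries after "customers {".
    m = re.search(r"customers\s*\{\s*((?:[^{}]+\{[^}]*\})+)", text)
    if m is None:
        return ""
    block = m.group(1)
    # Step 2: collapse each newline plus the whitespace after it into one space.
    block = re.sub(r"\n\s+", " ", block)
    # Step 3: trim spaces just inside non-empty braces; "{ }" does not match
    # because the capture group needs at least one non-"}" character.
    return re.sub(r"\{\s+([^}]+?)\s+\}", r"{\1}", block)


def split_customers(line):
    """Split the one-line block into one string per customer."""
    return re.split(r"(?<=\})\s+", line)


if __name__ == "__main__":
    line = customers_line(SAMPLE)
    print(line)  # mary { } freddy { } bob {spouse betty}
    for customer in split_customers(line):
        print(customer)
```

Like the Perl version, this keeps the space in mary { } and before each opening brace; removing those would take one more substitution.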
CodePudding user response:
The following code sample demonstrates a very primitive parser for the provided sample data.
The code restores the data structure, which can then be used in any way imaginable, for example stored as a CSV, JSON or YAML file.
In real life the input data can be quite different, and this code will probably not process it correctly.
The code is provided for educational purposes only.
use strict;
use warnings;
use feature 'say';

use Data::Dumper;

my $data = do { local $/; <DATA> };

$data =~ s/\n/ /g;
$data =~ s/ +/ /g;

say Dumper parse($data);

exit 0;

sub parse {
    my $str = shift;
    my $ret;

    while( $str =~ /^(\S+) \{ (\S+) \{ \S+/ ) {
        if( $str =~ /^(\S+) \{ (\S+) \{ ([^}]+?) \{(.+?)\}/ ) {
            $ret->{$1}{$2}{$3} = $4;
            $ret->{$1}{$2}{$3} =~ s/(^\s+|\s+$)//g;
            $str =~ s/^(\S+) \{ (\S+) \{(.+?)\{(.*?)\}/$1 \{ $2 \{/;
        }
        if( $str =~ /^(\S+) \{ (\S+) \{\s*([^{]+?)\s*\}/ ) {
            $ret->{$1}{$2} = $3 if length($3) > 1;
            $str =~ s/^(\S+) \{ \S+ \{\s*[^\}]+\s*\}/$1 \{/;
        }
    }

    return $ret;
}

__DATA__
product {
    name { thing1 }
    customers {
        mary { }
        freddy { }
        bob {
            spouse betty
        }
    }
}
Output
$VAR1 = {
          'product' => {
                         'customers' => {
                                          'bob' => 'spouse betty',
                                          'freddy' => '',
                                          'mary' => ''
                                        },
                         'name' => 'thing1'
                       }
        };
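The same idea, rebuilding the nested structure first and writing it out afterwards, can also be sketched in Python with a small recursive parser. This is a hedged illustration, not a translation of the Perl code above: tokenize, parse_block and to_csv are hypothetical helper names, the embedded SAMPLE assumes the question's layout, and something_else is the placeholder column value from the question.

```python
import csv
import io
import re

# Sample input from the question (indentation assumed).
SAMPLE = """\
product {
    name { thing1 }
    customers {
        mary { }
        freddy { }
        bob {
            spouse betty
        }
    }
}
"""


def tokenize(text):
    """Split the text into '{', '}' and bare words."""
    return re.findall(r"[{}]|[^\s{}]+", text)


def parse_block(tokens, pos=0):
    """Parse tokens from pos up to the matching '}' or the end of input.

    Returns (value, next_pos). A block holding 'key { ... }' entries becomes
    a dict; a block holding only bare words becomes a space-joined string.
    Bare words mixed with entries in one block are dropped (sketch only).
    """
    entries = {}
    words = []
    while pos < len(tokens) and tokens[pos] != "}":
        tok = tokens[pos]
        if pos + 1 < len(tokens) and tokens[pos + 1] == "{":
            # Recurse into the nested block that starts after 'tok {'.
            entries[tok], pos = parse_block(tokens, pos + 2)
            pos += 1  # skip the closing '}'
        else:
            words.append(tok)
            pos += 1
    return (entries if entries else " ".join(words)), pos


def to_csv(tree):
    """Render the header row and one data row in the question's CSV layout."""
    product = tree["product"]
    customers = " ".join(
        f"{name}{{{detail}}}" for name, detail in product["customers"].items()
    )
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["product", "customers", "another_column"])
    writer.writerow([product["name"], customers, "something_else"])
    return out.getvalue()


if __name__ == "__main__":
    tree, _ = parse_block(tokenize(SAMPLE))
    print(to_csv(tree), end="")
```

Because the parser recurses on every key { ... } pair, deeper nesting is kept in the dict, although to_csv as written only formats one level below customers.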