Given the following script and dataset: Script:
while IFS=","
read v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13;
do if [ -z "$v12" ];
then echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,'unknown',$v13";
else echo "$v1, $v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,$v12,$v13";
fi;
done
>train3.csv
Dataset:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,,S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,,S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,,S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,,S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,,S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,,C
I want to export as a CSV file with name 'train3.csv' but the way I'm doing don't work and it doesn't show the changes done or saves as a CSV file.
How can I solve this?
The expected result would be:
PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Braund, Mr. Owen Harris",male,22,1,0,A/5 21171,7.25,'unknown',S
2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Thayer)",female,38,1,0,PC 17599,71.2833,C85,C
3,1,3,"Heikkinen, Miss. Laina",female,26,0,0,STON/O2. 3101282,7.925,'unknown',S
4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35,1,0,113803,53.1,C123,S
5,0,3,"Allen, Mr. William Henry",male,35,0,0,373450,8.05,'unknown',S
6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,'unknown',Q
7,0,1,"McCarthy, Mr. Timothy J",male,54,0,0,17463,51.8625,E46,S
8,0,3,"Palsson, Master. Gosta Leonard",male,2,3,1,349909,21.075,'unknown',S
9,1,3,"Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)",female,27,0,2,347742,11.1333,'unknown',S
10,1,2,"Nasser, Mrs. Nicholas (Adele Achem)",female,14,1,0,237736,30.0708,'unknown',C
also including a new CSV file creation.
Thanks.
CodePudding user response:
Do not use Bash for this. Your input CSV contains quoted strings. You probably have no guarantee, that a quoted string must contain exactly one comma. If it contains less or more commas, this will break your code.
Instead use a dedicated tool, which handles quoted strings correctly. The easiest tool to use is Perl with the module DBD::CSV. The following command will install it on Debian.
sudo apt-get install libdbd-csv-perl
Now you can use SQL to fix your CSV files.
#! /usr/bin/perl
use DBI;
$dbh = DBI->connect ("dbi:CSV:")
or die "Cannot connect: $DBI::errstr";
my $sth = $dbh->prepare ("UPDATE train3.csv SET cabin = ? WHERE cabin is null");
$sth->execute ("'unknown'");
$sth->finish;
$dbh->disconnect;
If you do not want to learn Perl, you can use the script as a ready to use program from your command line. Save it in csv.pl
and make it executable:
#! /usr/bin/perl
use DBI;
$dbh = DBI->connect ("dbi:CSV:")
or die "Cannot connect: $DBI::errstr";
my $sth = $dbh->prepare (shift);
$sth->execute (@ARGV);
$sth->finish;
$dbh->disconnect;
Next you can pass just the query and its arguments:
./csv.pl 'UPDATE train3.csv SET cabin = ? WHERE cabin is null' \'unknown\'
Keep an eye on the quoting.
CodePudding user response:
Modify your code slightly:
#!/bin/bash
datafile='dataset.txt'
outputfile='train3.csv'
>"$outputfile"
while IFS="," read -r v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 v11 v12 v13
do
if [[ -z "$v12" ]]
then
echo "$v1,$v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,'unknown',$v13" >>"$outputfile"
else
echo "$v1, $v2,$v3,$v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,$v12,$v13" >>"$outputfile"
fi
done < "$datafile"
A great reference for reading data from files is https://mywiki.wooledge.org/BashFAQ/001