Home > Mobile >  Aggregating data in a csv
Aggregating data in a csv

Time:05-16

I have to generate a HTML file to show how I have aggregated data in a csv file.

The structure of this file is as follows:

num_expediente;fecha;hora;localizacion;numero;cod_distrito;distrito;tipo_accidente;estado_meteorológico;tipo_vehiculo;tipo_persona;rango_edad;sexo;cod_lesividad;lesividad;coordenada_x_utm;coordenada_y_utm;positiva_alcohol;coste;positiva_droga
2022S000001;Enero;Noche;AVENIDA ALBUFERA;19;13;13_PUENTE DE VALLECAS;Choque;Despejado;Vehículo ligero;Conductor;<30;Mujer;0;Sin asistencia;443359,226;4472082,272;0;0;0
2022S000002;Enero;Noche;PLAZA CANOVAS DEL CASTILLO;2;3;3_RETIRO;Choque;Desconocido;Motocicleta;Conductor;31_60;Hombre;0;Sin asistencia;441155,351;4474129,588;1;0;0
2022S000003;Enero;Noche;CALLE SAN BERNARDO;53;1;1_CENTRO;Atropello;Despejado;Motocicleta;Conductor;Desconocido;Desconocido;0;Sin asistencia;439995,351;4475212,523;0;0;0
2022S000004;Enero;Noche;CALLE ALCALA;728;20;20_SAN BLAS-CANILLEJAS;Choque;Despejado;Vehículo ligero;Conductor;31_60;Hombre;2;Leve;449693,925;4477837,552;0;200;0
2022S000004;Enero;Noche;CALLE ALCALA;728;20;20_SAN BLAS-CANILLEJAS;Choque;Despejado;Vehículo ligero;Pasajero;31_60;Mujer;3;Grave;449693,925;4477837,552;0;3000;0

num_expediente is the id of the accident fecha is the month of the accident sexo is the gender of the person implied in the accident coste is the cost of the accident for the person implied

I would like to create a table showing the accumulated cost per month and gender. I use this script:

#! /usr/bin/awk -f
BEGIN {FS=OFS=";"}
function loop(array, name, i) {
    for (i in array) {
        if (isarray(array[i]))
            loop(array[i], (name "[" i "]"))
        else
            printf("%s[%s] = %s\n",name, i, arr[i])
        }
}
NR!=1{
    array[$2][$13] =$19
}
END {loop(array, "")
}

But the output is not aggregating the cost:

[Enero][Hombre] = 
[Enero][Desconocido] = 
[Enero][Mujer] = 
[Febrero][Hombre] = 
[Febrero][Mujer] = 
[Febrero][Desconocido] = 
[Marzo][Hombre] = 
[Marzo][Desconocido] = 
[Marzo][Mujer] = 

I dont know why this is not working. I dont have idea how to generate the html out of this output. Could you help with that too?

CodePudding user response:

As mentioned in the comments OP has a typo in the printf where arr[i] should be array[i]; while this should address OP's current issue I'm not sure I understand the use of a recursive function call unless OP's real world problem is dealing with arrays of varying dimensions.

Since we're dealing with an array of known dimension (ie, 2) one simpliified awk idea:

awk -F';' '
NR>1 { array[$2][$13] =$19 }
END  { for (month in array)
           for (gender in array[month])
               printf "[%s][%s] = %s\n", month, gender, array[month][gender]
     }
' raw.csv

For the provided input this generates:

[Enero][Hombre] = 200
[Enero][Desconocido] = 0
[Enero][Mujer] = 3000

NOTES:

  • this solution does not address any sorting requirements OP may have for the output
  • for an additional sorting requirement I'd suggest OP first address the current issue and once solved then attempt to apply the additional sorting requirement and ...
  • if having problems with sorting then ask a new question (making sure to include a complete list of months and genders and the desired sort order for both components)
  • Related