I have a file with millions of lines, and I need to use bash to delete every line that contains the string "..Nessuna Risposta" three or more times.
For example, in the sample below I need to delete rows "1003", "1084" and "1096" (without leaving blank lines behind), but not the last row "1119", because that line contains "..Nessuna Risposta" only twice.
1003;"N.Nuovo";"4";"327";"";01102019;"1.F";"49";"4.LAUREA";"1.ITALIANA";"2.Allo sportello";"1.Nessuna Risposta";"2";"6";"5.Nessuna Risposta";"2";"4";"2.Nessuna Risposta";"2";"1";"3";"3"
1084;"N.Nuovo";"4";"327";"";02102019;"1.F";"66";"2.SCUOLA OBBLIGO";"1.ITALIANA";"2.Allo sportello";"7.Nessuna Risposta";"7";"6";"7.Nessuna Risposta";"5";"7";"7";"7.Nessuna Risposta";"7";"7";"7"
1095;"N.Nuovo";"4";"327";"";"327001951";"Poliambulatori";02102019;"1.F";"59";"2.SCUOLA OBBLIGO";"1.ITALIANA";"1.Telefonicamente";"5";"5";"5";"5";"7";"6";"6";"7";"6";"6";"6"
1096;"N.Nuovo";"4";"327";"";"327001951";"Poliambulatori";01102019;"2.M";"48";"3.SCUOLA SUP";"1.ITALIANA";"2.Allo sportello";"6";"5.Nessuna Risposta";"5";"6";"6.Nessuna Risposta";"7";"7";"7.Nessuna Risposta";"7";"7";"7"
1119;"N.Nuovo";"4";"327";"";"327001951";"Laboratorio";03102019;"2.M";"30";"3.SCUOLA SUP";"1.ITALIANA";"2.Allo sportello";"6";"6.Nessuna Risposta";"6";"4.Nessuna Risposta";"6";"6";"6";"6";"6";"6";"6"
I found and tried the command below, but the count \{3,\} does not work: it deletes every line that contains "..Nessuna Risposta". Can you help me, please?
grep -v -e ".*Nessuna Risposta.*\{3,\}" $FILE_NAME
Answer:
Using grep
$ grep -v '\([^.]*\.Nessuna Risposta\)\{3,\}' input_file
1095;N.Nuovo;4;327;;327001951;Poliambulatori;02102019;1.F;59;2.SCUOLA OBBLIGO;1.ITALIANA;1.Telefonicamente;5;5;5;5;7;6;6;7;6;6;6
1119;N.Nuovo;4;327;;327001951;Laboratorio;03102019;2.M;30;3.SCUOLA SUP;1.ITALIANA;2.Allo sportello;6;6.Nessuna Risposta;6;4.Nessuna Risposta;6;6;6;6;6;6;6
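A note on why the original attempt removes too much: in ".*Nessuna Risposta.*\{3,\}" the interval applies only to the last ".*", so a single occurrence is already enough to match. The pattern above fixes this by grouping, but its "[^.]*" part assumes no other dotted field (such as "1.F") sits between two occurrences. If that cannot be guaranteed, something like the following sketch may be more tolerant. The function names drop_repeated and drop_repeated_grep are hypothetical, chosen only for illustration, and the threshold of three is taken from the question.

```shell
#!/usr/bin/env bash

# Hypothetical helper: keep only lines with fewer than 3 occurrences of the phrase.
# gsub() returns the number of replacements; replacing each match with "&"
# (the match itself) leaves the line unchanged, so it works as a counter.
drop_repeated() {
  awk 'gsub(/Nessuna Risposta/, "&") < 3' "$@"
}

# Hypothetical grep variant: the group is "the phrase plus anything after it",
# repeated three or more times, so dotted fields in between do not matter.
drop_repeated_grep() {
  grep -v '\(Nessuna Risposta.*\)\{3,\}' "$@"
}
```

Either function reads the files given as arguments (or standard input), for example: drop_repeated input_file > output_file. Both drop whole lines, so no blank lines are left behind.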