How to sort words by frequency in a csv file-CodePudding

I have .csv files that look like this but I can't figure out how to read and sort by frequency the words of the first paragraph of the file, it looks like this:

enter image description here

I only want to sort the selected lines which have a length of 'total_lines' that I already calculate elsewhere. I have tried this code but it doesn't give me the result I need:

from collections import Counter

y = open(filename, 'r', newline='')
print(collections.Counter(y))

I have also tried using the csv reader but couldn't manage either.

This is part of the file:

UserScreenNameUserNameTimestampTextEmbedded_textEmojisCommentsLikesRetweetsImage,linkTweet,URLAjuntament,de,Calvià,de,Calvià
ene,Los,Reyes,Magos,ya,están,repartiendo,ilusión,por,#,Calvià,#,CabalgataDeReyesAjuntament,de,Calvià,de,Calvià
en,eLos,Reyes,magos,repartirán,más,de,kg,de,caramelos,#,SinGlúten,en,la,Gran,#,Cabalgata,de,CalviàNos,vemos,a,las,h,en,Palmanova,Ajuntament,de,Calvià,de,Calvià
ene,Buena,acogida,de,la,campaña,de,identificación,y,esterilización,de,el,gato,doméstico,en,CalviàAjuntament,de,Calvià,de,Calvià
ene,ha,extraído,unastoneladas,de,#residuossólidos,de,las,estaciones,de,bombeo,de,aguas,residuales,de,#,Calvià,(,EBAR,),durante,las,tareas,de,limpieza,en,profundidad,de,estas,instalacionesAjuntament,de,Calvià,de,Calvià
ene,Fotos,de,la,Gran,Cabalgata,de,#,Calvià,#IlusiónAjuntament,de,Calvià,de,Calvià
ene,Primera,presentación,de,el,avance,de,el,PGOU,de,#,Calvià,a,la,ciudadanía,#,Participación,#,Transparencia,Ajuntament,de,Calvià,de,Calvià
ene,Reunión,de,la,Alianza,deMunicipios,Turísticos,de,Sol,y,Playa,en,Fitur,Seguimos,trabajando,para,mejorar,nuestro,destino,#,Turismo,Ajuntament,de,Calvià,de,Calvià
ene,Entrega,de,premios,ITH,Smart,Destination,Awards,de,en,#FITUR,Ajuntament,de,Calvià,de,Calvià
ene,El,teniente,alcalde,de,turismo,se,reúne,con,el,director,de,la,oficina,española,de,turismo,en,Frankfurt,#,FiturAjuntament,de,Calvià,de,Calvià
ene,Seguimos,en,#,Fitur,trabajando,para,promocionar,#,Calvià,como,un,destino,lleno,de,oportunidades,durante,todo,el,año,Ajuntament,de,Calvià,de,Calvià
ene,En,#FITUR,entrevista,a,para,hablar,de,#,TurismoAjuntament,de,Calvià,de,Calvià
ene,ha,saludado,a,el,alumnado,de,Turismo,de,el,IES,Calvià,en,#,FiturAjuntament,de,Calvià,de,Calvià
ene,En,la,agenda,deencontrarás,todas,las,actividades,que,se,realizan,en,el,municipioNo,te,pierdas,nada,Ajuntament,de,Calvià,de,Calvià

UserScreenNameUserNameTimestampTextEmbedded_textEmojisCommentsLikesRetweetsImage,linkTweet,URLAjuntament,de,Calvià,de,Calvià
ene,el,Reyes,Magos,ya,estar,repartir,ilusión,por,#,Calvià,#,CabalgataDeReyesAjuntament,de,Calvià,de,Calvià
en,eLos,Reyes,mago,repartir,más,de,kg,de,caramelo,#,SinGlúten,en,el,Gran,#,Cabalgata,de,CalviàNos,ver,a,el,h,en,Palmanova,Ajuntament,de,Calvià,de,Calvià
ene,buen,acogida,de,el,campaña,de,identificación,y,esterilización,de,el,gato,doméstico,en,CalviàAjuntament,de,Calvià,de,Calvià
ene,haber,extraer,unastonelado,de,#residuossólidos,de,el,estación,de,bombeo,de,agua,residual,de,#,Calvià,(,EBAR,),durante,el,tarea,de,limpieza,en,profundidad,de,este,instalacionesAjuntament,de,Calvià,de,Calvià
ene,Fotos,de,el,Gran,Cabalgata,de,#,Calvià,#IlusiónAjuntament,de,Calvià,de,Calvià
ene,primero,presentación,de,el,avance,de,el,PGOU,de,#,Calvià,a,el,ciudadanía,#,Participación,#,Transparencia,Ajuntament,de,Calvià,de,Calvià
ene,Reunión,de,el,Alianza,deMunicipios,Turísticos,de,Sol,y,Playa,en,Fitur,seguir,trabajar,para,mejorar,nuestro,destino,#,Turismo,Ajuntament,de,Calvià,de,Calvià
ene,Entrega,de,premio,ITH,Smart,Destination,Awards,de,en,#FITUR,Ajuntament,de,Calvià,de,Calvià
ene,el,teniente,alcalde,de,turismo,él,reunir,con,el,director,de,el,oficina,español,de,turismo,en,Frankfurt,#,FiturAjuntament,de,Calvià,de,Calvià
ene,seguir,en,#,Fitur,trabajar,para,promocionar,#,Calvià,como,uno,destino,lleno,de,oportunidad,durante,todo,el,año,Ajuntament,de,Calvià,de,Calvià
ene,en,#FITUR,entrevista,a,para,hablar,de,#,TurismoAjuntament,de,Calvià,de,Calvià
ene,haber,saludar,a,el,alumnado,de,Turismo,de,el,IES,Calvià,en,#,FiturAjuntament,de,Calvià,de,Calvià
ene,en,el,agenda,deencontrar,todo,el,actividad,que,él,realizar,en,el,municipioNo,tú,perder,nada,Ajuntament,de,Calvià,de,Calvià

PROPN,PROPN,PROPN,ADP,PROPN,ADP,PROPN
ADP,DET,PROPN,PROPN,ADV,AUX,VERB,NOUN,ADP,PROPN,PROPN,PROPN,PROPN,ADP,PROPN,ADP,PROPN
ADP,PROPN,PROPN,NOUN,VERB,ADV,ADP,NUM,ADP,NOUN,NOUN,PROPN,ADP,DET,ADJ,PROPN,PROPN,ADP,PROPN,VERB,ADP,DET,NOUN,ADP,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PUNCT,ADJ,NOUN,ADP,DET,NOUN,ADP,NOUN,CCONJ,NOUN,ADP,DET,NOUN,ADJ,ADP,PROPN,ADP,PROPN,ADP,PROPN
PROPN,AUX,VERB,ADJ,ADP,NOUN,ADP,DET,NOUN,ADP,NOUN,ADP,NOUN,ADJ,ADP,PROPN,PROPN,PUNCT,PROPN,PUNCT,ADP,DET,NOUN,ADP,NOUN,ADP,NOUN,ADP,DET,NOUN,ADP,PROPN,ADP,PROPN
PROPN,PROPN,ADP,DET,ADJ,PROPN,ADP,PROPN,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PUNCT,ADJ,NOUN,ADP,DET,NOUN,ADP,DET,PROPN,ADP,PROPN,PROPN,ADP,DET,NOUN,PROPN,PROPN,PROPN,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PUNCT,PROPN,ADP,DET,PROPN,PROPN,PROPN,ADP,PROPN,CCONJ,PROPN,ADP,PROPN,VERB,VERB,ADP,VERB,DET,NOUN,PROPN,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PROPN,PROPN,ADP,NOUN,PROPN,PROPN,PROPN,PROPN,ADP,ADP,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PUNCT,DET,NOUN,NOUN,ADP,NOUN,PRON,VERB,ADP,DET,NOUN,ADP,DET,NOUN,ADJ,ADP,NOUN,ADP,PROPN,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PUNCT,VERB,ADP,NOUN,PROPN,VERB,ADP,VERB,NOUN,PROPN,SCONJ,DET,NOUN,ADJ,ADP,NOUN,ADP,DET,DET,NOUN,PROPN,ADP,PROPN,ADP,PROPN
PUNCT,ADP,DET,NOUN,ADP,ADP,VERB,ADP,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PROPN,AUX,VERB,ADP,DET,NOUN,ADP,PROPN,ADP,DET,PROPN,PROPN,ADP,PROPN,PROPN,ADP,PROPN,ADP,PROPN
PUNCT,ADP,DET,NOUN,VERB,DET,DET,NOUN,PRON,PRON,VERB,ADP,DET,NOUN,PRON,VERB,PRON,PROPN,ADP,PROPN,ADP,PROPN

,,,,,,
,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,

Expected output should be something like:

[('word1', freq1), ('word2', freq2), ('word3', freq3)...]

CodePudding user response：

So the task is to read CSV file from second line to first occurrence of empty line and count all words.

To read CSV file you can use built-in csv.reader(). Basically returned reader is an iterator which yields list of columns from every line, so to skip first line we can just call next() once. We can manually iterate over reader until it returns empty list or apply itertools.takewhile() with operator.truth(). To pass all words directly into collections.Counter() constructor we can apply chain.from_iterable() which will flatten lists returned from takewhile(). To get result sorted by frequency we can use Counter.most_common().

Code:

import csv
from itertools import takewhile, chain
from operator import truth
from collections import Counter

with open(r"CSV_100.csv") as f:
    reader = csv.reader(f)
    next(reader)  # skip first line
    counter = Counter(chain.from_iterable(takewhile(truth, reader)))
print(*counter.most_common())

Output:

('de', 55) ('CalviÃ\xa0', 32) ('#', 15) ('ene', 12) ('en', 11) ('el', 8) ('la', 7) ('Ajuntament', 6) ('a', 4) ('las', 4) ('para', 3) ('Reyes', 2) ('Gran', 2) ('Cabalgata', 2) ('y', 2) ('ha', 2) ('durante', 2) ('Fitur', 2) ('Seguimos', 2) ('trabajando', 2) ('destino', 2) ('Turismo', 2) ('#FITUR', 2) ('turismo', 2) ('se', 2) ('FiturAjuntament', 2) ('En', 2) ('Los', 1) ('Magos', 1) ('ya', 1) ('estÃ¡n', 1) ('repartiendo', 1) ('ilusiÃ³n', 1) ('por', 1) ('CabalgataDeReyesAjuntament', 1) ('eLos', 1) ('magos', 1) ('repartirÃ¡n', 1) ('mÃ¡s', 1) ('kg', 1) ('caramelos', 1) ('SinGlÃºten', 1) ('CalviÃ\xa0Nos', 1) ('vemos', 1) ('h', 1) ('Palmanova', 1) ('Buena', 1) ('acogida', 1) ('campaÃ±a', 1) ('identificaciÃ³n', 1) ('esterilizaciÃ³n', 1) ('gato', 1) ('domÃ©stico', 1) ('CalviÃ\xa0Ajuntament', 1) ('extraÃ\xaddo', 1) ('unastoneladas', 1) ('#residuossÃ³lidos', 1) ('estaciones', 1) ('bombeo', 1) ('aguas', 1) ('residuales', 1) ('(', 1) ('EBAR', 1) (')', 1) ('tareas', 1) ('limpieza', 1) ('profundidad', 1) ('estas', 1) ('instalacionesAjuntament', 1) ('Fotos', 1) ('#IlusiÃ³nAjuntament', 1) ('Primera', 1) ('presentaciÃ³n', 1) ('avance', 1) ('PGOU', 1) ('ciudadanÃ\xada', 1) ('ParticipaciÃ³n', 1) ('Transparencia', 1) ('ReuniÃ³n', 1) ('Alianza', 1) ('deMunicipios', 1) ('TurÃ\xadsticos', 1) ('Sol', 1) ('Playa', 1) ('mejorar', 1) ('nuestro', 1) ('Entrega', 1) ('premios', 1) ('ITH', 1) ('Smart', 1) ('Destination', 1) ('Awards', 1) ('El', 1) ('teniente', 1) ('alcalde', 1) ('reÃºne', 1) ('con', 1) ('director', 1) ('oficina', 1) ('espaÃ±ola', 1) ('Frankfurt', 1) ('promocionar', 1) ('como', 1) ('un', 1) ('lleno', 1) ('oportunidades', 1) ('todo', 1) ('aÃ±o', 1) ('entrevista', 1) ('hablar', 1) ('TurismoAjuntament', 1) ('saludado', 1) ('alumnado', 1) ('IES', 1) ('agenda', 1) ('deencontrarÃ¡s', 1) ('todas', 1) ('actividades', 1) ('que', 1) ('realizan', 1) ('municipioNo', 1) ('te', 1) ('pierdas', 1) ('nada', 1)

You can help my country, check my profile info.

CodePudding user response：

Maybe try something like

words = []
with open('data.csv','r') as file:
    while line:=file.readline().rstrip(): #stop on end file or double new line
        words  = line.split(',')
freq = {}
for word in set(words): 
    freq[word] = words.count(word)

print(freq)

print(sorted(freq.items(),key=lambda item:item[1]))

Your .csv also seems to have a bunch of typos, in case that's an issue you didn't catch yet.