I have a few .csv files, each containing words and the frequency of each word in descending order. How could I merge these files into a single .csv?
The function I use for the frequency sort is this one:
import csv
from collections import Counter
from itertools import chain, takewhile
from operator import truth

def freq_sort(name):
    # name[:-4] drops a 4-character extension from the file name
    with open("CSV_" + str(name[:-4]) + ".csv", encoding='utf-8') as f:
        reader = csv.reader(f, delimiter=',')
        # next(reader)  # skip first line, if it contains junk
        # count every cell of every row, stopping at the first empty row
        counter = Counter(chain.from_iterable(takewhile(truth, reader)))
    # print(*counter.most_common())
    print("freq list created")
    with open("FREQ_" + name, 'w', encoding='utf-8') as writefreq:
        for fchar in str(counter.most_common()):
            writefreq.write(fchar)
            if fchar == ')':  # line break after each tuple; purely visual, counter.most_common() is the unaltered result
                writefreq.write('\n')
It takes a .csv file, does the frequency sort, and creates a .csv with the result.
Example data:
List1:
[('calvià', 1428) , ('ajuntament', 602) , ('amb', 79) , ('mar', 75) , ('h', 59) , ('ha', 57) , ('es', 50) , ('més', 46) , ('comunicación', 40) , ('dia', 35) , ('avui', 33) , ('hem', 33) , ('son', 32) , ('programa', 32) , ('jornadas', 32) , ('santa', 31) , ('han', 31) , ('información', 29) , ('administraciones', 28) , ('ponça', 27) , ('fins', 27) , ('galatzó', 26) , ('gracias', 25)]
List2:
[('peñíscola', 422) , ('mar', 74) , ('ciudad', 51) , ('feliz', 47) , ('avui', 34) , ('noticia', 33) , ('completa', 33) , ('semana', 29) , ('turismo', 27) , ('gracias', 22) , ('casco', 22) , ('antiguo', 22) , ('castillo', 21) , ('días', 20) , ('españa', 20) , ('imagen', 20)]
CodePudding user response:
import pandas as pd
list1 = [('calvià', 1428) , ('ajuntament', 602) , ('amb', 79) , ('mar', 75) , ('h', 59) , ('ha', 57) , ('es', 50) , ('més', 46) , ('comunicación', 40) , ('dia', 35) , ('avui', 33) , ('hem', 33) , ('son', 32) , ('programa', 32) , ('jornadas', 32) , ('santa', 31) , ('han', 31) , ('información', 29) , ('administraciones', 28) , ('ponça', 27) , ('fins', 27) , ('galatzó', 26) , ('gracias', 25)]
list2 = [('peñíscola', 422) , ('mar', 74) , ('ciudad', 51) , ('feliz', 47) , ('avui', 34) , ('noticia', 33) , ('completa', 33) , ('semana', 29) , ('turismo', 27) , ('gracias', 22) , ('casco', 22) , ('antiguo', 22) , ('castillo', 21) , ('días', 20) , ('españa', 20) , ('imagen', 20)]
df1 = pd.DataFrame(list1, columns=['word', 'frequency'])
df2 = pd.DataFrame(list2, columns=['word', 'frequency'])
# stack the two frames on top of each other
result = pd.concat([df1, df2])
result.to_csv("example.csv", index=False)
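Note that `pd.concat` keeps both rows when a word appears in both lists (for example 'mar', 'avui' and 'gracias' in the sample data). If you would rather have one row per word with the frequencies added up, something like the following might work. It is only a sketch: the lists are shortened from the question, and the output name `example_merged.csv` is made up for illustration.

```python
import pandas as pd

# Shortened sample data taken from the question's two lists
list1 = [('calvià', 1428), ('mar', 75), ('avui', 33), ('gracias', 25)]
list2 = [('peñíscola', 422), ('mar', 74), ('avui', 34), ('gracias', 22)]

df1 = pd.DataFrame(list1, columns=['word', 'frequency'])
df2 = pd.DataFrame(list2, columns=['word', 'frequency'])

merged = (
    pd.concat([df1, df2], ignore_index=True)
    .groupby('word', as_index=False)['frequency']
    .sum()  # add up the frequencies of words present in both lists
    .sort_values('frequency', ascending=False, ignore_index=True)
)

# "example_merged.csv" is a hypothetical output name
merged.to_csv("example_merged.csv", index=False)
```

Whether summing is the right way to combine the counts depends on what the merged file is for; plain concatenation is fine if you want to keep the per-file numbers.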
CodePudding user response:
First, if you want to order your df by frequency, you can do the following. For reference, see the pandas documentation on groupby, value_counts and transform.
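# give every row the number of times its value occurs in the column
df['count'] = df.groupby('Column Name')['Column Name'].transform('count')
# sort_values returns a new frame, so assign the result back
df = df.sort_values('count', ascending=False)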
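To see what the count-and-sort step produces, here is a small sketch on made-up data. The column name `word` and the sample values are assumptions for illustration, and it uses `transform('count')`, which gives each row the size of its group.

```python
import pandas as pd

# Hypothetical raw data: one word per row, not yet counted
df = pd.DataFrame({'word': ['mar', 'avui', 'mar', 'dia', 'avui', 'mar']})

# Each row gets the number of rows that share its word
df['count'] = df.groupby('word')['word'].transform('count')

# A stable sort keeps the original order among rows with equal counts
df = df.sort_values('count', ascending=False, kind='stable', ignore_index=True)
print(df)
```

This keeps one row per occurrence; if you only need one row per word, `df['word'].value_counts()` gives the counts directly.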
If you want to merge the files, you can use the following script. Keep all the files in the same directory and you can merge them directly. If the folder also contains files with other extensions, use a wildcard pattern such as "*.csv" after the directory.
import glob
import os
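# build the search pattern; os.path.join inserts the path separator itself
files = os.path.join("file_directory", "newmergedFileName*.csv")
# list every file that matches the pattern
files = glob.glob(files)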
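The snippet above stops once the matching files are listed. One possible way to finish the merge is sketched below. It assumes pandas and that every file has the same header row; `merge_csv_files` and its parameters are names invented here for illustration.

```python
import glob
import os

import pandas as pd


def merge_csv_files(directory, out_path, pattern="*.csv"):
    """Concatenate every CSV matching `pattern` in `directory` and write it to `out_path`.

    Hypothetical helper; assumes all files share the same columns.
    """
    files = sorted(glob.glob(os.path.join(directory, pattern)))
    # Skip the output file in case it already sits in the same directory
    files = [f for f in files if os.path.abspath(f) != os.path.abspath(out_path)]
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory!r}")
    # Read each file and stack the rows into a single frame
    merged = pd.concat([pd.read_csv(f) for f in files], ignore_index=True)
    merged.to_csv(out_path, index=False)
    return merged
```

A call such as `merge_csv_files("file_directory", "merged.csv")` would then write all rows into one file. Words that occur in several files stay as separate rows unless you group and sum them afterwards.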