How to merge lists from csv files

I have a few .csv files, each containing words and the frequency of each word in descending order. How can I merge these files into a single .csv?

[image: Example]

The function I use for the frequency sort is this one:

import csv
from collections import Counter
from itertools import chain, takewhile
from operator import truth

def freq_sort(name):
    # name[:-4] drops a 4-character extension such as ".txt"
    with open("CSV_" + str(name[:-4]) + ".csv", encoding='utf-8') as f:
        reader = csv.reader(f, delimiter=',')
        # next(reader)  # skip first line, if it contains junk
        # count every cell of every row, stopping at the first empty row
        counter = Counter(chain.from_iterable(takewhile(truth, reader)))
    # print(*counter.most_common())
    print("freq list created")
    with open("FREQ_" + name, 'w', encoding='utf-8') as writefreq:
        for fchar in str(counter.most_common()):
            writefreq.write(fchar)
            if fchar == ')':    # line break after each pair, for readability only; counter.most_common() is the unaltered result
                writefreq.write('\n')

It takes a .csv file like this:

[image: Example2]

It then does the frequency sort and writes the result to a new .csv file.
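Writing str(counter.most_common()) character by character produces a Python list literal rather than comma-separated rows, which can make the files harder to merge later. A minimal sketch of writing one word,frequency row per line with csv.writer instead; the helper name write_freq_csv and the header row are assumptions, not part of the original function:

```python
import csv
from collections import Counter


def write_freq_csv(counter, path):
    """Write (word, frequency) pairs, most frequent first, as plain CSV rows."""
    # newline='' lets the csv module control line endings itself
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "frequency"])  # header row (an assumption)
        writer.writerows(counter.most_common())
```

Inside freq_sort this could be called as write_freq_csv(counter, "FREQ_" + name) in place of the character loop.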

Example data:

List1:

[('calvià', 1428) , ('ajuntament', 602) , ('amb', 79) , ('mar', 75) , ('h', 59) , ('ha', 57) , ('es', 50) , ('més', 46) , ('comunicación', 40) , ('dia', 35) , ('avui', 33) , ('hem', 33) , ('son', 32) , ('programa', 32) , ('jornadas', 32) , ('santa', 31) , ('han', 31) , ('información', 29) , ('administraciones', 28) , ('ponça', 27) , ('fins', 27) , ('galatzó', 26) , ('gracias', 25)]

List2:

[('peñíscola', 422) , ('mar', 74) , ('ciudad', 51) , ('feliz', 47) , ('avui', 34) , ('noticia', 33) , ('completa', 33) , ('semana', 29) , ('turismo', 27) , ('gracias', 22) , ('casco', 22) , ('antiguo', 22) , ('castillo', 21) , ('días', 20) , ('españa', 20) , ('imagen', 20)]
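Since both lists are Python literals of (word, frequency) pairs, one possible way to merge them is to sum the frequencies per word with collections.Counter. This is only a sketch: it assumes the FREQ_ files contain exactly the text that freq_sort writes (a list literal with line breaks), so that ast.literal_eval can parse it. The function names are hypothetical and the sample data is shortened.

```python
import ast
from collections import Counter


def load_freq_list(text):
    """Parse the text of a list literal such as "[('mar', 75), ('avui', 33)]"."""
    return ast.literal_eval(text.strip())


def merge_freq_lists(*lists):
    """Sum the frequencies of words that occur in several lists."""
    total = Counter()
    for pairs in lists:
        for word, freq in pairs:
            total[word] += freq
    return total.most_common()  # sorted by frequency, descending


# Shortened samples in the same layout that freq_sort writes to disk
list1 = load_freq_list("[('calvià', 1428)\n, ('mar', 75)\n, ('avui', 33)\n]")
list2 = load_freq_list("[('peñíscola', 422)\n, ('mar', 74)\n, ('avui', 34)\n]")
merged = merge_freq_lists(list1, list2)
print(merged)
```

Words that appear in both lists, such as 'mar', end up as a single entry with the summed frequency. A word that itself contains ')' would get a line break inserted by freq_sort and would not parse this way.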

CodePudding user response：

import pandas as pd
list1 =  [('calvià', 1428) , ('ajuntament', 602) , ('amb', 79) , ('mar', 75) , ('h', 59) , ('ha', 57) , ('es', 50) , ('més', 46) , ('comunicación', 40) , ('dia', 35) , ('avui', 33) , ('hem', 33) , ('son', 32) , ('programa', 32) , ('jornadas', 32) , ('santa', 31) , ('han', 31) , ('información', 29) , ('administraciones', 28) , ('ponça', 27) , ('fins', 27) , ('galatzó', 26) , ('gracias', 25)]

list2 = [('peñíscola', 422) , ('mar', 74) , ('ciudad', 51) , ('feliz', 47) , ('avui', 34) , ('noticia', 33) , ('completa', 33) , ('semana', 29) , ('turismo', 27) , ('gracias', 22) , ('casco', 22) , ('antiguo', 22) , ('castillo', 21) , ('días', 20) , ('españa', 20) , ('imagen', 20)]

df1 = pd.DataFrame(list1, columns=['word', 'frequency'])
df2 = pd.DataFrame(list2, columns=['word', 'frequency'])

# stack both tables and write them to a single file
result = pd.concat([df1, df2])
result.to_csv("example.csv", index=False)
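Note that pd.concat only stacks the two tables, so a word that appears in both lists (for example 'mar' or 'avui') ends up on two rows. If one row per word is wanted, a possible follow-up, assuming the frequencies should be summed, is to group by word. The sample data is shortened and the variable name merged is illustrative:

```python
import pandas as pd

df1 = pd.DataFrame([("calvià", 1428), ("mar", 75), ("avui", 33)],
                   columns=["word", "frequency"])
df2 = pd.DataFrame([("peñíscola", 422), ("mar", 74), ("avui", 34)],
                   columns=["word", "frequency"])

merged = (
    pd.concat([df1, df2], ignore_index=True)
    .groupby("word", as_index=False)["frequency"]
    .sum()  # one row per word, frequencies added together
    .sort_values("frequency", ascending=False, ignore_index=True)
)
print(merged)
```

The result can be written out with merged.to_csv("example.csv", index=False), as in the answer above.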

CodePudding user response：

First, if you want to sort your DataFrame by frequency, you can do the following. For reference, see the pandas documentation on groupby, value_counts and transform.

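# number of rows that share each value of 'Column Name'
df['count'] = df.groupby('Column Name')['Column Name'].transform('count')
# sort_values returns a new DataFrame, so assign the result
df = df.sort_values('count', ascending=False)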

If you want to merge the files, you can use the following script.

Keep all the files in the same directory and you can merge them directly. If the folder also contains files with other extensions, use a wildcard pattern such as "*.csv" after the directory name.

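import glob
import os
# build a search pattern inside the directory (no trailing backslash in the string)
files = os.path.join("file_directory", "newmergedFileName*.csv")
# expand the pattern into a list of matching file paths
files = glob.glob(files)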
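The snippet above only collects the file names. One way the remaining step might look, assuming every file has a header row and the same columns, is to read each file with pd.read_csv and concatenate the results. The function name merge_csv_files and the paths in the usage comment are hypothetical:

```python
import glob
import os

import pandas as pd


def merge_csv_files(directory, pattern="*.csv"):
    """Read every CSV in `directory` matching `pattern` and stack them into one DataFrame."""
    files = sorted(glob.glob(os.path.join(directory, pattern)))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r} in {directory!r}")
    frames = [pd.read_csv(f) for f in files]
    return pd.concat(frames, ignore_index=True)


# Example usage (paths are placeholders):
# merged = merge_csv_files("file_directory")
# merged.to_csv("merged.csv", index=False)
```

If the merged file is saved into the same directory, a later run would match it as well, so it is probably safer to write it somewhere else.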