How can I return the most frequently occurring value in the given CSV list?

Time:12-07

I asked this question before, but the answer was not provided as a function. I tried to put it into a function, but it didn't work, so I'm asking again :)

Here is a sample CSV file that I have to analyze:

1,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,furniture,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,vehicle,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,vehicle,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f

I'm trying to write a function, def popvote(list), that returns the most popular value in the fourth column across all rows of the CSV, which in the example above is vehicle.

Explanation down below

This is what I have so far

def popvote(list):
    for x in list:
        g = list(x)
        if x = max(g[x]):
           return x

However, this doesn't work. What should I change to make it work?

Note: The answer should be returned as a set

Explanation: I'm trying to return the value that is repeated most often in the list, based on the field marked with (** xxxx **) below:

1,8dac2b,ewmzr,**jewelry**,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3
2,668d39,aeqok,**furniture**,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c
3,622r49,arqek,**vehicle**,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f
4,6444t43,rrdwk,**vehicle**,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f

So in this case, vehicle should be the output.

CodePudding user response:

As pointed out in the comments, you can call mode() on the column and cast the result to a set.

import pandas as pd

df = pd.read_csv("filename.csv", header=None)
set(df[3].mode())

Out: {'vehicle'}

CodePudding user response:

Plain Python approach, using collections.Counter:

import csv
from collections import Counter

def read_categories():
    with open("tmp.csv", "r") as f:
        reader = csv.reader(f)
        for row in reader:
            yield row[3]

counter = Counter(read_categories())
counter.most_common(n=1)
# [('vehicle', 2)]
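
Since the question asks for a function that returns a set, the Counter approach above could be wrapped roughly as follows. This is only a sketch: it assumes the rows are already parsed into lists of strings (for example by csv.reader). The name popvote comes from the question, while the column parameter and the decision to return every tied value are my own assumptions.

```python
from collections import Counter

def popvote(rows, column=3):
    """Return the set of most frequent values found in `column` of each row."""
    counts = Counter(row[column] for row in rows)
    if not counts:
        # No rows at all: nothing can be the most common value.
        return set()
    top = max(counts.values())
    # Keep every value that reaches the top count, so ties are all returned.
    return {value for value, count in counts.items() if count == top}

# The sample rows from the question, split on commas.
rows = [
    "1,8dac2b,ewmzr,jewelry,phone0,9759243157894736,us,69.166.231.58,vasstdc27m7nks3".split(","),
    "2,668d39,aeqok,furniture,phone1,9759243157894736,jp,50.201.125.84,jmqlhflrzwuay9c".split(","),
    "3,622r49,arqek,vehicle,phone2,9759544365415694736,az,53.001.135.54,weqlhrerreuert6f".split(","),
    "4,6444t43,rrdwk,vehicle,phone9,9759543263245434353,au,54.241.234.64,weqqyqtqwrtert6f".split(","),
]
print(popvote(rows))
# {'vehicle'}
```

Returning a set makes ties explicit: if two categories share the top count, both appear in the result.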

Plain Python without Counter:

import csv

value_to_count = {}
with open("tmp.csv", "r") as f:
    reader = csv.reader(f)
    for row in reader:
        category = row[3]
        if category in value_to_count:
            value_to_count[category] += 1
        else:
            value_to_count[category] = 1

# sorted list of counts and values
count_to_value = sorted((v, k) for k, v in value_to_count.items())

if count_to_value:
    print("most common", count_to_value[-1])
    # most common (2, 'vehicle')
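
If only the single top entry is needed, sorting the whole dictionary is not strictly necessary. A possible alternative, sketched here with a hard-coded value_to_count that stands in for the counts built by the loop above, is to call max() with a key function:

```python
# Stand-in for the dictionary built while reading the CSV (assumed values).
value_to_count = {"jewelry": 1, "furniture": 1, "vehicle": 2}

if value_to_count:
    # max() scans the items once and compares them by their count.
    category, count = max(value_to_count.items(), key=lambda item: item[1])
    print("most common", (count, category))
    # most common (2, 'vehicle')
```

Note that the two variants break ties differently: max() returns the first item with the highest count in dictionary order, while sorting (count, value) tuples puts the alphabetically last value at the end.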

If you find convtools useful, then:

from convtools import conversion as c
from convtools.contrib.tables import Table

rows = Table.from_csv("tmp.csv", header=False).into_iter_rows(tuple)

# this is where code generation happens, it makes sense to store
# the converter in a separate variable for further reuse
converter = c.aggregate(c.ReduceFuncs.Mode(c.item(3))).gen_converter()
converter(rows)
# "vehicle"
