How to convert comma-separated numbers in a dataframe column to numbers and get the average value

I'm working on a dataset where a column contains numbers separated by commas. I want to convert those values to numbers, compute their mean, and replace the current anomalous value with that mean.

For example, for 50,45,30,20 I want to compute the mean (36.25) and replace the current value with it.

Answer:

You can define a function that unpacks those values and then returns their mean.

def get_mean(x):
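    # Split the cell text into a list of strings, e.g. "50,45" -> ["50", "45"]
    parts = str(x).split(',')
    # Convert each piece to a number (strip() tolerates spaces after commas)
    numbers = [float(n.strip()) for n in parts]
    return sum(numbers) / len(numbers)

# Apply to the desired column, replacing each cell with its mean
df['col'] = df['col'].apply(get_mean)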
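To see the function end to end, here is a minimal self-contained sketch. The DataFrame and its column name `col` are made up for illustration, and the function is repeated so the snippet runs on its own.

```python
import pandas as pd

def get_mean(x):
    # Split "50,45,30,20" into ["50", "45", "30", "20"], convert each piece, average them
    numbers = [float(n) for n in str(x).split(',')]
    return sum(numbers) / len(numbers)

# Hypothetical sample data: one cell per row, numbers stored as comma-separated text
df = pd.DataFrame({'col': ['50,45,30,20', '10,20', '7']})

# Replace each cell with the mean of its numbers
df['col'] = df['col'].apply(get_mean)
print(df['col'].tolist())  # [36.25, 15.0, 7.0]
```

A cell holding a single number (like `'7'`) simply averages to itself, so mixed rows need no special casing.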

Answer:

from numpy import mean
# `data` is the column (a pandas Series) of strings such as "50,45,30,20"
data = data.apply(lambda x: mean(list(map(lambda y: int(y.strip()), x.split(",")))))
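pandas string methods can also do this without a hand-written lambda. This is a swapped-in technique (`Series.str.split`, then `explode`, then a `groupby` on the original index), not what this answer uses, and the sample Series `data` below is made up:

```python
import pandas as pd

# Hypothetical column of comma-separated numbers
data = pd.Series(['50,45,30,20', '10,20', '7'])

# Split each cell into a list, then give every number its own row;
# explode() repeats the original index label for each piece
pieces = data.str.split(',').explode().astype(float)

# Average the pieces that share an index label (i.e. came from the same cell)
means = pieces.groupby(level=0).mean()
print(means.tolist())  # [36.25, 15.0, 7.0]
```

The result keeps the original index, so it can be assigned straight back to the column.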

Answer:

You can apply a custom function like GabrielBoehme suggests, but if you control the data import, handling the issue at import time may be a bit cleaner.

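import pandas as pd
from statistics import mean

# Convert each "50,45,30,20" cell in 'col' to its mean while the file is parsed
data = pd.read_csv('foobar.csv', converters={'col': lambda s: mean(float(n) for n in s.split(','))})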

You will need to make sure the multi-value fields are quoted (e.g. "50,45,30,20") so that the CSV parser keeps each one in a single field.
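A self-contained sketch of the import-stage approach, reading from an in-memory string instead of a file. It assumes a hypothetical two-column layout (`id`, `col`) and uses the `converters` argument of `read_csv`, which passes each raw field string to your function:

```python
import io
from statistics import mean

import pandas as pd

# Hypothetical CSV contents; the multi-value field is quoted so the parser
# keeps "50,45,30,20" together in a single 'col' field
csv_text = 'id,col\n1,"50,45,30,20"\n2,"10,20"\n3,7\n'

# read_csv hands each raw 'col' string to the converter, which returns its mean
data = pd.read_csv(
    io.StringIO(csv_text),
    converters={'col': lambda s: mean(float(n) for n in s.split(','))},
)
print(data['col'].tolist())  # [36.25, 15.0, 7.0]
```

Without the quotes, the commas inside the value would be treated as column separators and the row would not line up with the header.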

Answer:

Mine is a longer explanation, and the other answers here are probably better, but this might be easier to follow if you are newer to Python.

cell_num = "1,2,3,4,5,6,7"

# Split the string on commas into a list of strings
cell_numbers = cell_num.split(",")

# Loop over the list and add up the values
sum_num = 0
for num in cell_numbers:
    sum_num += int(num)

# Divide the total by the count to get the mean
mean = sum_num / len(cell_numbers)

# Print the final number
print(mean)

If your values have decimals, swap int for float.
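Swapping `int` for `float` and wrapping the same loop in a function lets you apply it to a whole column. A minimal sketch; the function name `cell_mean` and the sample column are hypothetical:

```python
import pandas as pd

def cell_mean(cell_num):
    # Split the cell text on commas into a list of strings
    cell_numbers = cell_num.split(",")
    # Add up the values one by one (float keeps any decimals)
    sum_num = 0.0
    for num in cell_numbers:
        sum_num += float(num)
    # Divide the total by how many numbers there were
    return sum_num / len(cell_numbers)

# Hypothetical column that includes decimal values
df = pd.DataFrame({'col': ['1.5,2.5,3.5', '50,45,30,20']})
df['col'] = df['col'].apply(cell_mean)
print(df['col'].tolist())  # [2.5, 36.25]
```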