I'm working on a dataset where one column contains numbers separated by commas. I want to convert those values to integers, compute their mean, and replace the anomalous cell with that mean.
Example: for 50,45,30,20 I want to get the mean value and replace the current value with it.
CodePudding user response:
You can simply define a function that unpacks those values and returns their mean.
def get_mean(x):
    # Split into a list of strings
    splited = x.split(',')
    # Transform into numbers
    y = [float(n) for n in splited]
    return sum(y) / len(y)

# Apply on the desired column
df['col'] = df['col'].apply(get_mean)
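The function above assumes every cell is a clean comma-separated string. If the column also holds cells that are already numeric, have stray spaces, or are empty, a slightly more defensive variant might look like the sketch below. The name `mean_of_cell` and the choice to return NaN for an empty cell are my own assumptions, not something from the question.

```python
from statistics import mean


def mean_of_cell(value):
    """Return the mean of a comma-separated string such as "50,45,30,20"."""
    # Assumption: cells that are already numeric are passed through as floats.
    if isinstance(value, (int, float)):
        return float(value)
    # Split on commas, strip surrounding whitespace, and skip empty pieces.
    parts = [p.strip() for p in str(value).split(",")]
    numbers = [float(p) for p in parts if p]
    # Assumption: an empty cell has no mean, so NaN is used as a placeholder.
    return mean(numbers) if numbers else float("nan")


print(mean_of_cell("50,45,30,20"))  # 36.25
```

It can be passed to df['col'].apply(...) in the same way as get_mean.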
CodePudding user response:
from numpy import mean
data.apply(lambda x: mean(list(map(lambda y: int(y.strip()), x.split(",")))))
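If you would rather avoid a Python-level lambda per row, the same result can probably be reached with pandas string methods. This is only a sketch: the sample frame and the column names `col` and `col_mean` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "col" is an assumed column name.
df = pd.DataFrame({"col": ["50,45,30,20", "1, 2, 3", "10"]})

# Split each string into a list, give every number its own row
# (explode keeps the original row index), then convert to float.
exploded = df["col"].str.split(",").explode().str.strip().astype(float)

# Average the numbers that came from the same original row.
df["col_mean"] = exploded.groupby(level=0).mean()

print(df)
```

Grouping on the index level works here because explode repeats the original row label for every piece it produces.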
CodePudding user response:
You can apply a custom function like GabrielBoehme suggests, but if you are in control of the data import, handling the issue at the data import stage may be a bit cleaner.
import pandas as pd
data = pd.read_csv('foobar.csv', sep=',', thousands=',')
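Obviously you are going to need to make sure everything is quoted appropriately so that the CSV is parsed correctly. Also note that thousands=',' treats the commas as digit-group separators, so this only fits if that is what the commas in your column actually mean; it does not compute a mean.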
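To see what the thousands option does, here is a small self-contained sketch that reads from an in-memory string instead of foobar.csv. The column names and values are invented for the example.

```python
import io

import pandas as pd

# Hypothetical CSV content; the "amount" fields are quoted so that their
# commas are not mistaken for column separators.
csv_text = 'id,amount\n1,"1,234"\n2,"50,000"\n'

# thousands=',' tells the parser to drop the commas inside numeric fields.
data = pd.read_csv(io.StringIO(csv_text), sep=",", thousands=",")

print(data["amount"].tolist())  # [1234, 50000]
```

Each quoted field comes back as a single integer, which is why this approach suits thousands separators rather than lists of separate numbers.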
CodePudding user response:
Mine is a longer explanation and the others here are probably better... but this might be easier to understand if you are newer to python.
cell_num = "1,2,3,4,5,6,7"
# Split the numbers on "," and make a list of them
cell_numbers = cell_num.split(",")
# Run a loop to sum the values in the list
sum_num = 0
for num in cell_numbers:
    # Add each value to the running total (+= rather than =)
    sum_num += int(num)
# Get the mean
mean = sum_num / len(cell_numbers)
# Now print your final number
print(mean)
If you have decimals... be sure to swap int with float.