How can I find the average age for each row in this example? Each column corresponds to one age, ranging from 0 to 90. I have read the CSV with pandas.
CodePudding user response:
Assuming the values are the number of people per age, here is one solution.
(Potentially not the fastest for large amounts of data, since it loops over every row in Python.)
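highest_age = 90

def row_age(x):
    # Weighted average: sum(age * people) / sum(people) for one row
    sum_age = 0
    sum_people = 0
    for i in range(0, highest_age + 1):
        people = x[f"age_{i}"]
        sum_people += people
        sum_age += people * i
    if sum_people > 0:
        return sum_age / sum_people
    return float("nan")  # no people in this row, so the average is undefined

df["average_age"] = df.apply(row_age, axis=1)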
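To make the arithmetic concrete, here is the same weighted average worked out for a single made-up row in plain Python. The counts below are hypothetical (only ages 0 to 2), just to check the formula by hand:

```python
# Hypothetical row: 2 people aged 0, 1 person aged 1, 1 person aged 2
counts = [2, 1, 1]

# Weight each age by the number of people of that age: 0*2 + 1*1 + 2*1 = 3
sum_age = sum(age * people for age, people in enumerate(counts))
sum_people = sum(counts)  # 4

average_age = sum_age / sum_people
print(average_age)  # 0.75
```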
CodePudding user response:
I assume all the age-related columns are named like Age_0, Age_1, Age_2, ...
import pandas as pd
import numpy as np

df = pd.read_csv("./data.csv")
counts = df.filter(regex=r"^Age_\d+$")  # keep only the Age_<n> columns
ages = np.array([int(c.split("_")[-1]) for c in counts.columns])  # age encoded in each column name
df.loc[:, "avg_age"] = (counts.values * ages).sum(axis=1) / counts.values.sum(axis=1)
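One thing to watch with the vectorized version is a row whose counts sum to zero, which would divide by zero. Below is a rough sketch of one way to handle that, using a small made-up DataFrame in place of data.csv (the `region` column and the values are hypothetical) and replacing zero totals with NaN before dividing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for data.csv (only ages 0 to 2)
df = pd.DataFrame({
    "region": ["a", "b", "c"],
    "Age_0": [2, 0, 0],
    "Age_1": [1, 0, 0],
    "Age_2": [1, 4, 0],
})

counts = df.filter(regex=r"^Age_\d+$")  # only the Age_<n> columns
ages = np.array([int(c.split("_")[-1]) for c in counts.columns])

# Rows with no people get NaN instead of triggering a division by zero
total = counts.sum(axis=1).replace(0, np.nan)
df["avg_age"] = (counts * ages).sum(axis=1) / total

print(df[["region", "avg_age"]])
```

With this toy data, row "a" comes out as 0.75, row "b" as 2.0, and the empty row "c" as NaN.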