I have a pandas DataFrame in Python like this:
import numpy as np
import pandas as pd
data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])
         subj  mean
0  a_subj.163     1
1  b_subj.164     2
2  c_subj.165     3
I need to take the mean where the subj starts with 'a_subj' and add it to a new variable called mean_a.
I've tried the following but get a TypeError: 'DataFrame' object is not callable:
df['mean_a'] = np.where(df(subj.startswith("a_subj")), mean, '')
I've also tried this. I don't get an error, but the new variable isn't created:
for subj in df:
    if subj.startswith('a_subj'):
        df['mean_a'] = mean
Any suggestions on where I'm going wrong?
CodePudding user response:
You say you want it in a "new variable", but your code seems to be trying to put the mean into a new column. If your goal is to get it into a variable, try:
mean_a = df['mean'][df.subj.str.startswith('a_subj')].mean()
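For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea, using the sample data from the question. The intermediate name `mask` is only an illustrative choice, and `df.loc` is used as an equivalent way to select the rows and the column in one step.

```python
import pandas as pd

data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])

# Boolean Series: True for rows whose 'subj' starts with 'a_subj'
mask = df['subj'].str.startswith('a_subj')

# Select the 'mean' values of the matching rows and average them
mean_a = df.loc[mask, 'mean'].mean()
print(mean_a)  # 1.0
```

Here `mean_a` is a single number, not a column, which is the "new variable" reading of the question.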
CodePudding user response:
I know there are better answers, but if you wanted to use the for loop, this is how you would do it:
df["mean_a"] = ""  # remove this line if you want nan in the rest of the values
for i, row in df.iterrows():
    if row.subj.startswith('a_subj'):
        df.at[i, 'mean_a'] = row["mean"]
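As a side note on why the loop in the question did nothing: iterating over a DataFrame directly yields its column labels, not its rows, so no label ever starts with 'a_subj'. The sketch below shows that, followed by a loop-free alternative using `Series.where`. The names `labels` and `mask` are illustrative only, and this variant leaves NaN (not an empty string) in the non-matching rows.

```python
import pandas as pd

data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])

# Iterating over a DataFrame yields column labels, not rows
labels = [label for label in df]
print(labels)  # ['subj', 'mean']

# Loop-free alternative: keep 'mean' where the mask is True, NaN elsewhere
mask = df['subj'].str.startswith('a_subj')
df['mean_a'] = df['mean'].where(mask)
print(df)
```

If the loop is only there for readability, the vectorized form is usually the more common choice on larger frames.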
CodePudding user response:
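Here you are calling the DataFrame instead of accessing it:
np.where(df(subj.startswith("a_subj")), mean, '')
To access a column, use square brackets with the column name as a string. Also, subj and mean are not standalone variables, so refer to them through df, and use the .str accessor to apply startswith to every row:
df['mean_a'] = np.where(df['subj'].str.startswith("a_subj"), df['mean'], '')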
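A runnable sketch of the `np.where` approach on the question's sample data follows. One assumption here differs from the question: the filler is `np.nan` instead of `''`, because mixing integers with an empty string is likely to leave the column non-numeric. Swap the filler back if an empty string is really what you want.

```python
import numpy as np
import pandas as pd

data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])

# Condition: one boolean per row
mask = df['subj'].str.startswith('a_subj')

# Take 'mean' where the condition holds, NaN otherwise (assumed filler)
df['mean_a'] = np.where(mask, df['mean'], np.nan)
print(df)
```

With a NaN filler the new column stays float, so later calls such as `df['mean_a'].mean()` skip the missing rows.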