How to create a new variable in a dataframe using .startswith?

Time:11-02

I have a DataFrame in Python like this:

import pandas as pd

data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])

         subj  mean
0  a_subj.163     1
1  b_subj.164     2
2  c_subj.165     3

I need to take the mean where subj starts with 'a_subj' and add it to a new variable called mean_a.

I've tried the following but get a TypeError: 'DataFrame' object is not callable:

df['mean_a'] = np.where(df(subj.startswith("a_subj")), mean, '')

I've also tried this. I don't get an error, but the new variable isn't created:

for subj in df:
    if subj.startswith('a_subj'):
        df['mean_a'] = mean

Any suggestions on where I'm going wrong?

CodePudding user response:

You say you want it in a "new variable", but your code seems to be trying to put the mean into a new column. If your goal is to get it into a variable, try:

mean_a = df['mean'][df.subj.str.startswith('a_subj')].mean()
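The same idea can be split into two steps so the boolean mask is visible. This is only a sketch built on the sample data from the question; the name `mask` is an illustrative choice, not something from the original post.

```python
import pandas as pd

# Sample data copied from the question.
data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])

# Element-wise string test: one True/False per row.
mask = df['subj'].str.startswith('a_subj')

# Select the 'mean' values of the matching rows, then average them.
mean_a = df.loc[mask, 'mean'].mean()
```

With only one matching row, `mean_a` is simply that row's value; with several matching rows it would be their average.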

CodePudding user response:

I know there are better answers, but if you want to use a for loop, this is how you would do it:

df["mean_a"] = "" # remove this line if you want nan in the rest of the values
for i, row in df.iterrows():
    if row.subj.startswith('a_subj'):
        df.at[i, 'mean_a'] = row["mean"]
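For comparison, here is a loop-free sketch using `Series.where`, again on the sample data from the question. It also shows why `for subj in df:` in the question never matched anything: iterating over a DataFrame yields its column labels, not its rows. The name `column_labels` is only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])

# Iterating over a DataFrame walks the column labels ('subj', 'mean'),
# so none of them starts with 'a_subj'.
column_labels = [label for label in df]

# Keep 'mean' where the condition holds; other rows become NaN.
df['mean_a'] = df['mean'].where(df['subj'].str.startswith('a_subj'))
```

This leaves NaN in the non-matching rows rather than an empty string, which keeps the new column numeric.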

CodePudding user response:

Here you are calling the DataFrame instead of accessing it:

np.where(df(subj.startswith("a_subj")), mean, '')

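To access a column you need square brackets. In addition, subj and mean are columns rather than standalone variables, so they have to be reached through df, and the string test has to go through the .str accessor so that it is applied to every row:

df['mean_a'] = np.where(df['subj'].str.startswith("a_subj"), df['mean'], '')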
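A self-contained sketch of the `np.where` approach on the question's sample data follows. One deliberate change, which is an assumption about what is wanted: the fallback is `np.nan` instead of `''`, since mixing numbers and empty strings in one column tends to make later arithmetic awkward. The name `is_a` is illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
data = [['a_subj.163', 1], ['b_subj.164', 2], ['c_subj.165', 3]]
df = pd.DataFrame(data, columns=['subj', 'mean'])

# Boolean Series: True where subj starts with 'a_subj'.
is_a = df['subj'].str.startswith('a_subj')

# Take the value from 'mean' where the condition holds, NaN elsewhere.
df['mean_a'] = np.where(is_a, df['mean'], np.nan)
```

If an empty string really is wanted in the other rows, swap `np.nan` back to `''`, keeping in mind that the column will then no longer be purely numeric.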