Creating a New Column in a Pandas Dataframe in a more pythonic way

I am trying to find a better, more pythonic way of accomplishing the following:

I want to add a new column to business_df called 'dot_prod', which is the dot product of a fixed vector (fixed_vector) and the matching row of another data frame (rating_df). The rows of business_df and rating_df share the same index values (business_id).

I have this loop, which appears to work, but I know it's super clumsy (and takes forever). Essentially, it runs once per row, calculates the dot product, then writes it into the business_df dataframe.

n = 0
for i in range(business_df.shape[0]):
    dot_prod = np.dot(fixed_vector, rating_df.iloc[n])
    business_df['dot_prod'][n] = dot_prod
    n += 1

CodePudding user response：

>>> import numpy as np
>>> import pandas as pd
>>> fixed_vector = [1, 2, 3]
>>> df = pd.DataFrame({'col1' : [1,2], 'col2' : [3,4], 'col3' : [5,6]})
>>> df
   col1  col2  col3
0     1     3     5
1     2     4     6
>>> df['col4'] = np.dot(fixed_vector, [df['col1'], df['col2'], df['col3']])
>>> df
   col1  col2  col3  col4
0     1     3     5    22
1     2     4     6    28

Hope it helps you.
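
The same result can probably be had without listing every column by hand. A minimal sketch, assuming the columns to combine are exactly col1 to col3 in the same order as fixed_vector; the name `cols` is only for illustration:

```python
import numpy as np
import pandas as pd

fixed_vector = np.asarray([1, 2, 3])
df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]})

# Illustrative name: the columns that make up each row vector,
# in the same order as the entries of fixed_vector.
cols = ['col1', 'col2', 'col3']

# DataFrame.dot multiplies each row by the vector and sums the products,
# returning a Series that carries the original row index.
df['col4'] = df[cols].dot(fixed_vector)
print(df)
```

This should give the same col4 values as above (22 and 28), and it scales to any number of columns as long as `cols` and fixed_vector have the same length.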

CodePudding user response：

IIUC, you are looking for apply with axis=1, like this:

business_df['dot_prod'] = rating_df.apply(lambda x: np.dot(fixed_vector, x), axis=1)
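
Note that apply with axis=1 still calls np.dot once per row in Python, so it may stay slow on large frames. Below is a sketch of a fully vectorized alternative, assuming rating_df holds only numeric columns in the same order as fixed_vector. The sample data and business_id values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up sample data; the business_id values are hypothetical.
fixed_vector = np.array([1.0, 2.0, 3.0])
rating_df = pd.DataFrame(
    [[1, 3, 5], [2, 4, 6]],
    index=pd.Index(['b1', 'b2'], name='business_id'),
    columns=['r1', 'r2', 'r3'],
)
business_df = pd.DataFrame(
    {'name': ['Cafe', 'Diner']},
    # Rows are deliberately in a different order than in rating_df.
    index=pd.Index(['b2', 'b1'], name='business_id'),
)

# One matrix-vector product covers all rows at once; wrapping the result
# in a Series keeps rating_df's index attached to each value.
dot_prod = pd.Series(rating_df.to_numpy() @ fixed_vector, index=rating_df.index)

# Column assignment aligns on the index (business_id), not on row position.
business_df['dot_prod'] = dot_prod
print(business_df)
```

Because the assignment aligns on business_id, the two frames do not need to be sorted the same way, which the positional loop in the question silently relies on.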