How to iterate efficiently over a Pandas DataFrame with numpy.vectorize?

I'm trying to iterate over a Pandas DataFrame, passing each row as a parameter to a function. I tried this:

def vectorize_df(df, hg):
    print(hg + str(df['tweets_id']) + df['tokenized_text'])

df = pd.DataFrame.from_records(belongs_node, columns=['tweets_id','tokenized_text'])
vfunct = numpy.vectorize(vectorize_df)
vfunct(df, "#Python")

The problem is that when I do this, the df parameter takes the value from 'tweets_id' instead of the whole row. Thanks a lot :)

CodePudding user response:

When you define a function to be vectorized:

- each column should be a separate parameter,
- you should call it passing the corresponding columns,
- "other" parameters (not taken from the source array) should be marked as "excluded" parameters.

Another detail is that a vectorized function should not print anything, but it should return some value - the result of processing parameters from the current source row.
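To illustrate why the "excluded" rule matters, here is a minimal sketch in which the extra parameter is a list rather than a single string. The function name `count_hits`, the parameter `keywords`, and the sample texts are hypothetical and not taken from the question. Because `keywords` is listed in `excluded` and passed by keyword, the whole list is handed to every call unchanged instead of being broadcast against the text array.

```python
import numpy as np

def count_hits(text, keywords):
    # Ordinary scalar function: counts how many keywords occur in one text.
    return sum(word in text.split() for word in keywords)

# 'keywords' is excluded, so it is passed through as-is on every call.
v_count_hits = np.vectorize(count_hits, excluded=['keywords'])

texts = np.array(['aaa bbb ccc', 'bbb ddd', 'eee'])  # hypothetical sample data
hits = v_count_hits(texts, keywords=['bbb', 'ccc'])
print(hits)
```

Note that the function returns a value for each element rather than printing it, so the results come back as a single array.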

So you could, e.g., proceed as follows.

Define your function as:
```
def myFunct(col1, col2, hg):
    return f'{hg} / {col1} / {col2}'
```
Don't use the word vectorize in the name of the function. For now it is an "ordinary" function. It will be vectorized in a moment.

Create the vectorized function:

vfunct = np.vectorize(myFunct, excluded=['hg'])

And finally call it:

vfunct(df.tweets_id, df.tokenized_text, hg='#Python')

The result, for my sample data, is:

array(['#Python / 101 / aaa bbb ccc ddd',
       '#Python / 102 / eee fff ggg hhh iii jjj',
       '#Python / 103 / kkk lll mmm nnn ooo ppp'], dtype='<U39')

It is up to you what you do with this result. You may, e.g., set it as a new column of your source DataFrame.
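As a sketch of that last step, the snippet below stores the result in a new column. The sample rows are reconstructed from the output shown above, and the column name `tagged` is a hypothetical choice. Since the NumPy documentation describes `np.vectorize` as a convenience that is essentially a loop rather than a performance tool, the snippet also builds the same strings with plain pandas string concatenation for comparison.

```python
import numpy as np
import pandas as pd

def myFunct(col1, col2, hg):
    return f'{hg} / {col1} / {col2}'

# Sample rows reconstructed from the output shown in the answer.
df = pd.DataFrame({
    'tweets_id': [101, 102, 103],
    'tokenized_text': ['aaa bbb ccc ddd',
                       'eee fff ggg hhh iii jjj',
                       'kkk lll mmm nnn ooo ppp'],
})

vfunct = np.vectorize(myFunct, excluded=['hg'])

# Store the vectorized result as a new column ('tagged' is a hypothetical name).
df['tagged'] = vfunct(df['tweets_id'], df['tokenized_text'], hg='#Python')

# Same strings built with pandas string concatenation, without np.vectorize.
alt = '#Python / ' + df['tweets_id'].astype(str) + ' / ' + df['tokenized_text']
```

Both approaches should produce the same strings here. On a large DataFrame the concatenation variant is likely the faster one, but that is worth measuring on your own data.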