I'm trying to iterate over a Pandas DataFrame, using each row as a parameter to a function. I tried this:
import numpy
import pandas as pd

def vectorize_df(df, hg):
    print(hg + str(df['tweets_id']) + df['tokenized_text'])

df = pd.DataFrame.from_records(belongs_node, columns=['tweets_id','tokenized_text'])
vfunct = numpy.vectorize(vectorize_df)
vfunct(df, "#Python")
The problem is that when I do that, the df parameter receives values from 'tweets_id' instead of the whole row. Thanks a lot :)
Answer:
When you define a function to be vectorized:
- each column should be a separate parameter,
- you should call it passing the corresponding columns,
- "other" parameters (those not taken from the source array) should be marked as "excluded" parameters.
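To see why the original call misbehaves, here is a minimal sketch on a hypothetical two-row frame (the data and the helper name show are made up for illustration). np.vectorize converts each argument to a NumPy array and calls the function once per element after broadcasting, so a DataFrame is treated as a 2-D array of individual cells, not as a sequence of rows. otypes=[object] is passed only so the function is not called an extra time to guess the output type.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame with the same column names as in the question
df = pd.DataFrame({"tweets_id": [101, 102],
                   "tokenized_text": ["aaa bbb", "ccc ddd"]})

seen = []  # records what the wrapped function actually receives

def show(value, hg):
    seen.append(value)
    return f"{hg} {value}"

# The DataFrame becomes a 2-D array, so show() is called once per cell
out = np.vectorize(show, otypes=[object])(df, "#Python")
print(out.shape)  # (2, 2): four calls, one per cell, not one per row
print(seen)       # individual cell values, never a whole row
```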
Another detail is that a vectorized function should not just print something; it should return a value, namely the result of processing the parameters taken from the current source row.
So you could, for example, proceed as follows.
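A minimal sketch of why returning matters, using a hypothetical three-element array rather than the question's data: a function that only prints implicitly returns None, so the vectorized result is just an array of None values.

```python
import numpy as np

def only_prints(x):
    print(x)        # side effect only; the implicit return value is None

def returns_value(x):
    return x * 10   # the returned value becomes an element of the output

values = np.array([1, 2, 3])
printed = np.vectorize(only_prints, otypes=[object])(values)
returned = np.vectorize(returns_value)(values)
print(printed)   # [None None None]
print(returned)  # [10 20 30]
```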
Define your function as:
def myFunct(col1, col2, hg): return f'{hg} / {col1} / {col2}'
Don't use the word vectorize in the name of the function. For now it is an "ordinary" function. It will be vectorized in a moment.
Create the vectorized function:
vfunct = np.vectorize(myFunct, excluded=['hg'])
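A short sketch of how excluded is interpreted, with a hypothetical function tag whose tags argument is a list that should be used whole. String entries in excluded name keyword arguments, and integer entries name positional arguments by index; an argument passed positionally is not excluded by its name, so it gets broadcast like any other array.

```python
import numpy as np

def tag(x, tags):
    # tags is meant to be used whole, not element by element
    return f"{x}:{'+'.join(tags)}"

xs = np.array([1, 2])

by_name = np.vectorize(tag, excluded={"tags"})   # exclude by keyword name
by_position = np.vectorize(tag, excluded={1})    # exclude by positional index

print(by_name(xs, tags=["#a", "#b"]))   # the list reaches tag() unchanged
print(by_position(xs, ["#a", "#b"]))    # same result via the index

# Passing tags positionally to by_name does NOT exclude it: the list is
# broadcast against xs, so each call receives a single string
print(by_name(xs, ["#a", "#b"]))
```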
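And finally call it, passing hg as a keyword argument (string names in excluded only apply to arguments passed by keyword):
vfunct(df.tweets_id, df.tokenized_text, hg='#Python')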
The result, for my sample data, is:
array(['#Python / 101 / aaa bbb ccc ddd',
'#Python / 102 / eee fff ggg hhh iii jjj',
'#Python / 103 / kkk lll mmm nnn ooo ppp'], dtype='<U39')
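For reference, a self-contained version of the steps above. The DataFrame is an assumption reconstructed from the printed output, since the sample data itself is not shown.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the output above (assumption)
df = pd.DataFrame({
    "tweets_id": [101, 102, 103],
    "tokenized_text": ["aaa bbb ccc ddd",
                       "eee fff ggg hhh iii jjj",
                       "kkk lll mmm nnn ooo ppp"],
})

def myFunct(col1, col2, hg):
    # One parameter per column, plus the fixed hashtag
    return f'{hg} / {col1} / {col2}'

# 'hg' is excluded by name, so it is passed as a keyword argument
vfunct = np.vectorize(myFunct, excluded=['hg'])
result = vfunct(df.tweets_id, df.tokenized_text, hg='#Python')
print(result)
```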
What you do with this result is up to you. You may, for example, set it as a new column of your source DataFrame.
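A minimal sketch of that suggestion, on a hypothetical two-row frame; the column name tagged is made up for illustration. The vectorized result has one entry per row, so it can be assigned directly as a new column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"tweets_id": [101, 102],
                   "tokenized_text": ["aaa bbb", "ccc ddd"]})

def myFunct(col1, col2, hg):
    return f'{hg} / {col1} / {col2}'

vfunct = np.vectorize(myFunct, excluded=['hg'])

# One result per row, aligned with df, stored under a hypothetical name
df['tagged'] = vfunct(df.tweets_id, df.tokenized_text, hg='#Python')
print(df)
```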