pandas replace all nan values of a column with regex on another column

Time:10-27

I have the dataset below, which has some NaN values in column "a". I need to replace only those NaN values with the number of hashtags in the same row of column "b", counted with a regex. I need to do it in place because the dataset is very large.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [0, np.nan, np.nan], 'b': ["#hello world", "#hello #world", "hello #world"]})

print(df)

The result should be:

df = pd.DataFrame({'a': [0, 2, 1], 'b': ["#hello world", "#hello #world", "hello #world"]})        
print(df)

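I already have the regex method:

import re

# '+' matches one or more word characters after the '#'
regex_hashtag = r"#[a-zA-Z0-9_]+"
# text is a single string, e.g. one value from column "b"
num_hashtags = len(re.findall(regex_hashtag, text))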

How can I do it?

CodePudding user response:

Use str.count:

regex_hashtag = r"#[a-zA-Z0-9_]+"  # or r'#\w+'

m = df['a'].isna()

df.loc[m, 'a'] = df.loc[m, 'b'].str.count(regex_hashtag)

output:

   a              b
0  0   #hello world
1  2  #hello #world
2  1   hello #world
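
For reference, here is a minimal, self-contained sketch of the steps above using the sample data from the question. One detail is an addition, not part of the answer: column "a" is float64 because it held NaN, so the sketch casts it to int at the end to match the desired result. That cast assumes no NaN remains in "a", meaning every masked row has a string in "b".

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, np.nan, np.nan],
    "b": ["#hello world", "#hello #world", "hello #world"],
})

regex_hashtag = r"#[a-zA-Z0-9_]+"

# Boolean mask of the rows where "a" is missing
m = df["a"].isna()

# Count hashtags only on those rows and assign into the existing column
df.loc[m, "a"] = df.loc[m, "b"].str.count(regex_hashtag)

# Assumption: no NaN is left in "a", so the float column can be cast to int
df["a"] = df["a"].astype(int)

print(df)
```

Because the mask restricts both sides of the assignment, the regex is only evaluated on the rows that need filling, which keeps the work proportional to the number of missing values.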