df.replace with regex throws error when debugging

Time:08-19

I have a dataframe of shape (1000, 3). I am interested in one of the columns: it contains strings that include "\n", which I want to remove. For example:

df

A        B          C
1        d\n        aa
2        \nc        gg
3        m\nm       hh 

I want the outcome to be

df

A        B          C
1        d         aa
2        c         gg
3        mm        hh 

I have tried the following

df['B'] = df['B'].replace('\n', '')
and 
df['B'] = df['B'].str.replace(r'\n', '', regex=False) 

I put a breakpoint after it to inspect the outcome, but nothing changes with either method. Then I tried

df['B'] = df['B'].replace('\n', '', regex=True)

However, when I stop at the breakpoint I get the following error

  File "path\pandas\core\internals\managers.py", line 304, in apply
    applied = getattr(b, f)(**kwargs)
  File "path\pandas\core\internals\blocks.py", line 761, in _replace_regex
    replace_regex(new_values, rx, value, mask)
  File "path\pandas\core\array_algos\replace.py", line 153, in replace_regex
    f = np.vectorize(re_replacer, otypes=[np.object_])
  File "path\numpy\lib\function_base.py", line 2261, in __init__
    otypes = ''.join([_nx.dtype(x).char for x in otypes])
  File "path\numpy\lib\function_base.py", line 2261, in <listcomp>
    otypes = ''.join([_nx.dtype(x).char for x in otypes])
TypeError: 'NoneType' object is not callable
(I replaced the path with the word "path")

The code still runs, though. Of course I can run the code, save the outcome to a CSV file, and check from there, but I do not understand the problem.

CodePudding user response:

I suspect you don't have newlines, but rather literal \n.

You should try:

df['B'] = df['B'].str.replace(r'\n', '', regex=False)
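
To check which case applies, it may help to compare a real newline with a literal backslash followed by "n" side by side. The sketch below uses a made-up two-row Series (not the asker's data) and shows that, with regex=False, the raw string r'\n' only removes the two-character literal, while the plain string '\n' only removes the actual newline.

```python
import pandas as pd

# Hypothetical sample: row 0 ends with a real newline,
# row 1 ends with a backslash followed by the letter "n".
s = pd.Series(['d\n', 'd\\n'])

# The string lengths tell the two cases apart:
# a newline is one character, a literal \n is two.
lengths = s.str.len().tolist()

# Raw string + regex=False: removes only the literal backslash-n.
literal_removed = s.str.replace(r'\n', '', regex=False).tolist()

# Plain string + regex=False: removes only the real newline.
newline_removed = s.str.replace('\n', '', regex=False).tolist()

print(lengths)
print(literal_removed)
print(newline_removed)
```

If the first variant leaves a column unchanged, the cells most likely hold real newlines, and the second variant is the one to try.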

CodePudding user response:

Just a quick update: based on your feedback, I found the solution. It was as simple as

    df['B'] = df['B'].str.replace('\n', ' ', regex=False)

Thank you very much! Just posted it as an answer in case someone faces the same issue in the future!

CodePudding user response:

In [20]: import pandas as pd

In [21]: d = {'A': [1, 2, 3], 'B': ['d\n', '\nc', 'm\nm']}

In [22]: df = pd.DataFrame(data=d)

In [23]: df
Out[23]:
   A     B
0  1   d\n
1  2   \nc
2  3  m\nm

In [24]: df['B'].replace('\n', '', regex=True)
Out[24]:
0     d
1     c
2    mm
Name: B, dtype: object
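
For comparison, here is a small sketch of the three calls from the question, run on the same toy frame as above and assuming the cells contain real newlines. As far as I can tell, Series.replace without regex=True only matches a cell whose entire value equals the target, which would explain why the first attempt appeared to do nothing.

```python
import pandas as pd

# Same toy data as in the session above (real newlines inside the strings).
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['d\n', '\nc', 'm\nm']})

# Without regex=True, Series.replace compares whole cell values.
# No cell is exactly '\n', so nothing is replaced.
whole_value = df['B'].replace('\n', '').tolist()

# With regex=True, the pattern is matched inside each string.
with_regex = df['B'].replace('\n', '', regex=True).tolist()

# The .str accessor always works on substrings; regex=False treats
# the pattern as plain text.
with_str = df['B'].str.replace('\n', '', regex=False).tolist()

print(whole_value)
print(with_regex)
print(with_str)
```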
