I cannot trim a string in a dataframe, if the string is in the first record of dataframe

Time:12-25

I use Python 3.8 with pandas. I have some data in a CSV file. I load the data into a pandas DataFrame and try to delete part of the Client_Name strings. Some Client_Name values look like Client_Name = myserv(4234), so I have to strip (4234) to get Client_Name = myserv. In short, I have to remove the parenthesized part (4234) from the Client_Name values.

df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]

I wrote the code above, and it removes the parenthesized part from the Client_Name values. My problem is that if (4234) is in the first row of the dataframe, it raises an error. If (4234) is in other rows, there is no problem.

The working data is:

,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3(4234),6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5(4234),3
5,2018-10-14T21:02:29Z,myserv6(4234),3

When I run my code, it deletes the (4234) parts and the data turns into the format below:

,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3

But if (4234) is in the first row, as below, my code throws an error:

,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1(4234),5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3

The error is:

test1.py:97: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 972, in __setitem__
    self._set_with_engine(key, value)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 1005, in _set_with_engine
    loc = self.index._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 75, in pandas._libs.index.IndexEngine.get_loc
TypeError: '0       True
1      False
2      False
3      False
4      False
       ...
116    False
117    False
118    False
119    False
120    False
Name: Client_Name, Length: 121, dtype: bool' is an invalid key

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test1.py", line 97, in <module>
    df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 992, in __setitem__
    self._where(~key, value, inplace=True)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 9129, in _where
    new_data = self._mgr.putmask(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 579, in putmask
    return self.apply(
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 427, in apply
    applied = getattr(b, f)(**kwargs)
  File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py", line 1144, in putmask
    raise ValueError("cannot assign mismatch length to masked array")
ValueError: cannot assign mismatch length to masked array

CodePudding user response:

Your slicing method generates a copy, and you then modify that copy; this is what triggers the warning.
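Separately from the warning, the ValueError most likely comes from the right-hand side of the assignment: `df.Client_Name.str.split("(")[0]` looks up the row labelled 0 in the Series of lists, not the first piece of every split. Here is a minimal sketch of the difference, using a made-up three-row Series (the sample values are an assumption modelled on the failing data):

```python
import pandas as pd

# Hypothetical sample; only row 0 carries the "(4234)" suffix,
# as in the failing data from the question.
names = pd.Series(["myserv1(4234)", "myserv2", "myserv3"], name="Client_Name")

parts = names.str.split("(")   # Series of lists, one list per row

row_zero = parts[0]            # label lookup: the list belonging to row 0 only
first_pieces = parts.str[0]    # element-wise: first piece of every row

print(row_zero)
print(first_pieces.tolist())
```

In this sketch `row_zero` is a two-element list, while the boolean mask selects a single row, which would be consistent with the "cannot assign mismatch length to masked array" message. `parts.str[0]` returns one value per row instead.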

You could use instead:

df['Client_Name'] = df['Client_Name'].str.replace(r'\(.*?\)', '', regex=True)

output:

                  time Client_Name  Minutes
0 2018-10-14T21:01:00Z     myserv1        5
1 2018-10-14T21:01:00Z     myserv2        5
2 2018-10-14T21:01:00Z     myserv3        6
3 2018-10-14T21:01:00Z     myserv4        6
4 2018-10-14T21:02:07Z     myserv5        3
5 2018-10-14T21:02:29Z     myserv6        3
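To try the suggestion end to end, here is a small self-contained sketch. It assumes the CSV layout shown in the question (first column used as the index) and loads a shortened copy of the failing data from an in-memory string rather than a file. The `split`-based variant is included only for comparison and is not part of the original answer.

```python
import io

import pandas as pd

# Shortened copy of the failing data from the question (suffix in row 0).
csv_text = """,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1(4234),5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3(4234),6
"""

df = pd.read_csv(io.StringIO(csv_text), index_col=0)
original = df["Client_Name"]

# Regex approach from the answer: drop every "(...)" group.
cleaned_regex = original.str.replace(r"\(.*?\)", "", regex=True)

# Alternative for comparison: keep the text before the first "(" in each row.
cleaned_split = original.str.split("(").str[0]

# Assign the whole column at once, so no boolean mask or chained indexing is needed.
df["Client_Name"] = cleaned_regex

print(df)
```

Both variants operate on the whole column, so the result does not depend on which row contains the parentheses.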
