I use Python 3.8 with pandas. I have some data in a CSV file. I load it into a pandas DataFrame and try to delete part of the Client_Name strings. Some Client_Name values look like myserv(4234), so I have to remove (4234) to get myserv. In short, I have to strip the parenthesized part, such as (4234), from the Client_Name values.
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
I wrote the code above, and it removes the parentheses from the Client_Name values. My problem is that if (4234) appears in the first row of the DataFrame, it raises an error. If (4234) appears only in other rows, there is no problem.
The working data is:
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3(4234),6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5(4234),3
5,2018-10-14T21:02:29Z,myserv6(4234),3
When I run my code, it deletes the (4234) parts and the data turns into the format below:
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1,5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3
But if (4234) is in the first row, as below, my code throws an error:
,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1(4234),5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
3,2018-10-14T21:01:00Z,myserv4,6
4,2018-10-14T21:02:07Z,myserv5,3
5,2018-10-14T21:02:29Z,myserv6,3
The error is:
test1.py:97: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 972, in __setitem__
self._set_with_engine(key, value)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 1005, in _set_with_engine
loc = self.index._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 75, in pandas._libs.index.IndexEngine.get_loc
TypeError: '0 True
1 False
2 False
3 False
4 False
...
116 False
117 False
118 False
119 False
120 False
Name: Client_Name, Length: 121, dtype: bool' is an invalid key
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test1.py", line 97, in <module>
df.Client_Name[df.Client_Name.str.contains('\(')] = df.Client_Name.str.split("(")[0]
File "/usr/local/lib/python3.8/dist-packages/pandas/core/series.py", line 992, in __setitem__
self._where(~key, value, inplace=True)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/generic.py", line 9129, in _where
new_data = self._mgr.putmask(
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 579, in putmask
return self.apply(
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/managers.py", line 427, in apply
applied = getattr(b, f)(**kwargs)
File "/usr/local/lib/python3.8/dist-packages/pandas/core/internals/blocks.py", line 1144, in putmask
raise ValueError("cannot assign mismatch length to masked array")
ValueError: cannot assign mismatch length to masked array
CodePudding user response:
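Your chained indexing (df.Client_Name[mask] = ...) may operate on a copy rather than on the original DataFrame, which is what triggers the SettingWithCopyWarning. Note also that df.Client_Name.str.split("(")[0] selects the row with index label 0 (a single list), not the first piece of every row, so the result depends on what the first row contains.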
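To illustrate why the first row matters, here is a minimal sketch on a made-up three-row Series (the sample values are an assumption, not your real data). It compares indexing the split result with [0], which is a label lookup, against .str[0], which works element-wise:

```python
import pandas as pd

# Hypothetical sample values; the first row contains the parenthesized part.
names = pd.Series(["myserv1(4234)", "myserv2", "myserv3(4234)"], name="Client_Name")

parts = names.str.split("(")   # Series of lists, one list per row

first_row = parts[0]           # label lookup: the list belonging to row 0 only
first_piece = parts.str[0]     # element-wise: first piece of every row

print(first_row)               # ['myserv1', '4234)']
print(first_piece.tolist())    # ['myserv1', 'myserv2', 'myserv3']
```

When row 0 contains "(", the value on the right-hand side of your assignment is a two-element list, which most likely explains the "mismatch length" error whenever the number of masked rows is not also two.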
You could use instead:
df['Client_Name'] = df['Client_Name'].str.replace(r'\(.*?\)', '', regex=True)
output:
                   time Client_Name  Minutes
0  2018-10-14T21:01:00Z      myserv1        5
1  2018-10-14T21:01:00Z      myserv2        5
2  2018-10-14T21:01:00Z      myserv3        6
3  2018-10-14T21:01:00Z      myserv4        6
4  2018-10-14T21:02:07Z      myserv5        3
5  2018-10-14T21:02:29Z      myserv6        3
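For completeness, a self-contained sketch that can be run as-is. It assumes the CSV layout from the question (shortened to three rows and read from an in-memory string instead of your file), and it also shows a split-based variant that stays close to your original idea; the variable names are only illustrative:

```python
import io

import pandas as pd

# Assumed sample: the failing case from the question, shortened to three rows.
csv_text = """,time,Client_Name,Minutes
0,2018-10-14T21:01:00Z,myserv1(4234),5
1,2018-10-14T21:01:00Z,myserv2,5
2,2018-10-14T21:01:00Z,myserv3,6
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Variant 1: remove every "(...)" group with a regular expression.
cleaned_regex = df["Client_Name"].str.replace(r"\(.*?\)", "", regex=True)

# Variant 2: keep only the text before the first "(" in every row.
cleaned_split = df["Client_Name"].str.split("(").str[0]

# Assign the whole column back in one step; no boolean mask or chained indexing.
df["Client_Name"] = cleaned_regex

print(df)
```

Both variants give the same result on this data. They would differ if a name had text after the closing parenthesis, since the split variant drops everything from the first "(" onward.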