I have the following dataframe:
ID | other |
---|---|
219218 | 34 |
823#32 | 47 |
unknown | 42 |
8#3#32 | 32 |
1#3#5# | 97 |
6#3### | 27 |
I want to obtain the following result:
ID | other |
---|---|
219218 | 34 |
823#32 | 47 |
unknown | 42 |
8#3#32 | 32 |
unknown | 97 |
unknown | 27 |
I am using the following code, which works:
for i in range(len(df)):
    ident = df.loc[i, 'ID']
    if ident.count('#') > 2:
        df.loc[i, 'ID'] = 'unknown'
Is there a way to make it more efficient, bearing in mind that I am going to apply the code to a dataframe of more than 60,000 rows?
Thank you for your help.
CodePudding user response:
For an efficient solution, use vectorized string methods and assign with loc:
df.loc[df['ID'].str.count('#').gt(2), 'ID'] = 'unknown'
output:
ID other
0 219218 34
1 823#32 47
2 unknown 42
3 8#3#32 32
4 unknown 97
5 unknown 27
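For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt from the question's table, and the variable name `too_many` is only illustrative; the replacement itself is the same one-liner, split in two so the boolean mask is visible.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "ID": ["219218", "823#32", "unknown", "8#3#32", "1#3#5#", "6#3###"],
    "other": [34, 47, 42, 32, 97, 27],
})

# Boolean mask: True where the ID contains more than two '#' characters.
# (`too_many` is an illustrative name, not part of the original answer.)
too_many = df["ID"].str.count("#").gt(2)

# Overwrite only the masked rows, in place.
df.loc[too_many, "ID"] = "unknown"

print(df)
```

Note that `str.count` treats its argument as a regular expression; `#` has no special meaning there, so it is counted literally.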
CodePudding user response:
Personally, I prefer using the apply function on the column:
def replaceRow(value):
    if value.count("#") > 2:
        return "unknown"
    else:
        return value
df["ID"] = df["ID"].apply(replaceRow)
df
Output
ID | other |
---|---|
219218 | 34 |
823#32 | 47 |
unknown | 42 |
8#3#32 | 32 |
unknown | 97 |
unknown | 27 |
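Since the question is about speed on 60,000+ rows, it may be worth measuring rather than guessing. The sketch below is only an assumption about how one might compare the two approaches from this thread: the helper names `with_str_count` and `with_apply` and the 60,000-row test frame are made up for illustration, and the timings will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: the question's six IDs repeated to 60,000 rows.
base = ["219218", "823#32", "unknown", "8#3#32", "1#3#5#", "6#3###"]
big = pd.DataFrame({"ID": base * 10_000})


def with_str_count(frame):
    """Boolean mask from .str.count, assigned with .loc."""
    out = frame.copy()
    out.loc[out["ID"].str.count("#").gt(2), "ID"] = "unknown"
    return out


def with_apply(frame):
    """Element-wise Python function via Series.apply."""
    out = frame.copy()
    out["ID"] = out["ID"].apply(lambda v: "unknown" if v.count("#") > 2 else v)
    return out


# Both approaches should produce the same column.
same = with_str_count(big)["ID"].equals(with_apply(big)["ID"])
print("identical results:", same)

# Time each approach over a few runs; the numbers depend on the environment.
for fn in (with_str_count, with_apply):
    seconds = timeit.timeit(lambda: fn(big), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```

Either way, both avoid the per-row `df.loc[i, ...]` lookups of the original loop, which is usually where most of the time goes.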