I have a dataset that looks like this with IP addresses (for security's sake, these are all made up):
0 | 1 | 2 |
---|---|---|
100.0.200.0 | 160.60.30.0 | NaN |
NaN | 101.60.10.0 | 10.0.0.1 |
I want to apply a function that takes these IP addresses (where they exist) and removes the fourth octet, so the result looks like this:
0 | 1 | 2 |
---|---|---|
100.0.200 | 160.60.30 | NaN |
NaN | 101.60.10 | 10.0.0 |
I have written the code below, which does the job, but it is very slow because it loops over every cell in Python, and I want to be able to do this faster.
def sliceip(row):
    row = str(row)
    return row.rsplit(".", 1)[0]

def applysliceip(rowx):
    for i, item in enumerate(rowx):
        rowx[i] = sliceip(item)
    return rowx
# And I apply this to the entire dataframe as such:
split_IPs = IPs.apply(lambda row: applysliceip(row))
So my question is: is there a more Pythonic and faster way to accomplish the above and return the same output without using so much memory?
CodePudding user response:
You can use a regular expression to match and replace instead of using a custom function.
IPs.replace(r"(\d+\.\d+\.\d+)\.\d+", r"\1", regex=True)
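To make this concrete, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and applies the same call. The names `IPs` and `split_IPs` follow the question; the pattern assumes every non-missing cell is a dotted-quad IPv4 string.

```python
import numpy as np
import pandas as pd

# Sample data from the question (made-up addresses); NaN marks missing cells.
IPs = pd.DataFrame([
    ["100.0.200.0", "160.60.30.0", np.nan],
    [np.nan, "101.60.10.0", "10.0.0.1"],
])

# Capture the first three octets and drop the trailing ".<digits>".
# With regex=True only string cells are matched, so NaN cells stay NaN.
split_IPs = IPs.replace(r"(\d+\.\d+\.\d+)\.\d+", r"\1", regex=True)
print(split_IPs)
```

Because the replacement runs inside pandas rather than through a Python-level function per row, there is no need to convert each cell with `str()` first, which is also what turns NaN into the string "nan" in the question's version.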
CodePudding user response:
A possible solution, which uses pandas.DataFrame.applymap and a regex to replace the last . and the digits after it with an empty string:
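import re
# na_action='ignore' skips NaN cells, which re.sub cannot handle.
# On pandas >= 2.1, DataFrame.map is the non-deprecated name for applymap.
df.applymap(lambda x: re.sub(r'\.\d+$', '', x), na_action='ignore')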
Output:
           0          1       2
0  100.0.200  160.60.30     NaN
1        NaN  101.60.10  10.0.0
A faster solution, based on numpy:
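import re
import numpy as np
import pandas as pd
# fillna('') keeps re.sub from seeing float NaN; np.where restores the NaN cells
v = np.vectorize(lambda x: re.sub(r'\.\d+$', '', x))
pd.DataFrame(np.where(pd.notnull(df), v(df.fillna('')), df))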
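Note that building the result with a bare pd.DataFrame(...) resets the index and column labels to 0..n. If the labels matter, a variant of the same idea can be sketched as below. The helper name `strip_last_octet` is hypothetical, and it uses a boolean mask instead of np.vectorize, so treat it as an illustration rather than a benchmarked replacement.

```python
import re

import numpy as np
import pandas as pd


def strip_last_octet(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: drop the final '.<digits>' from each string cell."""
    values = frame.to_numpy(dtype=object)
    mask = pd.notna(values)  # True where a cell holds a value
    out = values.copy()
    # Run the substitution only on the non-missing cells.
    out[mask] = [re.sub(r"\.\d+$", "", s) for s in values[mask]]
    # Reuse the original labels instead of the default 0..n.
    return pd.DataFrame(out, index=frame.index, columns=frame.columns)


# Sample data from the question, with a custom index to show labels survive.
df = pd.DataFrame(
    [["100.0.200.0", "160.60.30.0", np.nan],
     [np.nan, "101.60.10.0", "10.0.0.1"]],
    index=["a", "b"],
)
result = strip_last_octet(df)
print(result)
```

Missing cells are never passed to re.sub here, so no placeholder value is needed for them.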