I have a dataset containing MAC addresses that looks similar to this:
import numpy as np
import pandas as pd

dataset = {'Col1': ['10:50:1C:56:FF:C1', np.nan, '56:20:30:70:10:00'],
           'Col2': [np.nan, '50:60:40:10:00:00', np.nan]}
dataframe = pd.DataFrame(data=dataset)
# Showing dataframe
                Col1               Col2
0  10:50:1C:56:FF:C1                NaN
1                NaN  50:60:40:10:00:00
2  56:20:30:70:10:00                NaN
I want to slice each address, wherever one is present, down to its first eight characters (the first three octets), so the dataframe should look like this:
# Showing Sliced dataframe
       Col1      Col2
0  10:50:1C       NaN
1       NaN  50:60:40
2  56:20:30       NaN
I have written the custom function below, which does the job, but it loops over the values one at a time in Python, and I am looking for a method that takes less time and uses less memory.
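def sliceit(x):
    # Note: str() also turns a missing value (NaN) into the string 'nan'
    x = str(x)
    return x[:8]

def slice_macs(rowx):
    # Slice every value in the row, one element at a time
    for i, item in enumerate(rowx):
        rowx[i] = sliceit(item)
    return rowx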
I also received wonderful responses from this community on a similar question I asked about slicing a different kind of string. I tried to adapt the regular expression below so that it applies to this type of string, but I have had no luck.
IPs = splits.replace(r"(\d+\.\d+\.\d+)\.\d+", r"\1", regex=True)
So my question: is there a more Pythonic and faster way to accomplish the above without using so much memory?
CodePudding user response:
If I understand correctly, you can use the pd.Series.str accessor, which applies string slicing to the whole series in a vectorized way and leaves NaN values as NaN:
dataframe.Col1 = dataframe.Col1.str[:8]
dataframe.Col2 = dataframe.Col2.str[:8]
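If there are more than two columns, the same slice can be applied to every column at once instead of naming each one. The following is only a minimal sketch: the helper name slice_mac_prefix and its width parameter are made up for illustration, and it assumes every column holds strings or NaN (a column containing only NaN has a float dtype, so .str would raise an error there). The last line shows one possible regular-expression variant in the spirit of the attempt in the question; the pattern is my own assumption about the address format (two hex digits per octet, colon-separated).

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({
    "Col1": ["10:50:1C:56:FF:C1", np.nan, "56:20:30:70:10:00"],
    "Col2": [np.nan, "50:60:40:10:00:00", np.nan],
})

def slice_mac_prefix(df, width=8):
    # Hypothetical helper: apply the vectorized .str slice column by column.
    # NaN entries are left as NaN by the .str accessor.
    return df.apply(lambda col: col.str[:width])

sliced = slice_mac_prefix(dataframe)

# Regex variant (assumed pattern): keep the first three octets, drop the rest.
# Non-string cells such as NaN are not touched by a regex replace.
regex_sliced = dataframe.replace(
    r"^((?:[0-9A-Fa-f]{2}:){2}[0-9A-Fa-f]{2}).*$", r"\1", regex=True
)
```

Both calls return a new dataframe and leave the original unchanged, so assign the result back to dataframe if you want to overwrite it. The plain slice is usually the simpler choice; the regex variant is only useful if you also want to leave strings that do not look like MAC addresses alone.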