Is there a more efficient way to apply the custom function text slicer to the entire dataset?


I have a dataset that looks similar to this containing MAC Addresses:

import numpy as np
import pandas as pd

dataset = {'Col1': ['10:50:1C:56:FF:C1', np.nan, '56:20:30:70:10:00'],
           'Col2': [np.nan, '50:60:40:10:00:00', np.nan]}
dataframe = pd.DataFrame(data=dataset)

# Showing dataframe

    Col1               Col2
0   10:50:1C:56:FF:C1   NaN
1   NaN                50:60:40:10:00:00
2   56:20:30:70:10:00   NaN

I am looking to slice these addresses, wherever it finds them, down to the first eight characters only, so the dataframe should look like this:

# Showing Sliced dataframe

    Col1               Col2
0   10:50:1C           NaN
1   NaN                50:60:40
2   56:20:30           NaN

Now I have written the below custom function that does the job successfully; however, it loops over every value one at a time, and I am looking for a method that cuts time and uses less memory.

def sliceit(x):
    # Convert to string and keep only the first eight characters
    x = str(x)
    return x[:8]

def slice_macs(rowx):
    # Replace each value in the row with its sliced version
    for i, item in enumerate(rowx):
        rowx[i] = sliceit(item)
    return rowx
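One caveat worth noting about the function above: `str(np.nan)` produces the literal string `'nan'`, so missing values come out as the text `'nan'` rather than staying as real NaN values:

```python
import numpy as np

def sliceit(x):
    # Convert to string and keep only the first eight characters
    x = str(x)
    return x[:8]

print(sliceit('10:50:1C:56:FF:C1'))  # '10:50:1C'
print(sliceit(np.nan))               # 'nan' (a string, no longer a real NaN)
```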

I have also received wonderful responses from this community on a similar question I asked about slicing a different form of string; however, I tried looking into regular expressions and making alterations to the line below so it would apply to this type of string, but I have had no luck.

IPs = splits.replace(r"(\d+\.\d+\.\d+)\.\d+", r"\1", regex=True)
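For what it's worth, the same replace-with-capture-group idea can be adapted to MAC addresses by capturing the first three colon-separated octets (a sketch; `DataFrame.replace` with `regex=True` leaves NaN values untouched):

```python
import numpy as np
import pandas as pd

dataset = {'Col1': ['10:50:1C:56:FF:C1', np.nan, '56:20:30:70:10:00'],
           'Col2': [np.nan, '50:60:40:10:00:00', np.nan]}
dataframe = pd.DataFrame(data=dataset)

# Capture the first three hex pairs and drop the rest of the address
sliced = dataframe.replace(
    r"^([0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}).*$", r"\1", regex=True
)
print(sliced)
```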

So my Question, is there a way to accomplish the above in a much more pythonic and faster way without using so much memory?

CodePudding user response:

If I understand correctly, you can use the pd.Series.str accessor as a vectorized way to slice the strings in a series; it also leaves NaN values untouched.

dataframe.Col1 = dataframe.Col1.str[:8]
dataframe.Col2 = dataframe.Col2.str[:8]
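To apply the same slice to every column at once rather than naming each column, a sketch using `DataFrame.apply` over the columns:

```python
import numpy as np
import pandas as pd

dataset = {'Col1': ['10:50:1C:56:FF:C1', np.nan, '56:20:30:70:10:00'],
           'Col2': [np.nan, '50:60:40:10:00:00', np.nan]}
dataframe = pd.DataFrame(data=dataset)

# .str[:8] is vectorized and returns NaN for missing values
sliced = dataframe.apply(lambda col: col.str[:8])
print(sliced)
```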