How to remove the last 2 characters of every element in a column of a pandas dataframe in Python?

Time:10-07

I have a large dataframe (df) and in the last column, all of the elements are showing up as

1055.0000.0

so the last 2 characters are always ".0". What's the most efficient way to remove them? The last column's name is always different, so I'm not sure how to approach this. I have tried looping over the DataFrame, but it takes too much memory and breaks the code. Is there a way to do something like

df[ last column ] = df[ last column - last 2 characters]

Or should I make a new DataFrame and then append it?

Answer:

Vectorized operations are almost always faster than looping over rows. The .str accessor lets pandas apply a string operation, such as slicing, to a whole column at once:

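col = df.columns[-1]        # label of the last column, whatever it is called
df[col] = df[col].str[:-2]  # drop the last two characters of every value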

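You can time it with the %%timeit magic command in a Jupyter notebook. Here, selecting the last column by position (df.iloc[:, -1]) is compared with selecting it by name. Note that the first snippet slices with .str[-2:], which keeps the last two characters rather than removing them; .str[:-2] is the slice that drops them.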

%%timeit
df.iloc[:, -1].str[-2:]
>>> 352 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
df["last_col"].str[:-2]
>>> 242 µs ± 4.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
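Putting this together, here is a minimal sketch, assuming the last column already holds strings such as "1055.0000.0". The helper name drop_last_two_chars and the sample column names are hypothetical; the column is picked with df.columns[-1], so its actual name never matters.

```python
import pandas as pd


def drop_last_two_chars(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the last two characters removed from every
    value in its last column, whatever that column is called."""
    out = df.copy()
    last = out.columns[-1]          # label of the last column
    out[last] = out[last].str[:-2]  # slice off the trailing two characters
    return out


# Hypothetical sample data; the real column name varies, so it is never hard-coded.
df = pd.DataFrame({"id": [1, 2], "value_2021": ["1055.0000.0", "98.5000.0"]})
result = drop_last_two_chars(df)
print(result["value_2021"].tolist())  # ['1055.0000', '98.5000']
```

Working on a copy leaves the original frame untouched; if memory is tight, assigning back into df directly avoids holding two frames at once.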

Answer:

You could also use rsplit:

s = '105.0000.0'
s.rsplit('.0', 1)[0]  # split once, from the right, on '.0' and keep the left part

output:

105.0000
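The same idea can be applied to a whole column through the pandas .str accessor, which provides rsplit and lets .str[0] pick the first piece of each split. A minimal sketch with hypothetical sample data:

```python
import pandas as pd

# Hypothetical sample data mirroring the question.
df = pd.DataFrame({"amount": ["105.0000.0", "1055.0000.0"]})
col = df.columns[-1]  # last column, whatever its name

# Split each string once, from the right, on ".0" and keep the left part.
df[col] = df[col].str.rsplit(".0", n=1).str[0]
print(df[col].tolist())  # ['105.0000', '1055.0000']
```

Unlike .str[:-2], this cuts at the last ".0" wherever it occurs, so a value such as "12.05" would become "12". It is only safe when every value really ends in ".0".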

Answer:

Try with the str accessor:

col = df.columns[-1]                     # last column, whatever its name
df[col] = df[col].astype(str).str[:-2]   # cast to str, then drop the last two characters
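The astype(str) step matters when the column is not already stored as strings: pandas raises an AttributeError if .str is used on a numeric column. A minimal sketch, assuming (hypothetically) that the last column was read as floats:

```python
import pandas as pd

# Hypothetical data: the last column was read as floats, not strings.
df = pd.DataFrame({"name": ["a", "b"], "count": [1055.0, 42.0]})
col = df.columns[-1]  # last column, whatever its name

# .str only works on string values, so cast first;
# 1055.0 becomes "1055.0", and [:-2] drops the ".0".
df[col] = df[col].astype(str).str[:-2]
print(df[col].tolist())  # ['1055', '42']
```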