I have a large dataframe (df) and in the last column, all of the elements are showing up as
1055.0000.0
so the last 2 characters are always ".0", and I want to remove them. What's the most efficient way to do this? The last column's name is always different, so I'm not sure how to approach this. I have tried looping over the pandas df, but it takes too much memory and breaks the code. Is there a way to do something like
df[ last column ] = df[ last column - last 2 characters]
or make a new df then append it in?
CodePudding user response:
Vectorized operations are almost always faster than looping. The .str accessor lets pandas apply a string operation to the whole column at once:
df["last_col"].str[:-2]
You can time it with the %%timeit magic command in a Jupyter notebook.
%%timeit
df.iloc[:, -1].str[:-2]
>>> 352 µs ± 4.68 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
df["last_col"].str[:-2]
>>> 242 µs ± 4.76 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
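Since the last column's name changes, here is a minimal sketch of the same slice applied by position and written back to the frame. The sample data and the column name price_2021 are made up for illustration, and it assumes every value really does end in ".0".

```python
import pandas as pd

# Hypothetical sample data; the real last-column name is unknown.
df = pd.DataFrame({
    "id": [1, 2],
    "price_2021": ["1055.0000.0", "105.0000.0"],
})

# Look up the last column's label by position, whatever it is called.
last = df.columns[-1]

# Drop the final two characters of every value in one vectorized step.
df[last] = df[last].astype(str).str[:-2]

print(df[last].tolist())
```

Assigning through df[last] replaces the whole column in one go, so no Python-level loop over the rows is needed.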
CodePudding user response:
You could also use rsplit:
s = '105.0000.0'
s.rsplit('.0', 1)[0]
output:
105.0000
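One caveat worth checking: rsplit splits on the last occurrence of '.0' anywhere in the string, not only at the very end. A small sketch of the difference, using a hypothetical helper (strip_trailing_dot_zero is not from the answer above) that only touches a real suffix:

```python
def strip_trailing_dot_zero(s: str) -> str:
    """Remove one trailing '.0' if present; otherwise return s unchanged."""
    return s[:-2] if s.endswith(".0") else s

# Works as intended when the string ends in '.0'.
print("105.0000.0".rsplit(".0", 1)[0])

# Here '.0' appears inside '.05', so rsplit cuts the value short.
print("1055.05".rsplit(".0", 1)[0])

# The suffix check leaves such a value alone.
print(strip_trailing_dot_zero("1055.05"))
```

If every value is guaranteed to end in ".0", as in the question, both approaches give the same result.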
CodePudding user response:
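Try with the str accessor, selecting the last column by position:
df.iloc[:, -1] = df.iloc[:, -1].astype(str).str[:-2]
Note that the slice must be [:-2] (everything except the last 2 characters), and that the resulting strings such as 1055.0000 cannot be cast with astype(int) directly; use astype(float) if you need numbers.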
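If some values might not end in ".0", or you want numbers rather than strings afterwards, something like the following may be safer. This is only a sketch on made-up data: it strips the suffix with an anchored regex and then converts with pd.to_numeric, which assumes the remaining text is a valid number.

```python
import pandas as pd

# Hypothetical sample data; "42.5" has no trailing ".0" to remove.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "value": ["1055.0000.0", "105.0000.0", "42.5"],
})

last = df.columns[-1]

# Remove ".0" only when it is the very end of the string.
cleaned = df[last].astype(str).str.replace(r"\.0$", "", regex=True)

# Convert the cleaned strings to floats; raises if a value is not numeric.
df[last] = pd.to_numeric(cleaned)

print(df[last].tolist())
```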