I have a Pandas dataframe that looks like this:
Customer ID  Customer Name  Price_Val
5015         AXN            17.12
5015         AXN            2.08
5015         AXN            3.453
7315         BXN            2.22
7315         BXN            8.46567
3283         CXN            88
3283         CXN            0.4600
3283         CXN            6.46
3283         CXN
I want to create a column named dec_value that holds the number of decimal places of the corresponding Price_Val value.
For example, the dec_value column should look like this:
Customer ID  Customer Name  Price_Val  dec_value
5015         AXN            17.12      2
5015         AXN            2.08       2
5015         AXN            3.453      3
7315         BXN            2.22       2
7315         BXN            8.4656     4
3283         CXN            88         0
3283         CXN            0.4600     4
3283         CXN            6.46       2
3283         CXN                       0
I am using the code below to do this:
i = 0
for value in df1['Price_Val']:
    if value == '':
        df1.loc[i, "dec_value"] = 0
    else:
        colval = value
        # Index of the dot in the reversed string = number of digits after it
        k = str(colval)[::-1].find('.')
        if k == -1:
            df1.loc[i, "dec_value"] = 0
        else:
            df1.loc[i, "dec_value"] = k
    i = i + 1
What's the most efficient way to do this?
CodePudding user response:
Convert your column to string, split on the dot, rstrip the zeros, and count the characters:
df['dec_value'] = df['Price_Val'].fillna('').apply(lambda x: len(str(x).split('.')[-1].rstrip('0')))
or
df['dec_value'] = (df['Price_Val'].fillna('').astype(str)
.str.split('.').str[-1]
.str.rstrip('0').str.len()
)
output:
   Customer ID  Customer Name  Price_Val  dec_value
0         5015            AXN   17.12000          2
1         5015            AXN    2.08000          2
2         5015            AXN    3.45300          3
3         7315            BXN    2.22000          2
4         7315            BXN    8.46567          5
5         3283            CXN   88.00000          0
6         3283            CXN    0.46000          2
7         3283            CXN    6.46000          2
8         3283            CXN        NaN          0
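Note that the output above counts 0.4600 as 2 decimal places, because a float column does not remember trailing zeros. The following is a minimal pure-Python sketch of the same counting logic with an optional switch for keeping trailing zeros; the names `count_decimals` and `keep_trailing_zeros` are hypothetical, and the switch only makes sense if the prices are stored as text rather than floats.

```python
import math


def count_decimals(value, keep_trailing_zeros=False):
    """Count the digits after the decimal point in the text form of `value`."""
    # Missing values (None, empty string, NaN) are treated as having no decimals.
    if value is None or value == "":
        return 0
    if isinstance(value, float) and math.isnan(value):
        return 0
    # partition() yields an empty fraction when there is no dot at all,
    # so a whole number written as "88" gives 0.
    _, _, fraction = str(value).partition(".")
    if not keep_trailing_zeros:
        fraction = fraction.rstrip("0")
    return len(fraction)


print(count_decimals(17.12))                               # 2
print(count_decimals("88"))                                # 0
print(count_decimals("0.4600"))                            # 2
print(count_decimals("0.4600", keep_trailing_zeros=True))  # 4
print(count_decimals(float("nan")))                        # 0
```

It could be applied with `df['dec_value'] = df['Price_Val'].apply(count_decimals)`. To get 4 for 0.4600, the column would have to be loaded as strings in the first place (for instance with `dtype=str` when reading a CSV). This sketch does not handle values that Python prints in scientific notation, such as `1e-07`.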
Alternatively, using a regex:
df['dec_value'] = (df['Price_Val'].fillna('').astype(str)
                   .str.extract(r'\.(\d*[1-9])', expand=False)
                   .str.len().fillna(0).astype(int)
                   )
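To see what the regex captures, here is a small sketch that applies the same pattern to single strings with the standard `re` module. The helper name `decimals_via_regex` is hypothetical and only there for illustration.

```python
import re

# Same pattern as in the pandas call: a literal dot, then the shortest run of
# digits that still ends in a non-zero digit (so trailing zeros are excluded).
PATTERN = re.compile(r"\.(\d*[1-9])")


def decimals_via_regex(text):
    """Return the length of the captured fraction, or 0 when nothing matches."""
    match = PATTERN.search(text)
    return len(match.group(1)) if match else 0


for sample in ["17.12", "88.0", "0.4600", "8.46567", ""]:
    print(repr(sample), decimals_via_regex(sample))
```

Strings with no non-zero digit after the dot ("88.0", "") do not match at all, which is why the pandas version needs the final fillna(0).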
Timing of the alternatives (90k rows):
# apply
50.5 ms ± 913 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# regex
83.9 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
# str pipeline
115 ms ± 2.39 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
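A rough way to reproduce this kind of comparison is sketched below. It assumes the 90k rows are the nine sample prices repeated 10,000 times (the original test data is not shown), and the function names are hypothetical. Absolute timings will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: nine sample prices repeated 10,000 times to reach 90k rows.
prices = [17.12, 2.08, 3.453, 2.22, 8.46567, 88, 0.46, 6.46, np.nan]
df = pd.DataFrame({"Price_Val": prices * 10_000})


def with_apply(s):
    return s.fillna("").apply(lambda x: len(str(x).split(".")[-1].rstrip("0")))


def with_str_pipeline(s):
    return (s.fillna("").astype(str)
             .str.split(".").str[-1]
             .str.rstrip("0").str.len())


def with_regex(s):
    return (s.fillna("").astype(str)
             .str.extract(r"\.(\d*[1-9])", expand=False)
             .str.len().fillna(0).astype(int))


# Check that all three variants agree before timing them.
expected = with_apply(df["Price_Val"]).tolist()
assert with_str_pipeline(df["Price_Val"]).tolist() == expected
assert with_regex(df["Price_Val"]).tolist() == expected

for fn in (with_apply, with_regex, with_str_pipeline):
    seconds = timeit.timeit(lambda: fn(df["Price_Val"]), number=3) / 3
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```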