I have a Pandas DataFrame which looks like this.
CustId  CustName  Price_1  Priceqty1  Price_2  Priceqty2  Price_3  Priceqty3  Price_4   Price_5
5015    Axn       315.12   1          374      1          126.32   3          167.8765
5015    Axn       75.36    3          190.19   7          33.16    1          190.19    88
5015    Axn       123      5                   4          18.07    2          0.073     12
7315    Bxy       12.0     4          22.345   3          77.89    1          345.0     3344
7315    Bxy       987.90   7          34.06    4          90.09    3          876.34    908.76
3283    Cxz       123.34   8          55.78    7          12       9          878.09    98.456
3283    Cxz       178.90   7          88       8          0.09     0          987.56
I have 5 different price columns whose values have different numbers of decimal places. To make this clearer for the end user, I need to convert all the price column values to a common number of decimal places.
To do the conversion I need to follow two rules:
1. The conversion should be done for every customer id separately.
2. The common number of decimal places is determined by the price column value with the largest number of decimal places.
Let's take CustId 5015 as an example:
CustId  CustName  Price_1  Priceqty1  Price_2  Priceqty2  Price_3  Priceqty3  Price_4   Price_5
5015    Axn       315.12   1          374      1          126.32   3          167.8765
5015    Axn       75.36    3          190.19   7          33.16    1          190.19    88
5015    Axn       123      5                   4          18.07    2          0.073     12
If you look at the first row of the Price_4 column, the value 167.8765 has 4 decimal places. If you check all the price column values for CustId 5015, 4 is the largest number of decimal places among them. Since 4 is the largest, I need to convert every price column value of CustId 5015 to 4 decimal places.
After conversion, this is how it should look:
CustId  CustName  Price_1   Priceqty1  Price_2   Priceqty2  Price_3   Priceqty3  Price_4   Price_5
5015    Axn       315.1200  1          374.0000  1          126.3200  3          167.8765
5015    Axn       75.3600   3          190.1900  7          33.1600   1          190.1900  88
5015    Axn       123.0000  5                    4          18.0700   2          0.0730    12
Blank values should remain blank.
I need to do this for every CustId, and there are more than 800 different customer ids. What's the most efficient way to do this?
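For reproducibility, the sample data above can be built like this (the blank cells are NaN):
import numpy as np
import pandas as pd

# sample data from the table above; NaN marks the blank cells
df = pd.DataFrame({
    'CustId':    [5015, 5015, 5015, 7315, 7315, 3283, 3283],
    'CustName':  ['Axn', 'Axn', 'Axn', 'Bxy', 'Bxy', 'Cxz', 'Cxz'],
    'Price_1':   [315.12, 75.36, 123, 12.0, 987.90, 123.34, 178.90],
    'Priceqty1': [1, 3, 5, 4, 7, 8, 7],
    'Price_2':   [374, 190.19, np.nan, 22.345, 34.06, 55.78, 88],
    'Priceqty2': [1, 7, 4, 3, 4, 7, 8],
    'Price_3':   [126.32, 33.16, 18.07, 77.89, 90.09, 12, 0.09],
    'Priceqty3': [3, 1, 2, 1, 3, 9, 0],
    'Price_4':   [167.8765, 190.19, 0.073, 345.0, 876.34, 878.09, 987.56],
    'Price_5':   [np.nan, 88, 12, 3344, 908.76, 98.456, np.nan],
})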
CodePudding user response:
You could start by computing the lengths of the integer and decimal parts of the numbers and taking the max:
lengths = (df.filter(like='Price_')                        # keep only the price columns
             .stack().astype(str).dropna()                 # flatten to one Series of strings (NaN dropped)
             .str.split('.', expand=True).astype(str)      # column 0: integer part, column 1: decimal part
             .apply(lambda c: c.str.len())                 # length of each part
             .max()
           )
output:
0 4 # this is the integer part
1 4 # this is the decimal part
Then, if you do not want to change the data but only how the dataframe is printed:
integer, decimal = lengths.values
total = integer + decimal + 1                   # +1 for the decimal point itself
custom_format = '{:%s.%sf}' % (total, decimal)  # e.g. '{:9.4f}'
pd.options.display.float_format = custom_format.format
print(df.fillna(''))
output:
   CustId  CustName   Price_1  Priceqty1   Price_2  Priceqty2   Price_3  Priceqty3   Price_4    Price_5
0    5015       Axn  315.1200          1  374.0000     1.0000  126.3200     3.0000  167.8765
1    5015       Axn   75.3600          3  190.1900     7.0000   33.1600     1.0000  190.1900    88.0000
2    5015       Axn  123.0000          5              4.0000   18.0700     2.0000    0.0730    12.0000
3    7315       Bxy   12.0000          4   22.3450     3.0000   77.8900     1.0000  345.0000  3344.0000
4    7315       Bxy  987.9000          7   34.0600     4.0000   90.0900     3.0000  876.3400   908.7600
5    3283       Cxz  123.3400          8   55.7800     7.0000   12.0000     9.0000  878.0900    98.4560
6    3283       Cxz  178.9000          7   88.0000     8.0000    0.0900     0.0000  987.5600
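If the common decimal place really has to be computed per CustId (rule 1 in the question), and you want the values converted rather than only displayed differently, one option is to format each group as strings, since a float column cannot keep trailing zeros. A rough sketch (the helper names n_decimals/format_group are just for illustration, and edge cases such as a group with no price values are not handled):
import pandas as pd

price_cols = df.filter(like='Price_').columns

def n_decimals(s):
    # number of digits after the decimal point of each non-null value
    return s.dropna().astype(str).str.split('.').str[-1].str.len()

def format_group(g):
    # largest decimal length across all price columns of this CustId
    dec = int(pd.concat([n_decimals(g[c]) for c in price_cols]).max())
    fmt = '{:.%df}' % dec
    # format as strings so trailing zeros are kept; blanks stay blank
    return g[price_cols].applymap(lambda v: fmt.format(v) if pd.notna(v) else '')

out = df.copy()
out[price_cols] = df.groupby('CustId', group_keys=False).apply(format_group)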
CodePudding user response:
df = df.fillna('')  # keep blank cells blank when printing
# largest number of decimal places over all Price_ columns
m = df.filter(regex='Price_').astype(str).applymap(lambda x: len(x.split('.')[1]) if x else 0).max().max()
pd.options.display.float_format = ('{:,.' + str(m) + 'f}').format  # based on https://stackoverflow.com/a/20937592/15035314
print(df)
Prints:
   CustId  CustName   Price_1  Priceqty1  ...   Price_3  Priceqty3   Price_4     Price_5
0    5015       Axn  315.1200          1  ...  126.3200          3  167.8765
1    5015       Axn   75.3600          3  ...   33.1600          1  190.1900     88.0000
2    5015       Axn  123.0000          5  ...   18.0700          2    0.0730     12.0000
3    7315       Bxy   12.0000          4  ...   77.8900          1  345.0000  3,344.0000
4    7315       Bxy  987.9000          7  ...   90.0900          3  876.3400    908.7600
5    3283       Cxz  123.3400          8  ...   12.0000          9  878.0900     98.4560
6    3283       Cxz  178.9000          7  ...    0.0900          0  987.5600
[7 rows x 10 columns]
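Note that display.float_format is a global display option, so it affects every float printed afterwards. If you only want it for this output, you can restore the default once you are done:
pd.reset_option('display.float_format')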