I have a column of weights (there are actually 5k weights), a small batch looks like this:
weight
15.00 oz
19.00 oz
2 lb 15.00 oz
1 lb 19.00 oz
What I wanted to do is to convert the whole weight to pounds, something like this:
weight
0.9375 lb
1.1875 lb
2.9375 lb
2.1875 lb
What shall I do to achieve this? What I have tried so far is:
df[['lbs','oz']] = df.Weight.str.split("lb",expand=True)
but this doesn't work, as for rows with no 'lb' unit, the code doesn't work. that looked like this:
pounds ounces
15.00 oz
19.00 oz
2lb 15.00oz
1lb 19.00oz
CodePudding user response:
This works, but there's almost certainly a neater 'more pandas' way to do it... This should be fast enough to process 5,000 values.
Imports:
import pandas as pd
Test data set-up (including the data with oz values after the .):
df = pd.DataFrame(["15.00 oz",
"19.00 oz",
"2 lb 15.00 oz",
"1 lb 19.00 oz",
"1 lb 12.80 oz",
"1 lb",
"nothing"],
columns=["weight"])
Produces:
weight
0 15.00 oz
1 19.00 oz
2 2 lb 15.00 oz
3 1 lb 19.00 oz
4 1 lb 12.80 oz
5 1 lb
6 nothing
Define a function to map from the individual lb/oz values to a single lb value. This takes an array of tuples, which may be empty, such as: [(,'15.00')]
or []
or [('1', '12.80')]
(The 'numbers' in the matches are still of type str
at this point):
def lbsFromMatchedNumbers(matchData):
if len(matchData) == 0:
return None
(lbs, oz) = matchData[0]
lbs = float(lbs or 0)
oz = float(oz or 0)
ounces_in_pound = 16.0
return lbs (oz / ounces_in_pound)
Find all the items in the 'weight' row, and then process them with the function, and assign to new 'lb' column:
matchPattern = "^(?:(\d ) lb ?)?(?:(\d (?:.\d )?) oz)?$"
df["lb"] = df["weight"].str.findall(matchPattern).apply(lbsFromMatchedNumbers)
Produces:
weight lb
0 15.00 oz 0.9375
1 19.00 oz 1.1875
2 2 lb 15.00 oz 2.9375
3 1 lb 19.00 oz 2.1875
4 1 lb 12.80 oz 1.8000
5 1 lb 1.0000
6 nothing NaN
Note: This works if there are only lb or oz numbers, as shown in extra rows in the sample data I've used. If there's neither, it produces NaN
.
Explanation of the regex part
We're using a regex ('regular expression') to match the portions of the 'weight'
text content using this pattern:
"^(?:(\d ) lb ?)?(?:(\d (?:.\d )?) oz)?$"
Regex syntax used
- \d looks for a single 0-9 value
- \d looks for one or more 0-9 values (e.g., 1 or 435245)
?
(a , then a ?) looks for either a space, or nothing (the?
makes it optional)(hello)?
looks for 'hello', but continues anyway if it's not found (due to the?
- Brackets group items together
- Brackets with followed by
?:
,(?:like this)
, group items together, but don't save this as one of the 'matched groups'. In our example only the two numbers are returned, when they are matched (as they are enclosed in plain brackets)
Our particular example regex
Putting that together, this regex basically says:
- From the very start of the string (marked by
^
) - This part is optional:
- look for 1-or-more 0-9 digits - if you find this it is 'group 1'
- then a space
- then the text 'lb'
- then (optionally) a space
- Followed by (also optional):
- This whole part is 'group 2':
- 1-or-more 0-9 digits
- this bit is optional:
- a
.
, followed by 1-or-more 0-9 digits
- a
- then a space
- then the text 'oz'
- then the end of the string (marked by
$
)
- This whole part is 'group 2':
CodePudding user response:
weights = [15, 19] #different weights here
for i in weights:
weight = i / 16
print(weight)
Try this, it should be working and it's a far simpler was of doing it that what you showed. Hope it works for you!