I have a dataset with a price column of type string, and some of the values are ranges (for example 15000-20000). I want to extract the first number and convert the entire column to integers.
I tried this: `df['price'].apply(lambda x: x.split('-')[0])`, but the code just returns the original column.
Try one of the following options:
Data
import pandas as pd
data = {'price': ['0', '100-200', '200-300']}  # '0' has no `-`, to show that such values are handled too
df = pd.DataFrame(data)
print(df)
     price
0        0
1  100-200
2  200-300
Option 1
- Use `Series.str.split` with `expand=True` and select the first column from the result.
- Next, chain `Series.astype`, and assign the result to `df['price']` to overwrite the original values.
df['price'] = df.price.str.split('-', expand=True)[0].astype(int)
print(df)
price
0 0
1 100
2 200
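To see what Option 1 is selecting, it may help to look at the intermediate DataFrame that `str.split('-', expand=True)` produces. A minimal sketch on the same sample data; the names `parts` and `lower` are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'price': ['0', '100-200', '200-300']})

# expand=True spreads each split result across columns of a new DataFrame.
# A string without '-' yields a single piece, so its second column is missing.
parts = df['price'].str.split('-', expand=True)
print(parts)

# Column 0 always holds the text before the first '-', i.e. the lower bound.
lower = parts[0].astype(int)
print(lower.tolist())  # [0, 100, 200]
```

Because column 0 exists for every row, values with and without a range are handled the same way.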
Option 2
- Use `Series.str.extract` with the regex pattern `r'(\d+)-?'`:
- `\d+` matches one or more digits, so the match stops when we hit `-` (the `?` after `-` specifies "if present at all").
data = {'price': ['0','100-200','200-300']}
df = pd.DataFrame(data)
df['price'] = df.price.str.extract(r'(\d+)-?').astype(int)
# same result
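A small variant of Option 2: with a single capture group and `expand=False`, `str.extract` returns a Series rather than a one-column DataFrame, which makes the column assignment a little more explicit. This is a sketch on the same sample data; the name `first` is illustrative:

```python
import pandas as pd

df = pd.DataFrame({'price': ['0', '100-200', '200-300']})

# One capture group + expand=False -> a Series of the captured strings
# (the default expand=True would return a one-column DataFrame instead).
first = df['price'].str.extract(r'(\d+)-?', expand=False)
print(type(first).__name__)  # Series

df['price'] = first.astype(int)
print(df['price'].tolist())  # [0, 100, 200]
```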
Here is one way to do this:
df['price'] = df['price'].str.split('-', expand=True)[0].astype('int')
This stores only the first number of each range. For example, from 15000-20000 only 15000 is stored in the price column.
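On the original attempt: a likely reason it seemed to do nothing (an assumption, since the question does not show how the result was checked) is that `Series.apply` returns a new Series and does not modify `df` in place, so the result has to be assigned back. A minimal sketch; apart from 15000-20000, the sample values are hypothetical:

```python
import pandas as pd

# Hypothetical sample data; only '15000-20000' comes from the question.
df = pd.DataFrame({'price': ['15000-20000', '18000', '9000-12000']})

# apply builds and returns a new Series; df itself is left untouched.
df['price'].apply(lambda x: x.split('-')[0])
print(df['price'].tolist())  # ['15000-20000', '18000', '9000-12000']

# Assigning the result back (and converting to int) updates the column.
df['price'] = df['price'].apply(lambda x: x.split('-')[0]).astype(int)
print(df['price'].tolist())  # [15000, 18000, 9000]
```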