I have data in the following format. I want to replace any strings that appear in the 'TIMES_Sold'
column with 0 and then convert the column to integers.
Similarly, I want to remove 'each' from the 'ITEM_Price_£'
column and convert it to float. How can I do that?
TIMES_Sold            ITEM_Price_£
13                    14.99
0                     6.95 each
0                     10.95 each
56                    8.75
0                     8.50 each
979                   3.25
0                     20.08
4                     8.82
Portable Gas Sniffer  9
2                     15.46
Output should look like this:
TIMES_Sold  ITEM_Price_£
13          14.99
0           6.95
0           10.95
56          8.75
0           8.50
979         3.25
0           20.08
4           8.82
0           9
2           15.46
CodePudding user response:
You can use Series.str.replace with the following regex patterns:
df['TIMES_Sold'] = df['TIMES_Sold'].str.replace(r'\D', '0', regex=True).astype(int)
df['ITEM_Price_£'] = df['ITEM_Price_£'].str.replace(r'[^\d.]+', '', regex=True).astype(float)
Output
>>> df
   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46
\D
- Matches any single character other than a digit.
[^\d.]+
- Matches any character other than a digit or a literal "."
as many times as possible (although the
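The effect of the two patterns can be checked on plain strings with the standard re module. The snippet below is only a minimal sketch: the sample values are copied from the question, and the list names are made up for illustration.

```python
import re

# Sample cells copied from the question (names are illustrative only).
times_sold = ["13", "0", "Portable Gas Sniffer"]
prices = ["14.99", "6.95 each", "9"]

# \D matches one non-digit character; each one is replaced by "0", so a
# purely textual cell becomes a run of zeros, which int() reads as 0.
clean_sold = [int(re.sub(r"\D", "0", s)) for s in times_sold]

# [^\d.]+ matches a run of characters that are neither digits nor ".";
# deleting it strips the trailing " each".
clean_prices = [float(re.sub(r"[^\d.]+", "", s)) for s in prices]

print(clean_sold)    # [13, 0, 0]
print(clean_prices)  # [14.99, 6.95, 9.0]
```

Because every non-digit character is replaced individually, a mixed cell such as "12 pcs" would become 120000 rather than 0, so this sketch assumes each TIMES_Sold cell is either all digits or all text.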
CodePudding user response:
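import pandas as pd
# Replace every run of characters that is neither a digit nor "." with "0", then parse the result as numbers.
df[["TIMES_Sold", "ITEM_Price_£"]] = df[["TIMES_Sold", "ITEM_Price_£"]].astype(str).apply(lambda col: pd.to_numeric(col.str.replace(r"[^\d.]+", "0", regex=True)))
df.TIMES_Sold = df.TIMES_Sold.astype(int)
df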
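Since pd.to_numeric is already in play here, a regex-free variant may also be worth trying. The following is a hedged sketch, not a drop-in replacement: it assumes every cell is a string, that the only unwanted text in the price column is the word "each", and it rebuilds a small sample frame from the question for illustration.

```python
import pandas as pd

# Small sample rebuilt from the question; all cells are assumed to be strings.
df = pd.DataFrame({
    "TIMES_Sold": ["13", "0", "Portable Gas Sniffer", "2"],
    "ITEM_Price_£": ["14.99", "6.95 each", "9", "15.46"],
})

# Cells that cannot be parsed become NaN, then 0, before the integer cast.
df["TIMES_Sold"] = (
    pd.to_numeric(df["TIMES_Sold"], errors="coerce").fillna(0).astype(int)
)

# Remove the literal word "each", trim whitespace, then parse as float.
df["ITEM_Price_£"] = pd.to_numeric(
    df["ITEM_Price_£"].str.replace("each", "", regex=False).str.strip()
)

print(df.dtypes)
```

With errors="coerce", a mixed cell such as "12 pcs" ends up as 0 instead of being turned into a different number, which may or may not be what you want.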
CodePudding user response:
This is one way to do it:
df['TIMES_Sold'] = df['TIMES_Sold'].str.strip().replace(r'\D+', '0', regex=True).astype(int)
df['ITEM_Price_£'] = df['ITEM_Price_£'].astype(str).str.extract(r'([0-9.]+)', expand=False).astype(float)
df
   TIMES_Sold  ITEM_Price_£
0          13         14.99
1           0          6.95
2           0         10.95
3          56          8.75
4           0          8.50
5         979          3.25
6           0         20.08
7           4          8.82
8           0          9.00
9           2         15.46
CodePudding user response:
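How about this? Good luck. BTW, your output is integer, whereas it says float in your question.
import string
# Build a regex character class containing every ASCII letter.
alphabets = "[" + string.ascii_lowercase + string.ascii_uppercase + "]"
# Replace any value that contains a letter with '0', then convert to int as the question asks.
df["TIMES_Sold"] = df["TIMES_Sold"].where(~df["TIMES_Sold"].str.contains(alphabets), "0").astype(int)
df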
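For anyone who wants to try the mask-and-replace idea in isolation, here is a minimal self-contained sketch. It assumes the column holds only strings; the Series below is a made-up sample based on the question, and the variable names are illustrative.

```python
import pandas as pd

# Made-up sample of the TIMES_Sold column; all cells are assumed to be strings.
sold = pd.Series(["13", "0", "Portable Gas Sniffer", "2"])

# True where the cell contains at least one ASCII letter.
has_letters = sold.str.contains(r"[A-Za-z]", regex=True)

# Keep cells without letters, substitute "0" elsewhere, then cast to int.
cleaned = sold.where(~has_letters, "0").astype(int)

print(cleaned.tolist())  # [13, 0, 0, 2]
```

Series.where keeps the values where the condition is True, which is why the mask is negated with ~ before being passed in.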