Price |
---|
$ C145 |
$22.34 |
From USA |
Hi, this is the df I have above. I basically want to remove all the special characters ($ and spaces). I then want to rearrange the values so they look like the table below, stored as numbers, so I can write them to a CSV and analyze the prices of over 50 rows.
Price | Shipping | Origin |
---|---|---|
145 | 22.34 | From USA |
I was thinking maybe I need to put it in a DataFrame and then use iloc to shift the values over?
Apples=["$ 145", " 22.34", "From USA"]
df=pd.DataFrame({'Price': Apples})
new_df=pd.DataFrame({'Price':df['Price'].iloc[::2].values, 'Shipping':df['Price'].iloc[1::2].values})
At this point it takes the price and shipping, but it only splits them into two columns; I need it done for three. How can I get the values into the columns shown in the new table above, and also strip the non-numeric characters from the "Price" and "Shipping" columns? Maybe something like this, replicated for both columns? Thank you for any help, I'm just starting out with pandas and Python!
new_df['Price']=new_df.Price.str.extract(r'(\d+[.\d]*)')
CodePudding user response:
Changing the orientation from vertical to horizontal can be done with a transpose:
df = df.T
df.columns = ['price', 'shipping', 'origin']
Then you can treat each column to convert the strings to integer or float as needed:
import re # regular expressions
# Replace anything not a digit ('\D') with an empty string, then convert to
# int
df.price = df.price.apply(lambda x: int(re.sub(r'\D', '', x)))
# Replace anything not a digit or decimal ('[^0-9.]') with an empty string,
# then convert to float
df.shipping = df.shipping.apply(
    lambda x: float(re.sub(r'[^0-9.]', '', x)))
# put it all in a single function:
def convert_df(df):
    df = df.T
    df.columns = ['price', 'shipping', 'origin']
    df.price = df.price.apply(lambda x: int(re.sub(r'\D', '', x)))
    df.shipping = df.shipping.apply(
        lambda x: float(re.sub(r'[^0-9.]', '', x)))
    return df
This will work even if your initial input has multiple columns like:
apples = ['$ C145', ' $22.34', 'From USA']
corn = ['$ C197', ' $18.46', 'From Canada']
df2 = pd.DataFrame({'apples': apples, 'corn': corn})
convert_df(df2)
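For comparison, here is a minimal self-contained sketch of the same transpose idea using pandas' vectorized `.str.replace` instead of `apply` with `re.sub`. The helper name `convert_df_vectorized` is hypothetical, and the sketch assumes every input column holds exactly three strings in the order price, shipping, origin.

```python
import pandas as pd


def convert_df_vectorized(df):
    # Hypothetical helper: same steps as convert_df, but with .str methods.
    out = df.T.copy()
    out.columns = ['price', 'shipping', 'origin']
    # Drop every non-digit character, then cast to int.
    out['price'] = out['price'].str.replace(r'\D', '', regex=True).astype(int)
    # Keep only digits and the decimal point, then cast to float.
    out['shipping'] = (
        out['shipping'].str.replace(r'[^0-9.]', '', regex=True).astype(float)
    )
    return out


apples = ['$ C145', ' $22.34', 'From USA']
corn = ['$ C197', ' $18.46', 'From Canada']
result = convert_df_vectorized(pd.DataFrame({'apples': apples, 'corn': corn}))
print(result)
```

Each original column name (`apples`, `corn`) ends up as a row label in the result.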
CodePudding user response:
So your data looks like: price, shipping, origin, price, shipping, origin, price...?
In that case you can extract every 3rd row with syntax like:
df.iloc[start::3].reset_index(drop=True)
...then concatenate the three slices side by side and apply the number extraction as above.
# starting with, e.g.:
apples = ["$ C145", " $22.34", "From USA"]
df = pd.DataFrame({'data': apples * 30})
df.head(6)
       data
0    $ C145
1    $22.34
2  From USA
3    $ C145
4    $22.34
5  From USA
...
df = pd.concat(
    [
        # Take every 3rd row starting at `start`, then reindex from 0
        df.iloc[start::3].reset_index(drop=True)
        for start in [0, 1, 2]
    ],
    axis=1)
df.columns = ['price', 'shipping', 'origin']
df.price = df.price.str.extract(r'(\d+)', expand=False).astype(int)
df.shipping = df.shipping.str.extract(r'(\d+\.\d*)', expand=False).astype(float)
df.head()
   price  shipping    origin
0    145     22.34  From USA
1    145     22.34  From USA
2    145     22.34  From USA
3    145     22.34  From USA
4    145     22.34  From USA
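If the rows really do repeat in strict groups of three, another option is to reshape the underlying array instead of slicing and concatenating. This is only a sketch, and it assumes the row count is an exact multiple of 3; the name `wide` is made up for illustration.

```python
import pandas as pd

apples = ["$ C145", " $22.34", "From USA"]
df = pd.DataFrame({'data': apples * 30})

# Assumption: len(df) is an exact multiple of 3, otherwise reshape raises.
wide = pd.DataFrame(
    df['data'].to_numpy().reshape(-1, 3),
    columns=['price', 'shipping', 'origin'],
)
# Pull out the first run of digits as the price.
wide['price'] = wide['price'].str.extract(r'(\d+)', expand=False).astype(int)
# Pull out digits followed by a decimal part as the shipping cost.
wide['shipping'] = (
    wide['shipping'].str.extract(r'(\d+\.\d*)', expand=False).astype(float)
)
print(wide.head())
```

From there, `wide.to_csv(...)` with `index=False` should write the cleaned table out for analysis.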