I read some data from a file. pandas assigns the first column the 'object' dtype because of the 'xxx' in the very first data row:
import pandas as pd
tips = pd.read_csv("tips.csv")
print(tips.head())
tips.info()  # info() prints its own report and returns None, so no print() is needed
   total_bill   tip     sex  smoker  day    time  size
0         xxx  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3
2       21.01  3.50    Male      No  Sun  Dinner     3
3       23.68  3.31    Male      No  Sun  Dinner     2
4       24.59  3.61  Female      No  Sun  Dinner     4
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 244 entries, 0 to 243
Data columns (total 7 columns):
 #   Column      Non-Null Count  Dtype
---  ------      --------------  -----
 0   total_bill  244 non-null    object
 1   tip         244 non-null    float64
 2   sex         244 non-null    object
 3   smoker      244 non-null    object
 4   day         244 non-null    object
 5   time        244 non-null    object
 6   size        244 non-null    int64
So this line fails: because of that one 'xxx' where a number should be, the whole total_bill column holds strings, and subtracting a float from a string raises a TypeError:
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])
How do I rewrite the above line to filter out the bad row, without actually changing the contents of the DataFrame?
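A minimal, self-contained reproduction of the problem, assuming a hypothetical CSV that contains only the five rows shown above (read from a string with io.StringIO instead of the real tips.csv). It shows that the single 'xxx' leaves every total_bill value as text, so the subtraction raises a TypeError:

```python
import io

import pandas as pd

# Hypothetical stand-in for tips.csv: only the five rows shown in the question.
CSV = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
23.68,3.31,Male,No,Sun,Dinner,2
24.59,3.61,Female,No,Sun,Dinner,4
"""

tips = pd.read_csv(io.StringIO(CSV))

# One non-numeric cell is enough to make pandas read the whole column as text,
# so even 10.34 is stored as the string '10.34', not as a float.
print(tips["total_bill"].dtype)
print(repr(tips.loc[1, "total_bill"]))  # '10.34'

failed = False
try:
    tips["tip_pct"] = tips["tip"] / (tips["total_bill"] - tips["tip"])
except TypeError as exc:
    # str - float is not defined, so the column arithmetic is rejected.
    failed = True
    print("TypeError:", exc)
```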
Answer:
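You can wrap the column that contains the 'xxx' in pd.to_numeric with errors='coerce'. This converts any value that cannot be parsed as a number to NaN, so the arithmetic works; and because pd.to_numeric returns a new Series, the original total_bill column in your DataFrame is left unchanged: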
tips['tip_pct'] = tips['tip'] / (pd.to_numeric(tips['total_bill'],errors='coerce') - tips['tip'])
   total_bill   tip     sex  smoker  day    time  size   tip_pct
0         xxx  1.01  Female      No  Sun  Dinner     2       NaN
1       10.34  1.66    Male      No  Sun  Dinner     3  0.191244
2       21.01  3.50    Male      No  Sun  Dinner     3  0.199886
3       23.68  3.31    Male      No  Sun  Dinner     2  0.162494
4       24.59  3.61  Female      No  Sun  Dinner     4  0.172069
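A self-contained sketch of this approach, assuming a hypothetical three-row excerpt of tips.csv read from a string. It checks that only the converted copy contains NaN, while the original total_bill strings stay as they were:

```python
import io

import pandas as pd

# Hypothetical stand-in for tips.csv: the first three rows from the question.
CSV = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""
tips = pd.read_csv(io.StringIO(CSV))

# to_numeric returns a new float Series; cells it cannot parse ('xxx') become NaN.
bill = pd.to_numeric(tips["total_bill"], errors="coerce")
tips["tip_pct"] = tips["tip"] / (bill - tips["tip"])

print(bill.isna().tolist())         # [True, False, False]
print(tips["total_bill"].tolist())  # ['xxx', '10.34', '21.01']
print(tips["tip_pct"].round(6).tolist())
```

The only new thing stored in the DataFrame is the tip_pct column; the coerced values live in the temporary bill Series.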
Answer:
Another way: build a boolean mask that excludes the 'xxx' row, cast total_bill to float on the remaining rows, and compute:
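m = tips['total_bill'] != 'xxx'  # True for every row except the 'xxx' one
# The result only covers the rows in m; assigning it aligns on the index, so the 'xxx' row gets NaN
tips['tip_pct'] = tips.loc[m, 'tip'] / (tips.loc[m, 'total_bill'].astype(float) - tips.loc[m, 'tip'])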
   total_bill   tip     sex  smoker  day    time  size   tip_pct
0         xxx  1.01  Female      No  Sun  Dinner     2       NaN
1       10.34  1.66    Male      No  Sun  Dinner     3  0.191244
2       21.01  3.50    Male      No  Sun  Dinner     3  0.199886
3       23.68  3.31    Male      No  Sun  Dinner     2  0.162494
4       24.59  3.61  Female      No  Sun  Dinner     4  0.172069
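A sketch of the same masking idea, assuming a hypothetical three-row excerpt of tips.csv. The mask here is a swapped-in variant, not the answer's own: it keeps rows whose total_bill parses as a number instead of comparing against the literal 'xxx', so any other non-numeric token would be excluded too. The assignment relies on index alignment, which fills the masked-out rows with NaN:

```python
import io

import pandas as pd

# Hypothetical stand-in for tips.csv: the first three rows from the question.
CSV = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""
tips = pd.read_csv(io.StringIO(CSV))

# Variant mask: True where total_bill can be parsed as a number.
m = pd.to_numeric(tips["total_bill"], errors="coerce").notna()

# The right-hand side is indexed only by the rows in m; assigning it to a new
# column reindexes it to the full DataFrame, leaving NaN where m is False.
tips["tip_pct"] = tips.loc[m, "tip"] / (
    tips.loc[m, "total_bill"].astype(float) - tips.loc[m, "tip"]
)

print(m.tolist())  # [False, True, True]
print(tips[["total_bill", "tip_pct"]])
```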
Answer:
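Alternatively, handle the bad value when reading the file. Passing dtype={'total_bill': np.float64} on its own fails with a ValueError, because 'xxx' cannot be parsed as a float, so also tell read_csv to treat 'xxx' as missing. Note that this stores NaN instead of 'xxx' in total_bill:
import numpy as np
import pandas as pd
# na_values turns 'xxx' into NaN before the float64 conversion is applied
tips = pd.read_csv('tips.csv', na_values=['xxx'], dtype={'total_bill': np.float64})
tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])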
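A self-contained check of the read-time approach, assuming a hypothetical three-row excerpt of tips.csv and that treating 'xxx' as missing via read_csv's na_values parameter is acceptable for your data. It shows that a float64 dtype alone is rejected, while na_values plus dtype loads the column as float64:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical stand-in for tips.csv: the first three rows from the question.
CSV = """total_bill,tip,sex,smoker,day,time,size
xxx,1.01,Female,No,Sun,Dinner,2
10.34,1.66,Male,No,Sun,Dinner,3
21.01,3.50,Male,No,Sun,Dinner,3
"""

# dtype alone is not enough: the parser cannot turn 'xxx' into a float.
try:
    pd.read_csv(io.StringIO(CSV), dtype={"total_bill": np.float64})
    dtype_only_ok = True
except ValueError:
    dtype_only_ok = False

# Declaring 'xxx' a missing-value marker lets total_bill load as float64.
tips = pd.read_csv(
    io.StringIO(CSV), na_values=["xxx"], dtype={"total_bill": np.float64}
)
tips["tip_pct"] = tips["tip"] / (tips["total_bill"] - tips["tip"])

print(dtype_only_ok)  # False
print(tips.dtypes)
```

Unlike the two answers above, this changes what is stored in total_bill (NaN instead of 'xxx'), which may or may not matter for your use.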