I am trying to add a new column to my dataframe that contains the side length (or radius) of each row's shape, computed from its shape and area.
My original dataset, df, looks like this:
      shape   color    area
0    square  yellow  9409.0
1    circle  yellow  4071.5
2  triangle    blue  2028.0
3    square    blue  3025.0
But when I coded it like this:
df['side'] = 0
for x in df['shape']:
    if x == 'square':
        df['side'] = np.rint(np.sqrt(df['area'])).astype(int)
    elif x == 'triangle':
        df['side'] = np.rint(np.sqrt((4 * df['area'])/np.sqrt(3))).astype(int)
    elif x == 'circle':
        df['side'] = np.rint(np.sqrt(df['area']/np.pi)).astype(int)
I got:
      shape   color    area  side
0    square  yellow  9409.0    55
1    circle  yellow  4071.5    36
2  triangle    blue  2028.0    25
3    square    blue  3025.0    31
It looks like the loop is applying the elif x == 'circle' clause to the side column for every row.
CodePudding user response:
This looks like a good use case for numpy.select, where you select values depending on which shape it is:
import numpy as np
df['side'] = np.select([df['shape'] == 'square',
                        df['shape'] == 'circle',
                        df['shape'] == 'triangle'],
                       [np.rint(np.sqrt(df['area'])),
                        np.rint(np.sqrt(df['area'] / np.pi)),
                        np.rint(np.sqrt((4 * df['area']) / np.sqrt(3)))],
                       np.nan).astype(int)
It could be written more concisely by creating a mapping from shape to multiplier and then using pandas vectorized operations:
mapping = {'square': 1, 'circle': 1 / np.pi, 'triangle': 4 / np.sqrt(3)}
df['side'] = df['shape'].map(mapping).mul(df['area']).pow(1/2).round(0).astype(int)
Output:
      shape   color    area  side
0    square  yellow  9409.0    97
1    circle  yellow  4071.5    36
2  triangle    blue  2028.0    68
3    square    blue  3025.0    55
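One caveat with the mapping approach, offered as a sketch rather than a definitive fix: if the shape column ever holds a value that is not in the mapping, map returns NaN for that row and astype(int) raises an error. Assuming a reasonably recent pandas, casting to the nullable 'Int64' dtype keeps such rows as missing values instead. The helper name side_from_area and the 'hexagon' row are made up for illustration and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Multiplier applied to the area before taking the square root.
MAPPING = {'square': 1, 'circle': 1 / np.pi, 'triangle': 4 / np.sqrt(3)}


def side_from_area(df):
    """Return the rounded side/radius per row; unknown shapes become <NA>.

    Hypothetical helper, not part of the original answer.
    """
    factor = df['shape'].map(MAPPING)        # NaN where the shape is unknown
    side = factor.mul(df['area']).pow(0.5).round(0)
    return side.astype('Int64')              # nullable integer dtype tolerates NaN


# Sample rows from the question plus one made-up shape with no formula.
df = pd.DataFrame({
    'shape': ['square', 'circle', 'triangle', 'square', 'hexagon'],
    'color': ['yellow', 'yellow', 'blue', 'blue', 'red'],
    'area': [9409.0, 4071.5, 2028.0, 3025.0, 100.0],
})
df['side'] = side_from_area(df)
print(df)
```

The four known shapes get the same values as in the output above, and the made-up row is left as a missing value rather than stopping the whole computation.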
CodePudding user response:
I see you were assigning to the whole column on every iteration. Instead, you can iterate over the rows and edit one cell at a time using the iterrows() method of the DataFrame.
for i, row in df.iterrows():
    if row['shape'] == 'square':
        df.at[i,'side'] = np.rint(np.sqrt(row['area'])).astype(int)
    elif row['shape'] == 'triangle':
        df.at[i,'side'] = np.rint(np.sqrt((4 * row['area'])/np.sqrt(3))).astype(int)
    elif row['shape'] == 'circle':
        df.at[i,'side'] = np.rint(np.sqrt(row['area']/np.pi)).astype(int)
Note that the assignment is to a single cell of the column, in the row at index i.
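To make the difference concrete, here is a small sketch that runs both styles of loop side by side on the four sample rows from the question. The column names whole_column and per_cell are illustrative only. In the first loop, each pass rewrites the entire column, so whichever branch runs last determines every row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'shape': ['square', 'circle', 'triangle', 'square'],
    'area': [9409.0, 4071.5, 2028.0, 3025.0],
})

# Whole-column assignment: every pass rewrites all rows, so the last pass wins.
df['whole_column'] = 0
for x in df['shape']:
    if x == 'square':
        df['whole_column'] = np.rint(np.sqrt(df['area'])).astype(int)
    elif x == 'triangle':
        df['whole_column'] = np.rint(np.sqrt(4 * df['area'] / np.sqrt(3))).astype(int)
    elif x == 'circle':
        df['whole_column'] = np.rint(np.sqrt(df['area'] / np.pi)).astype(int)

# Per-cell assignment: each pass writes only the row it is looking at.
df['per_cell'] = 0
for i, row in df.iterrows():
    if row['shape'] == 'square':
        df.at[i, 'per_cell'] = int(np.rint(np.sqrt(row['area'])))
    elif row['shape'] == 'triangle':
        df.at[i, 'per_cell'] = int(np.rint(np.sqrt(4 * row['area'] / np.sqrt(3))))
    elif row['shape'] == 'circle':
        df.at[i, 'per_cell'] = int(np.rint(np.sqrt(row['area'] / np.pi)))

print(df)
```

With only these four rows the last shape is a square, so whole_column ends up holding the square formula for every row. In the question's output the circle formula won instead, presumably because the full dataset ends with a circle; that is an assumption, since only the first rows are shown.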
Also, the suggestion by @enke above will work just fine.