I am new to Python and need some help with the following question.
I have a dataframe like this:
df:
index | height | unit |
---|---|---|
0 | 181.5 | cm |
1 | 72.5 | inches |
2 | 168.0 | cm |
3 | NaN | NaN |
.. | .. | .. |
...2000 rows
import pandas as pd

df = pd.DataFrame(data=[[181.5, 'cm'],
                        [72.5, 'inches'],
                        [168.0, 'cm'],
                        ['NaN', 'NaN']],
                  columns=['height', 'unit'],
                  index=[1, 2, 3, 4])
I want to unify the unit to "cm", make the corresponding changes to height, and keep the 'NaN's.
CodePudding user response:
Use a dictionary to map conversion factors and use indexing to update the values/units:
# ensure real NaNs (the sample data stores them as the string 'NaN'):
import numpy as np
df = df.replace('NaN', np.nan)
# set up dictionary of conversion factors
d = {'cm': 1, 'inches': 2.54}
# map converted heights
df['height'] = df['height'].mul(df['unit'].map(d))
# update units
df.loc[df['unit'].isin(d), 'unit'] = 'cm'
output:
height unit
1 181.50 cm
2 184.15 cm
3 168.00 cm
4 NaN NaN
handling unknown units
If some rows have a unit that is not in the dictionary and you want to leave their heights unchanged, use df['unit'].map(lambda x: d.get(x, 1)) instead of df['unit'].map(d). Unknown units then get a conversion factor of 1 instead of NaN.
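As a rough illustration of the fallback described above, here is a minimal self-contained sketch. The extra row with unit 'ft' is a made-up example of an unknown unit and is not part of the original data; the variable name factors is also just for illustration.

```python
import numpy as np
import pandas as pd

# Sample data; the 'ft' row is a hypothetical unknown unit added for illustration.
df = pd.DataFrame(
    {"height": [181.5, 72.5, 168.0, np.nan, 6.0],
     "unit": ["cm", "inches", "cm", np.nan, "ft"]},
    index=[1, 2, 3, 4, 5],
)

d = {"cm": 1, "inches": 2.54}

# Units missing from the dictionary (including NaN) fall back to a factor of 1,
# so the corresponding heights are left unchanged.
factors = df["unit"].map(lambda x: d.get(x, 1))
df["height"] = df["height"].mul(factors)

# Relabel only the rows whose unit is a key of the dictionary.
df.loc[df["unit"].isin(list(d)), "unit"] = "cm"
print(df)
```

With plain map(d), the 'ft' row would get a NaN factor and its height would be lost; with the fallback it keeps both its height and its original unit label, so it can still be spotted and dealt with later.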
CodePudding user response:
Adjusted from this solution using a mask:
mask = (df['unit'] == 'inches')
df_inches = df[mask]
df.loc[mask, 'height'] = df_inches['height'] * 2.54
df.loc[mask, 'unit'] = 'cm'
print(df)
Output:
height unit
1 181.5 cm
2 184.15 cm
3 168.0 cm
4 NaN NaN
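If the conversion has to be repeated, the mask approach can be wrapped in a small function. The sketch below is only one possible way to do it: the name inches_to_cm is hypothetical, and the pd.to_numeric step is an added assumption so that the 'NaN' strings in the height column become real missing values and the column ends up as float.

```python
import pandas as pd

def inches_to_cm(frame):
    """Return a copy with 'inches' rows converted to cm (hypothetical helper)."""
    out = frame.copy()
    # Coerce anything non-numeric (such as the string 'NaN') to a real NaN.
    out["height"] = pd.to_numeric(out["height"], errors="coerce")
    mask = out["unit"] == "inches"
    out.loc[mask, "height"] = out.loc[mask, "height"] * 2.54
    out.loc[mask, "unit"] = "cm"
    return out

df = pd.DataFrame(
    data=[[181.5, "cm"], [72.5, "inches"], [168.0, "cm"], ["NaN", "NaN"]],
    columns=["height", "unit"],
    index=[1, 2, 3, 4],
)
result = inches_to_cm(df)
print(result)
```

Working on a copy leaves the original df untouched. Note that the unit column is not cleaned here, so a 'NaN' string in unit stays a string.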