I have a pandas dataframe, which consists of numbers and strings. It looks something like this:
Column1 Column2
name1 12,4
name2 13
name3 -
name4 0,2
I want to format Column2 so that all numbers are formatted with one digit after the decimal comma. The problem is that all elements in Column2 are strings.
I tried to convert them to numeric values with pd.to_numeric:
dfRW[dfRW.columns[1]] = pd.to_numeric(dfRW[dfRW.columns[1]])
However, because of the hyphen "-", it results in an error (which I kind of expected). I then found that I can pass errors='ignore', and with that it runs.
The next step would be to use .format to get the digits, so I tried:
dfRW[dfRW.columns[1]] = pd.to_numeric(dfRW[dfRW.columns[1]], errors='ignore').map('${:,.2f}'.format)
but again the hyphen causes problems.
The expected result would be:
Column1 Column2
name1 12,4
name2 13,0
name3 -
name4 0,2
I have some workaround ideas (like sorting and then formatting only the numbers above the rows with hyphens), but I first wanted to check whether there is a function I don't know about that would solve my problem.
CodePudding user response:
You can replace , with . so that to_numeric can be used with errors='coerce', which turns - (and any other non-numeric values) into missing values. Then use map with an if-else to handle the NaNs:
f = lambda x: '${:.1f}'.format(x).replace('.', ',') if pd.notna(x) else '-'
dfRW.iloc[:, 1] = pd.to_numeric(dfRW.iloc[:, 1].str.replace(',', '.'), errors='coerce').map(f)
print(dfRW)
Column1 Column2
0 name1 $12,4
1 name2 $13,0
2 name3 -
3 name4 $0,2
If you use errors='ignore' and any value cannot be converted, the output is the same as the input.
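The output above keeps the $ prefix from the question's format string, whereas the expected result has none. Here is a minimal sketch of the same coerce-then-map idea without the prefix. The sample frame `df` and the helper name `format_one_decimal` are assumptions made for illustration, not part of the original answer:

```python
import pandas as pd

# Sample data mirroring the question; `df` is an illustrative name.
df = pd.DataFrame({
    "Column1": ["name1", "name2", "name3", "name4"],
    "Column2": ["12,4", "13", "-", "0,2"],
})

def format_one_decimal(value):
    """Format a float with one decimal and a decimal comma; NaN becomes '-'."""
    if pd.isna(value):
        return "-"
    return f"{value:.1f}".replace(".", ",")

# Swap the decimal comma for a dot, then coerce anything non-numeric to NaN.
numeric = pd.to_numeric(
    df["Column2"].str.replace(",", ".", regex=False), errors="coerce"
)
df["Column2"] = numeric.map(format_one_decimal)
print(df)
```

Note that every value that fails to convert ends up as "-", so this only fits if the hyphen is the sole placeholder in the column.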
CodePudding user response:
You can apply a function to the Column2 values. First replace the commas with dots so that float() can parse the strings, then format each result with 2 decimals in a list comprehension:
l = ["0,2", "13", "12,4"]
l = ['{:,.2f}'.format(float(i.replace(",", "."))) for i in l]
print(l)
Output:
['0.20', '13.00', '12.40']
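The list above leaves out the hyphen, for which float() would raise a ValueError, and it formats with two decimals and a dot. A possible variant that keeps non-numeric entries as they are and produces one digit after a decimal comma might look like this; the helper name `to_one_decimal` is hypothetical:

```python
def to_one_decimal(text):
    """Return text with one digit after a decimal comma.

    Strings that float() cannot parse (such as "-") are returned unchanged.
    """
    try:
        number = float(text.replace(",", "."))
    except ValueError:
        return text
    return f"{number:.1f}".replace(".", ",")

values = ["12,4", "13", "-", "0,2"]
formatted = [to_one_decimal(v) for v in values]
print(formatted)
```

The same helper could presumably be applied to the dataframe column with dfRW.iloc[:, 1].map(to_one_decimal), since every element there is a string.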
CodePudding user response:
You can append ,0 to those values which don't contain , and are not -
m1 = df['Column2'].eq('-')
m2 = df['Column2'].str.contains(',')
df['Column2'] = df['Column2'].mask(~m1 & ~m2, df['Column2'] + ',0')
print(df)
Column1 Column2
0 name1 12,4
1 name2 13,0
2 name3 -
3 name4 0,2
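For reference, a self-contained sketch of this string-only approach follows. The sample frame is rebuilt from the question, and the extra row "7,25" is a hypothetical value added to show one limitation: values that already contain a comma are left untouched, so nothing is rounded to one decimal.

```python
import pandas as pd

df = pd.DataFrame({
    "Column1": ["name1", "name2", "name3", "name4", "name5"],
    # "7,25" is a hypothetical extra row, not part of the question's data.
    "Column2": ["12,4", "13", "-", "0,2", "7,25"],
})

is_hyphen = df["Column2"].eq("-")
has_comma = df["Column2"].str.contains(",", regex=False)

# Append ",0" only where the value is neither "-" nor already has a comma.
df["Column2"] = df["Column2"].mask(~is_hyphen & ~has_comma, df["Column2"] + ",0")
print(df)
```

This avoids any numeric conversion, which is enough when the input never has more than one digit after the comma.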