format pandas column with strings and numbers

Time:07-01

I have a pandas dataframe, which consists of numbers and strings. It looks something like this:

Column1   Column2
name1     12,4
name2     13
name3     -
name4     0,2

I want to format Column2 so that all numbers are formatted with one digit after the decimal comma. The problem is that all elements in Column2 are strings.

I tried to convert them to numeric values with pd.to_numeric:

dfRW[dfRW.columns[1]] = pd.to_numeric(dfRW[dfRW.columns[1]])

However, because of the hyphen "-" this raises an error (which I more or less expected). I found that I can pass errors='ignore', and then it runs without an error.
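A quick check suggests the hyphen is not the only obstacle. It uses a hypothetical Series holding the same four strings as Column2. pd.to_numeric does not understand a decimal comma either. With errors='coerce', which turns unparseable values into NaN instead of raising, only '13' survives the conversion:

```python
import pandas as pd

# Hypothetical Series holding the same strings as Column2 in the question.
s = pd.Series(["12,4", "13", "-", "0,2"])

# errors="coerce" replaces anything that cannot be parsed with NaN.
# The decimal comma is not recognised, so "12,4" and "0,2" fail as well as "-".
converted = pd.to_numeric(s, errors="coerce")
print(converted.tolist())
```

This is why the commas have to be replaced with dots before the conversion.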

The next step would be to use .format to get one digit after the decimal separator, so I tried:

dfRW[dfRW.columns[1]] = pd.to_numeric(dfRW[dfRW.columns[1]], errors='ignore').map('${:,.2f}'.format)

but again the hyphen causes problems.
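The failure in the .format attempt can be reproduced without pandas. The f presentation type in a format spec accepts numbers only. Any value still stored as a string, whether '-' or an unconverted '12,4', therefore raises ValueError. Here is a minimal sketch; try_format is a hypothetical helper used only for illustration:

```python
def try_format(value):
    """Hypothetical helper: format value with one decimal, or report the error."""
    try:
        return "{:.1f}".format(value)
    except ValueError as exc:
        # str objects do not support the "f" presentation type
        return f"ValueError: {exc}"

print(try_format(13.0))    # a float formats fine
print(try_format("-"))     # a str raises ValueError
print(try_format("12,4"))  # so does any number still stored as a string
```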

The expected result would be:

Column1   Column2
name1     12,4
name2     13,0
name3     -
name4     0,2

I have some workaround ideas (like sorting and then formatting only the numbers above the rows with hyphens), but I first wanted to check whether there is a function I don't know about that would solve my problem.

Answer:

You can replace ',' with '.' so that to_numeric can parse the numbers. Use errors='coerce' to turn '-' (and any other non-numeric values) into NaN, then use map with an if-else to handle the NaNs:

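import pandas as pd

# One decimal with a decimal comma; NaN (e.g. coerced from '-') becomes '-' again
f = lambda x: '{:.1f}'.format(x).replace('.', ',') if pd.notna(x) else '-'
# Swap the decimal comma for a dot so to_numeric can parse it; '-' becomes NaN
dfRW.iloc[:, 1] = pd.to_numeric(dfRW.iloc[:, 1].str.replace(',', '.', regex=False), errors='coerce').map(f)

print(dfRW)
  Column1 Column2
0   name1    12,4
1   name2    13,0
2   name3       -
3   name4     0,2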

If you use errors='ignore' instead and any value cannot be converted, to_numeric returns the input unchanged, so you get the same output as the input.
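If the same formatting is needed for several columns, the approach above can be wrapped in a function. This is a minimal sketch. The helper name format_decimal_comma and its decimals and missing parameters are hypothetical, not part of pandas. It assumes the column holds strings with a decimal comma:

```python
import pandas as pd

def format_decimal_comma(series, decimals=1, missing="-"):
    """Hypothetical helper wrapping the to_numeric + map approach.

    Parses strings that use a decimal comma, formats every number with a fixed
    number of decimals, writes the separator back as a comma and puts `missing`
    wherever parsing failed.
    """
    numbers = pd.to_numeric(series.str.replace(",", ".", regex=False), errors="coerce")
    return numbers.map(
        lambda x: f"{x:.{decimals}f}".replace(".", ",") if pd.notna(x) else missing
    )

df = pd.DataFrame({"Column1": ["name1", "name2", "name3", "name4"],
                   "Column2": ["12,4", "13", "-", "0,2"]})
df["Column2"] = format_decimal_comma(df["Column2"])
print(df)
```

Keeping the NaN check inside the mapped function means the "-" rows never reach the float format spec.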

Answer:

You can try applying a function to the Column2 values. First replace the commas with dots so that float() can parse them, then format each result with two decimals inside a list comprehension:

l = ["0,2", "13", "12,4"]

l = ['{:,.2f}'.format(float(i.replace(",", "."))) for i in l]
print(l)

Output:

['0.20', '13.00', '12.40']
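The list above leaves out '-', and float('-') would raise ValueError. It also keeps a decimal point and two decimals. Note that the ',' in '{:,.2f}' is a thousands separator, not a decimal comma. Below is a sketch of the same idea applied to the whole column in the question's one-decimal, decimal-comma format. It assumes '-' is the only non-numeric marker; fmt is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({"Column1": ["name1", "name2", "name3", "name4"],
                   "Column2": ["12,4", "13", "-", "0,2"]})

def fmt(value):
    """Hypothetical helper: one decimal with a decimal comma; leave '-' as is."""
    if value == "-":
        return value
    # float() needs a dot; convert back to a comma after formatting
    return "{:.1f}".format(float(value.replace(",", "."))).replace(".", ",")

df["Column2"] = [fmt(v) for v in df["Column2"]]
print(df["Column2"].tolist())
```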

Answer:

You can append ,0 to the values that don't contain a comma and are not -:

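# True where the value is the placeholder '-'
m1 = df['Column2'].eq('-')
# True where the value already has a decimal comma (regex=False: match ',' literally)
m2 = df['Column2'].str.contains(',', regex=False)

# Append ',0' to every value that is neither '-' nor already has a decimal part
df['Column2'] = df['Column2'].mask(~m1 & ~m2, df['Column2'] + ',0')
print(df)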

  Column1 Column2
0   name1    12,4
1   name2    13,0
2   name3       -
3   name4     0,2