The column I’d like to filter contains data from two different sources, and I’d like to normalize it. Some rows were collected one way and the rest another way: the first group holds values like 1.2, 2.3, 3.4 (nothing over 5), while the second group holds values above 1,000 formatted with a comma. I would like to multiply the small numbers by 1,000 to match up with the others and remove the comma from the numbers above 1,000.
col1 | col2
---|---
1 | 1.641
2 | 1.548
3 | 1,807.000
4 | 1,759.000
CodePudding user response:
It sounds like you want to filter some rows (`col2 < 5`), apply a transformation (`col2 * 1000`), then remove something (`,`).
df.loc[df['col2'] <= 5, 'col2'] *= 1000
Next would be to remove the comma. If you know all the values in col2 are whole numbers (no decimals), then I think you can just cast:
df['col2'] = df['col2'].astype(int)
But it's safer to apply a replace, which only works if the values are strings (if not, convert first with df['col2'] = df['col2'].astype(str)).
Then you can apply the following:
df['col2'] = df['col2'].str.replace(',', '')
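Note the order matters: if col2 arrives as strings (which the commas suggest), you need to strip the commas and cast before you can compare against 5. A minimal sketch putting the steps in a safe order, assuming the four-row sample from the question with col2 read in as strings:

```python
import pandas as pd

# Hypothetical sample matching the question; col2 arrives as strings.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': ['1.641', '1.548', '1,807.000', '1,759.000']})

# Strip commas first, then cast, so every row becomes a comparable float.
df['col2'] = df['col2'].str.replace(',', '').astype(float)

# Scale only the rows that were collected on the small (< 5) scale.
df.loc[df['col2'] <= 5, 'col2'] *= 1000

print(df)
```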
CodePudding user response:
You can create a boolean mask on whether 'col2' contains a comma or not. If it contains a comma, remove the comma; for numbers with no comma, multiply by 1000:
mask = df['col2'].str.contains(',')
df.loc[mask, 'col2'] = df.loc[mask, 'col2'].str.replace(',','')
df['col2'] = df['col2'].astype(float)
df.loc[~mask, 'col2'] *= 1000
Output:
col1 col2
0 1 1641.0
1 2 1548.0
2 3 1807.0
3 4 1759.0
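The same mask logic can also be written as a single vectorized expression with `np.where`, which avoids mutating the column in two passes. A sketch, again assuming the question's sample with col2 as strings:

```python
import pandas as pd
import numpy as np

# Hypothetical input matching the question; col2 read as strings.
df = pd.DataFrame({'col1': [1, 2, 3, 4],
                   'col2': ['1.641', '1.548', '1,807.000', '1,759.000']})

mask = df['col2'].str.contains(',')                    # rows already on the large scale
vals = df['col2'].str.replace(',', '').astype(float)   # clean numeric values

# Keep comma-rows as-is, scale the rest, in one expression.
df['col2'] = np.where(mask, vals, vals * 1000)

print(df)
```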