Home > OS >  Drop rows from Pandas DataFrame based on value after decimal
Drop rows from Pandas DataFrame based on value after decimal

Time:09-30

I have big DF with values and names. After interpolating values with increment 0.1 I want to remove the rows with numbers such as 1111.123, and keep only rows with 1111.0, 1111.1, 1111.2 (or 1111.100...)

Value Name
1111.1 Mark
1111.2 Mark
1111.234 Mark
1111.3 Mark
1111.346 Mark
1111.4 Mark

I've tried Series.str.split() by string but it is too slow for CSV bigger than 1 GB. I've tried regex with

m = df['Value'].str.filter(regex='(\d*)\.(\d{3})')

But it returns empty Series.

Another option I've tried is math.modf, but it returns

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Here is the code itself:

import pandas as pd
from math import modf

df = pd.read_csv("file.csv")

split = ((df['Value'].apply(lambda x: modf(x)[0])) * 10).apply(lambda x: modf(x)[0])
target_value = 0 < split < 1
df1 = df.loc[split == target_value]
*some code to remove rows*

The final result I expect is:

Value Name
1111.1 Mark
1111.2 Mark
1111.3 Mark
1111.4 Mark

I am new to Python, but still believe there are more efficient ways to deal with it.

CodePudding user response:

If you do not want to convert your floats to string, use this simple math trick to check whether there are decimals after the first one:

df[(df['Value']*10%1).eq(0)]

Alternatively, you can simple check if the second to last character is a .:

df[df['Value'].astype(str).str[-2].eq('.')]

output:

    Value  Name
0  1111.1  Mark
1  1111.2  Mark
3  1111.3  Mark
5  1111.4  Mark

CodePudding user response:

Try this:

 df["value"] = df["value"].apply(lambda x: x - x % 0.1)

CodePudding user response:

First is necessary read values like strings and then test pattern digits, '.' and one last digit:

df = pd.read_csv("file.csv", dtype={'Value':str})
df = df[df['Value'].str.match('\d \.\d{1}$')]

Another idea without converting to strings, but possible float accuracy problems like 1111.100000000094894:

s = df['Value'] * 10
df = df[s.eq(s.astype(int))]
print (df)
    Value  Name
0  1111.1  Mark
1  1111.2  Mark
3  1111.3  Mark
5  1111.4  Mark

CodePudding user response:

Exclude any row which doesnt have 1 as a single digit between the dot and the end or does not have 0 between the dot and and end.

df[(df['Value'].astype(str).str.contains('(?<=\.)[^0]$|(?<=\.)[1]$'))]



   Value  Name
0  1111.1  Mark
1  1111.2  Mark
3  1111.3  Mark
5  1111.4  Mark

Based on your comment lets try

        Value   Name
0  1111.100000   Mark
1  1111.200000   Mark
2  1111.230000   Mark
3  1111.300000   Mark
4  1111.346000   Mark
5  1111.400000   Mark
6  1111.100099  Added

Output

    Value  Name
0  1111.1  Mark
1  1111.2  Mark
3  1111.3  Mark
5  1111.4  Mark
  • Related