Transform false in between trues

Time:12-10

Hello, I have a DataFrame like the following one:

import pandas as pd

df = pd.DataFrame({"a": [True, True, False, True, True], "b": [True, True, False, False, True]})
df

I would like to turn runs of False values that sit between True values into True, but only when the run is no longer than a threshold. The expected results are:

# Threshold = 1
df = pd.DataFrame({"a": [True, True, True, True, True], "b": [True, True, False, False, True]})
df
# Threshold = 2
df = pd.DataFrame({"a": [True, True, True, True, True], "b": [True, True, True, True, True]})
df

Any suggestions for doing this without a for loop?

CodePudding user response:

To replace groups of consecutive False values whose length is at most the threshold, first label each group with DataFrame.cumsum and DataFrame.mask, then count each group's size with Series.map and Series.value_counts, compare the sizes against the threshold with DataFrame.le, and finally pass the resulting boolean mask to DataFrame.mask:

Threshold = 1

m = df.cumsum().mask(df).apply(lambda x: x.map(x.value_counts())).le(Threshold)

df = df.mask(m, True)
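
The one-liner above can be unpacked into its individual steps. This is only a sketch using the sample frame from the question; the function name fill_short_false_runs and the intermediate variable names are made up here for illustration and are not part of the original answer.

```python
import pandas as pd

def fill_short_false_runs(df, threshold):
    # The running count of True values stays constant inside a run of False,
    # so it can serve as a per-column id for that run.
    run_ids = df.cumsum().mask(df)  # NaN wherever df is True
    # Replace each run id with the number of rows sharing it (the run length).
    run_len = run_ids.apply(lambda col: col.map(col.value_counts()))
    # Runs of length <= threshold are flipped to True; NaN compares as False.
    return df.mask(run_len.le(threshold), True)

df = pd.DataFrame({"a": [True, True, False, True, True],
                   "b": [True, True, False, False, True]})

res1 = fill_short_false_runs(df, 1)  # only the single False in "a" is filled
res2 = fill_short_false_runs(df, 2)  # the two-row run in "b" is filled as well
print(res1)
print(res2)
```

Note that this version also fills short runs at the very start or end of a column, since it only looks at run length.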

If groups of False values at the start or end of a column should not be replaced:

df = pd.DataFrame({"a": [False, False, True, False, True, False],
                   "b": [True, True, False, False, True, True]})
print(df)
       a      b
0  False   True
1  False   True
2   True  False
3  False  False
4   True   True
5  False   True

Threshold = 1

df1 = df.cumsum().mask(df)
m1 = df1.apply(lambda x: x.map(x.value_counts())).le(Threshold)
m2 = df1.ne(df1.iloc[0]) & df1.ne(df1.iloc[-1])

df = df.mask(m1 & m2, True)
print(df)
       a      b
0  False   True
1  False   True
2   True  False
3   True  False
4   True   True
5  False   True
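
The edge-preserving variant can be wrapped in a function and checked against the output printed above. This is a sketch under the same assumptions as the answer; the name fill_inner_false_runs is hypothetical. A run that touches the first or last row shares its id with that row, which is how it gets excluded.

```python
import pandas as pd

def fill_inner_false_runs(df, threshold):
    # Per-column run ids for False values (NaN wherever df is True).
    run_ids = df.cumsum().mask(df)
    run_len = run_ids.apply(lambda col: col.map(col.value_counts()))
    short = run_len.le(threshold)
    # Exclude runs whose id equals the id found in the first or last row.
    # If that row is True its id is NaN, and NaN is unequal to everything,
    # so nothing is excluded for that edge.
    inner = run_ids.ne(run_ids.iloc[0]) & run_ids.ne(run_ids.iloc[-1])
    return df.mask(short & inner, True)

df = pd.DataFrame({"a": [False, False, True, False, True, False],
                   "b": [True, True, False, False, True, True]})

out1 = fill_inner_false_runs(df, 1)
out2 = fill_inner_false_runs(df, 2)
print(out1)
print(out2)
```

With a threshold of 2, the two-row run in column b is filled, while the leading and trailing False values in column a stay untouched.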

CodePudding user response:

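from itertools import groupby

def how_many_identical_elements(values):
    # For each element, return the length of the run of equal
    # consecutive values it belongs to.
    counts = []
    for _, run in groupby(values):
        n = len(list(run))
        counts.extend([n] * n)
    return counts

def fill_up_df(df, th):
    # Return a copy in which every run of at most `th` False values is True.
    # Runs at the start or end of a column are filled as well.
    df = df.copy()
    for c in df.columns:
        counts = how_many_identical_elements(df[c].values)
        # A value stays False only if its run of False is longer than th.
        df[c] = [bool(v) or n <= th for v, n in zip(df[c], counts)]
    return df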
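
To make the run-length idea behind this answer concrete, here is a small standalone sketch on a plain list. The helper name run_lengths is hypothetical and simply restates the counting step; the threshold of 1 is an arbitrary choice for the example.

```python
from itertools import groupby

def run_lengths(values):
    # Hypothetical helper: for each element, the length of the run of
    # equal consecutive values that contains it.
    lengths = []
    for _, run in groupby(values):
        n = len(list(run))
        lengths.extend([n] * n)
    return lengths

column = [False, True, False, False, True]
lengths = run_lengths(column)  # [1, 1, 2, 2, 1]

# With a threshold of 1, a False survives only if its run is longer than 1.
filled = [v or n <= 1 for v, n in zip(column, lengths)]
print(filled)  # [True, True, False, False, True]
```

The leading False is filled here too, so this approach does not distinguish runs at the edges from runs between True values.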