Hello, I have a dataframe like the following one:
import pandas as pd
df = pd.DataFrame({"a": [True, True, False, True, True], "b": [True, True, False, False, True]})
df
I would like to turn the False values that sit between True values into True whenever the run of False values is no longer than a threshold, to obtain results like these:
# Threshold = 1
df = pd.DataFrame({"a": [True, True, True, True, True], "b": [True, True, False, False, True]})
df
# Threshold = 2
df = pd.DataFrame({"a": [True, True, True, True, True], "b": [True, True, True, True, True]})
df
Any suggestions for doing this without a for loop?
CodePudding user response:
If possible, the solution can be simplified to replace groups of False values whose length is less than or equal to Threshold: first label the separate groups with DataFrame.cumsum and DataFrame.mask, then count the size of each group with Series.map and Series.value_counts, and finally compare with DataFrame.le and pass the result to DataFrame.mask:
Threshold = 1
m = df.cumsum().mask(df).apply(lambda x: x.map(x.value_counts())).le(Threshold)
df = df.mask(m, True)
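To make the intermediate steps easier to follow, here is a minimal sketch of the same idea wrapped in a function and run on the data from the question. The function name fill_short_gaps and the variable names labels and sizes are only illustrative and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"a": [True, True, False, True, True],
                   "b": [True, True, False, False, True]})

def fill_short_gaps(frame, threshold):
    # Hypothetical helper name, used here only for illustration.
    # The cumulative sum of True values stays constant across a run of False
    # values, so it acts as a label for each run; True positions become NaN.
    labels = frame.cumsum().mask(frame)
    # Map every label to the number of times it occurs, i.e. the run length.
    sizes = labels.apply(lambda col: col.map(col.value_counts()))
    # Replace False values that belong to runs of at most `threshold` elements.
    return frame.mask(sizes.le(threshold), True)

out1 = fill_short_gaps(df, 1)
out2 = fill_short_gaps(df, 2)
print(out1)
print(out2)
```

With a threshold of 1 only the single False in column a is replaced, and with a threshold of 2 the two consecutive False values in column b are replaced as well, which matches the results requested in the question.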
If the groups of False values at the start or end should not be replaced:
df = pd.DataFrame({"a": [False, False, True, False, True, False],
                   "b": [True, True, False, False, True, True]})
print(df)
       a      b
0  False   True
1  False   True
2   True  False
3  False  False
4   True   True
5  False   True
Threshold = 1
df1 = df.cumsum().mask(df)
m1 = df1.apply(lambda x: x.map(x.value_counts())).le(Threshold)
m2 = df1.ne(df1.iloc[0]) & df1.ne(df1.iloc[-1])
df = df.mask(m1 & m2, True)
print(df)
       a      b
0  False   True
1  False   True
2   True  False
3   True  False
4   True   True
5  False   True
CodePudding user response:
from itertools import groupby

def how_many_identical_elements(itter):
    # For every element, the length of the run of identical values it belongs to
    return sum([[x]*x for x in [len(list(v)) for g,v in groupby(itter)]], [])

def fill_up_df(df, th):
    df = df.copy()
    for c in df.columns:
        df[f'{c}_count'] = how_many_identical_elements(df[c].values)
        # Keep False only when it belongs to a run longer than th
        df[c] = [False if x[0]==False and x[1]>th else True for x in zip(df[c], df[f'{c}_count'])]
    return df[[c for c in df.columns if 'count' not in c]]
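The run-length step can be hard to read in its one-line form, so here is a small sketch of it on a plain list. The name run_lengths and the sample values are assumptions made for illustration, and the loop with extend is just another way to write the sum(..., []) expression used above.

```python
from itertools import groupby

def run_lengths(values):
    # Hypothetical helper: for each element, the length of the run of
    # equal consecutive values that it belongs to.
    out = []
    for _, run in groupby(values):
        n = len(list(run))
        out.extend([n] * n)
    return out

col = [True, True, False, False, True, False]
lengths = run_lengths(col)
th = 1
# A value ends up True if it already was True, or if its False run is short enough.
filled = [v or n <= th for v, n in zip(col, lengths)]
print(lengths)
print(filled)
```

Note that this approach, like fill_up_df, also fills short runs of False at the start or end of a column, since it only looks at the run length.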