Remove duplicate values in the next n rows, but keep the first

I would really appreciate help with the question below; I don't really know where to start.

I have a dataframe

pd.DataFrame({'value':[1,1,2,2,1,1,1,1,1,2,1,1]})

I want to write a function that iterates through the values and removes any duplicates in the next n rows.

For example, if n=5, starting from the first number "1", any "1" in the next 5 rows is deleted (marked by "x"). In the next iteration, the second "1" is skipped because it was already deleted in the first iteration.

The resulting dataframe would be

pd.DataFrame({'value':[1,'x',2,'x','x','x',1,'x','x',2,'x','x']})

I eventually want to drop the "x" rows, but for the purpose of illustration I've marked them instead.

CodePudding user response：

Do you actually want to see the 'x' values, or are they just there to show us which rows are to be deleted?

If the latter, you could do something like this:

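import pandas as pd

x1 = pd.DataFrame({'value':[1,1,2,2,1,1,1,1,1,2,1,1]})
x1['t'] = x1.index // 5  # label fixed blocks of 5 consecutive rows
x1.drop_duplicates(subset=['value', 't']).drop(columns='t')  # keep the first of each value per block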

    value
0       1
2       2
5       1
9       2
10      1
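
Note that this splits the rows into fixed blocks of 5, so it keeps rows 0, 2, 5, 9 and 10, whereas the expected result in the question keeps rows 0, 2, 6 and 9. If the window of n rows should instead start at each kept row, a plain loop may be closer to what was asked. The following is only a minimal sketch: the function name `drop_dups_in_next_n` and its `col` parameter are made up for illustration and are not part of pandas, and it assumes a row is dropped whenever the same value was kept within the previous n rows.

```python
import pandas as pd


def drop_dups_in_next_n(df, n, col="value"):
    """Keep a row unless the same value was kept within the previous n rows.

    Hypothetical helper for illustration; not a pandas API.
    """
    last_kept = {}  # value -> position of the most recent kept row with that value
    keep = []
    for pos, val in enumerate(df[col].tolist()):
        prev = last_kept.get(val)
        if prev is not None and pos - prev <= n:
            # Falls inside the n-row window of an earlier kept row: drop it.
            keep.append(False)
        else:
            # First occurrence, or outside the window: keep it and restart the window.
            keep.append(True)
            last_kept[val] = pos
    return df.loc[keep]


df = pd.DataFrame({"value": [1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1]})
result = drop_dups_in_next_n(df, n=5)
print(result.index.tolist())  # [0, 2, 6, 9]
```

Only the most recent kept row per value needs to be tracked, because it is always the closest one to the current row.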

CodePudding user response：

pd.DataFrame({'value':[1,'x',2,'x','x','x',1,'x','x',2,'x','x']}).drop_duplicates()

Here is a link for further information about that function's parameters.
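
As far as I can tell, calling drop_duplicates() on the frame that already contains the 'x' markers keeps the first 1, the first 'x' and the first 2, which is not the result shown in the question. If the rows have already been marked with 'x', a boolean filter is probably the simpler way to drop them. A minimal sketch, where the variable names `marked` and `cleaned` are just for illustration:

```python
import pandas as pd

marked = pd.DataFrame({"value": [1, "x", 2, "x", "x", "x", 1, "x", "x", 2, "x", "x"]})

# drop_duplicates() keeps the first occurrence of every distinct value,
# including the 'x' marker itself.
deduped = marked.drop_duplicates()
print(deduped["value"].tolist())  # [1, 'x', 2]

# Filtering out the marker keeps every unmarked row instead.
cleaned = marked[marked["value"] != "x"]
print(cleaned.index.tolist())  # [0, 2, 6, 9]
```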