Remove duplicate values in the next n rows, keeping the first

Time:11-17

I would really appreciate help with the question below; I don't really know where to start.

I have a dataframe

import pandas as pd
df = pd.DataFrame({'value':[1,1,2,2,1,1,1,1,1,2,1,1]})

I want to write a function that iterates through the values and removes any duplicates in the next n rows.

For example, with n=5, starting from the first "1", any "1" in the next 5 rows is deleted (marked by "x"). In the next iteration, the second "1" is skipped because it was already deleted in the first iteration.

The resulting dataframe would be

pd.DataFrame({'value':[1,'x',2,'x','x','x',1,'x','x',2,'x','x']})

Eventually I want to drop the "x" rows; I have only marked them here for illustration.

CodePudding user response:

Do you actually want to see the 'x' values, or are they just there to show us which rows are to be deleted?

If the latter you could do something like this:

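import pandas as pd
x1 = pd.DataFrame({'value':[1,1,2,2,1,1,1,1,1,2,1,1]})
# Assign each row to a fixed block of 5 rows (0-4, 5-9, 10-14, ...)
x1['t'] = x1.index // 5
# Keep the first occurrence of each value within each block, then drop the helper column
x1.drop_duplicates(subset = ['value', 't']).drop(columns = 't')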

    value
0       1
2       2
5       1
9       2
10      1
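
Note that index // 5 groups rows into fixed blocks, whereas the question describes a window that starts at each kept row, which is why the output above keeps index 5 rather than index 6. If the rolling behaviour is what is needed, a minimal sketch might look like the following. The function name drop_repeats_in_window and its col parameter are made up for illustration, and it assumes positions are counted by row order rather than by index label:

```python
import pandas as pd

def drop_repeats_in_window(df, n, col="value"):
    """Keep a row, then drop later rows with the same value within the next n rows."""
    last_kept = {}  # value -> position of the most recent kept row with that value
    keep = []
    for pos, v in enumerate(df[col].tolist()):
        # A row survives if its value has not been kept yet, or if the last
        # kept row with that value is more than n rows back.
        is_kept = v not in last_kept or pos - last_kept[v] > n
        if is_kept:
            last_kept[v] = pos
        keep.append(is_kept)
    return df[keep]

df = pd.DataFrame({"value": [1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1]})
result = drop_repeats_in_window(df, n=5)
print(result)  # keeps the rows at index 0, 2, 6 and 9
```

Deleted rows never update last_kept, so they do not start a window of their own, which matches the "second 1 wouldn't be used" rule in the question.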

CodePudding user response:

pd.DataFrame({'value':[1,'x',2,'x','x','x',1,'x','x',2,'x','x']}).drop_duplicates()

Here is a link for further information about that function's parameters.
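
For reference, here is a small sketch of the subset and keep parameters of drop_duplicates, run on the original data from the question. The variable names are only illustrative. It also shows that drop_duplicates on its own has no notion of a window: it removes repeats across the whole frame, so it does not reproduce the result the question asks for.

```python
import pandas as pd

df = pd.DataFrame({"value": [1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1]})

# keep="first" (the default): keep the first occurrence of each value
first = df.drop_duplicates(subset=["value"], keep="first")

# keep="last": keep the last occurrence of each value instead
last = df.drop_duplicates(subset=["value"], keep="last")

# keep=False: drop every value that occurs more than once
none_kept = df.drop_duplicates(subset=["value"], keep=False)

print(first.index.tolist())      # [0, 2]
print(last.index.tolist())       # [9, 11]
print(none_kept.index.tolist())  # []
```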
