I would really appreciate help with the question below; I don't really know where to start.
I have a dataframe
pd.DataFrame({'value':[1,1,2,2,1,1,1,1,1,2,1,1]})
I want to write a function that iterates through the values and removes any duplicates in the next n rows.
For example, with n=5, starting from the first "1": any "1" in the next 5 rows is deleted (marked by "x"). In later iterations, the second "1" is not used as a starting point, since it was already deleted in the first iteration.
The resulting dataframe would be
pd.DataFrame({'value':[1,'x',2,'x','x','x',1,'x','x',2,'x','x']})
I eventually want to drop the "x" rows, but for the purpose of illustration I've marked them instead.
CodePudding user response:
Do you actually want to see the 'x', or are they just there to show us which rows are to be deleted?
If the latter, you could do something like this:
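import pandas as pd
x1 = pd.DataFrame({'value':[1,1,2,2,1,1,1,1,1,2,1,1]})
# label fixed blocks of 5 rows: 0-4, 5-9, 10-11
x1['t'] = x1.index // 5
# keep the first occurrence of each value within each block
x1.drop_duplicates(subset=['value', 't']).drop(columns='t')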
    value
0       1
2       2
5       1
9       2
10      1
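Note that x1.index // 5 splits the frame into fixed blocks of five rows, so the result above (rows 0, 2, 5, 9, 10) differs from the frame in the question (rows 0, 2, 6, 9). If the window has to slide with each kept row, a plain loop is one way to do it. This is only a sketch: the function name drop_repeats_within and its col argument are made up for illustration, and it assumes "the next n rows" is counted by position rather than by index label.

```python
import pandas as pd

def drop_repeats_within(df, n, col="value"):
    """Drop a row if the same value was kept within the previous n rows.

    Hypothetical helper, not a pandas function.
    """
    last_kept = {}  # value -> position of the most recent kept row
    keep = []       # one boolean per row
    for pos, val in enumerate(df[col].tolist()):
        prev = last_kept.get(val)
        if prev is not None and pos - prev <= n:
            # still inside the window opened by the last kept row
            keep.append(False)
        else:
            # no kept duplicate within n rows: keep it and open a new window
            keep.append(True)
            last_kept[val] = pos
    return df.loc[keep]

df = pd.DataFrame({'value': [1, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 1]})
result = drop_repeats_within(df, n=5)
print(result.index.tolist())  # [0, 2, 6, 9]
```

Only rows that were kept open a window, which is what makes the second "1" in the question irrelevant once the first one has removed it.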
CodePudding user response:
pd.DataFrame({'value':[1,'x',2,'x','x','x',1,'x','x',2,'x','x']}).drop_duplicates()
Here is a link for further information about that function's parameters.
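As a quick check of what this does on the marked frame: drop_duplicates() with its defaults (all columns, keep='first') keeps the first occurrence of every distinct value, and that includes the first 'x'. If the goal is just to remove the marked rows, a boolean filter may be closer to what the question asks for. A small sketch, where the variable names are only illustrative:

```python
import pandas as pd

marked = pd.DataFrame({'value': [1, 'x', 2, 'x', 'x', 'x', 1, 'x', 'x', 2, 'x', 'x']})

# defaults: compare all columns, keep the first occurrence of each value
deduped = marked.drop_duplicates()
print(deduped['value'].tolist())   # [1, 'x', 2]

# drop only the rows marked with 'x'
filtered = marked[marked['value'] != 'x']
print(filtered['value'].tolist())  # [1, 2, 1, 2]
```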