I want to modify drop_duplicates in the following way. For example, I've got a DataFrame with these rows:
| A header | Another header |
| -------- | -------------- |
| First | el1 |
| Second | el2 |
| Second | el8 |
| First | el3 |
| Second | el4 |
| Second | el5 |
| First | el6 |
| Second | el9 |
I need to drop not all duplicates, but only consecutive ones. So as a result I want:
| A header | Another header |
| -------- | -------------- |
| First | el1 |
| Second | el2 |
| First | el3 |
| Second | el4 |
| First | el6 |
| Second | el9 |
I tried to do it with a for loop, but maybe there are better ways.
CodePudding user response:
You can do it simply by using shift(), as follows:
import pandas as pd
df = pd.DataFrame({
'A header': ['First', 'Second', 'Second', 'First', 'Second', 'Second', 'First', 'Second'],
'Another header': ['el1', 'el2', 'el8', 'el3', 'el4', 'el5', 'el6', 'el9'],
})
print(df)
"""
A header Another header
0 First el1
1 Second el2
2 Second el8
3 First el3
4 Second el4
5 Second el5
6 First el6
7 Second el9
"""
df2 = df[df['A header'] != df['A header'].shift(1)]
print(df2)
"""
A header Another header
0 First el1
1 Second el2
3 First el3
4 Second el4
6 First el6
7 Second el9
"""
Using shift(1), you can compare each row with the row before it and keep only the rows where the value changes.
For more information, see https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.shift.html
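The comparison above can be wrapped in a small helper so it is easy to reuse on another column. This is only a sketch: the function name drop_consecutive_duplicates and its column parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def drop_consecutive_duplicates(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Keep the first row of every run of equal values in `column` (hypothetical helper)."""
    # shift(1) moves the column down by one row, so each value is compared
    # with the value in the row before it. The first row is compared with
    # NaN, so it is always kept.
    keep = df[column] != df[column].shift(1)
    return df[keep]


df = pd.DataFrame({
    'A header': ['First', 'Second', 'Second', 'First', 'Second', 'Second', 'First', 'Second'],
    'Another header': ['el1', 'el2', 'el8', 'el3', 'el4', 'el5', 'el6', 'el9'],
})

result = drop_consecutive_duplicates(df, 'A header')
print(result['Another header'].tolist())
# ['el1', 'el2', 'el3', 'el4', 'el6', 'el9']
```

The original index labels (0, 1, 3, 4, 6, 7) are kept. If you want a fresh 0..n-1 index, you could call reset_index(drop=True) on the result.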
CodePudding user response:
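Extract the duplicates:
l = []
# Assumes df1 has the default 0..n-1 index.
for i in range(len(df1) - 1):
    # Row i + 1 repeats the value of row i, so mark it for removal.
    if df1['A header'][i] == df1['A header'][i + 1]:
        l.append(i + 1)
Drop the duplicates:
df1.drop(l, inplace=True)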
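Note that the loop looks rows up by label, so it relies on df1 having the default 0..n-1 index. If the index might be something else, a possible variant is to work with positions and translate them into labels before dropping. The sample data and the index values below are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame(
    {
        'A header': ['First', 'Second', 'Second', 'First'],
        'Another header': ['el1', 'el2', 'el8', 'el3'],
    },
    index=[10, 20, 30, 40],  # deliberately not 0..n-1
)

values = df1['A header'].tolist()
# Collect the positions of rows that repeat the value of the row before them.
positions = [i for i in range(1, len(values)) if values[i] == values[i - 1]]
# Translate positions into index labels, because drop() works on labels.
df1 = df1.drop(df1.index[positions])
print(df1['Another header'].tolist())
# ['el1', 'el2', 'el3']
```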