Python: How can I delete the rest duplicate rows while keeping the first and last row based on Colum-CodePudding

How can I delete the rest duplicate rows while keeping the first and last row based on Column A?

 df = pd.DataFrame({
        'Column A': [12,12,12, 15, 16, 141, 141, 141, 141],
         'Column B':['Apple' ,'Apple' ,'Apple' , 'Red', 'Blue', 'Yellow', 'Yellow', 'Yellow', 'Yellow'],
        'Column C':[100, 50, np.nan , 23 , np.nan , 199 , np.nan , 1,np.nan]
    })

or data table as follows:


    | Column A | Column B |Column C 
----| -------- | ---------|--------
0   | 12       | Apple    |100     
1   | 12       | Apple    |50      
2   | 12       | Apple    |NaN      
3   | 15       | Red      |23       
4   | 16       | Blue     |NaN      
5   | 141      | Yellow   |199      
6   | 141      | Yellow   |NaN      
7   | 141      | Yellow   |1        
8   | 141      | Yellow   |NaN

The result would be:


    | Column A | Column B |Column C 
----| -------- | ---------|--------
0   | 12       | Apple    |100         
2   | 12       | Apple    |NaN      
3   | 15       | Red      |23       
4   | 16       | Blue     |NaN      
5   | 141      | Yellow   |199           
8   | 141      | Yellow   |NaN

CodePudding user response：

df.drop_duplicates(subset=['A'], keep='first')

and you do the same thing for last

df.drop_duplicates(subset=['A'], keep='last')

CodePudding user response：

This is a possible way to achieve what you want:

result = (
    pd.concat([
        df.drop_duplicates('Column A', keep='first'),
        df.drop_duplicates('Column A', keep='last'),
    ]).reset_index()
      .drop_duplicates('index')
      .sort_values('index')
      .set_index('index')
      .rename_axis(None)
)

Result:

   Column A Column B  Column C
0        12    Apple     100.0
2        12    Apple       NaN
3        15      Red      23.0
4        16     Blue       NaN
5       141   Yellow     199.0
8       141   Yellow       NaN