How to apply a command to multiple column elements?-CodePudding

I have the table below and would like to apply onde command to compare and eliminate duplicate values in row n and n 1 in multiple dataframes (df1, df2).

Comand sugestion: .diff().ne(0)

How to apply this command only to the elements of columns A ,C and D, using the commands def ,lambda or apply?

df1:

A	B
22	33
22	4
3	55
1	55

df2:

C	D
5	2.3
45	33
7	33
7	11

The expected output is:
df1:

A	B
22	33
NaN	4
3	55
1	55

df2:

C	D
5	2.3
45	33
7	NaN
NaN	11

The other desired option would be to delete the duplicated lines, keeping the first number.
df1:

A	B
22	33
row deleted	row deleted
3	55
row deleted	row deleted

df2:

C	D
5	2.3
45	33
row deleted	row deleted
row deleted	row deleted

CodePudding user response：

Based on this answer, you can create a mask for a single column in your dataframe (here for example for column A) with

mask1 = df['A'].shift() == df['A']

Since this shows True if there was a duplicate, you need to slice the DataFrame with the negation of the mask

df = df[~mask1]

To do this for multiple columns, make a mask for each column and use NumPy's logical_or to combine the masks. Then slice df with the final mask.

CodePudding user response：

With your suggested command: .diff().ne(0) (or .diff.eq(0))

Option 1: set NaN to duplicate values

# For 1 column
df1.loc[df1['A'].diff().eq(0), 'A'] = np.NaN
print(df1)

      A   B
0  22.0  33
1   NaN   4
2   3.0  55
3   1.0  55

# For multiple columns
df2 = df2.apply(lambda x: x[x.diff().ne(0)])
print(df2)

      C     D
0   5.0   2.3
1  45.0  33.0
2   7.0   NaN
3   NaN  11.0

Option 2: delete rows

>>> df1[df1.diff().ne(0).all(axis=1)]
    A   B
0  22  33
2   3  55

>>> df2[df2.diff().ne(0).all(axis=1)]
    C     D
0   5   2.3
1  45  33.0