Home > Software engineering >  How to apply a command to multiple column elements?
How to apply a command to multiple column elements?

Time:12-04

I have the table below and would like to apply onde command to compare and eliminate duplicate values ​​in row n and n 1 in multiple dataframes (df1, df2).

Comand sugestion: .diff().ne(0)

How to apply this command only to the elements of columns A ,C and D, using the commands def ,lambda or apply?

df1:

A B
22 33
22 4
3 55
1 55

df2:

C D
5 2.3
45 33
7 33
7 11

The expected output is:
df1:

A B
22 33
NaN 4
3 55
1 55

df2:

C D
5 2.3
45 33
7 NaN
NaN 11

The other desired option would be to delete the duplicated lines, keeping the first number.
df1:

A B
22 33
row deleted row deleted
3 55
row deleted row deleted

df2:

C D
5 2.3
45 33
row deleted row deleted
row deleted row deleted

CodePudding user response:

Based on this answer, you can create a mask for a single column in your dataframe (here for example for column A) with

mask1 = df['A'].shift() == df['A']

Since this shows True if there was a duplicate, you need to slice the DataFrame with the negation of the mask

df = df[~mask1]

To do this for multiple columns, make a mask for each column and use NumPy's logical_or to combine the masks. Then slice df with the final mask.

CodePudding user response:

With your suggested command: .diff().ne(0) (or .diff.eq(0))

Option 1: set NaN to duplicate values

# For 1 column
df1.loc[df1['A'].diff().eq(0), 'A'] = np.NaN
print(df1)

      A   B
0  22.0  33
1   NaN   4
2   3.0  55
3   1.0  55

# For multiple columns
df2 = df2.apply(lambda x: x[x.diff().ne(0)])
print(df2)

      C     D
0   5.0   2.3
1  45.0  33.0
2   7.0   NaN
3   NaN  11.0

Option 2: delete rows

>>> df1[df1.diff().ne(0).all(axis=1)]
    A   B
0  22  33
2   3  55

>>> df2[df2.diff().ne(0).all(axis=1)]
    C     D
0   5   2.3
1  45  33.0
  • Related