How to remove rows from a dataframe based on keywords found in a particular column in PySpark

Time:08-03

Suppose I have a dataframe that lists a device for each userId:

df=

userId userDisplayName devicename
A12345 Ronaldo L-15672727
B23456 Ibrahimovic DR27365_Android_1/1/2019_5:31 PM
C34567 Messi Messi’s Iphone
D45678 Benncer realmeRMX2001
E56789 Leao XiaomiRedmi Note 8 Pro
F67890 Theo A-android
G67890 Calabria Davide's iphone
H67890 Tonali REALME_TON
I67890 Giroud 12348475androidgiroud

Now I want to remove all the mobile devices from the dataframe. That means I want to remove every row whose 'devicename' contains "Android", "iPhone", "Realme", "Xiaomi", or "Redmi", regardless of case.

Finally my output should be:

userId userDisplayName devicename
A12345 Ronaldo L-15672727

I have tried the following code:

df_output=df.where(~f.lower(col("devicename")).like("%android%") | ~f.lower(col("devicename")).like("%iphone%") | ~f.lower(col("devicename")).like("
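One likely problem with the attempt above is that the negated conditions are joined with `|`. A row is then kept as soon as it lacks any one keyword, which is true for almost every row. To drop a row when it contains any keyword, the negations need to be combined with `&`, or a single negation applied to one combined match. Below is a minimal sketch of the second form using `Column.rlike` with a case-insensitive regex. The names `KEYWORDS`, `PATTERN`, and `drop_mobile_devices` are made up for illustration, and the PySpark import sits inside the function only so that the pattern can be checked in plain Python without a Spark session.

```python
import re

# Keywords from the question; matching is case-insensitive via the (?i) flag,
# which is understood by both Python's re module and the Java regex engine
# that Spark uses for rlike.
KEYWORDS = ["android", "iphone", "realme", "xiaomi", "redmi"]
PATTERN = "(?i)" + "|".join(re.escape(k) for k in KEYWORDS)


def drop_mobile_devices(df, column="devicename"):
    """Return df without rows whose `column` contains any keyword.

    Hypothetical helper: assumes `df` is a pyspark.sql.DataFrame.
    """
    from pyspark.sql import functions as F

    # One negation around the combined match: NOT (kw1 OR kw2 OR ...).
    return df.where(~F.col(column).rlike(PATTERN))


# Plain-Python check of the pattern against the device names in the question.
sample_devices = [
    "L-15672727",
    "DR27365_Android_1/1/2019_5:31 PM",
    "Messi’s Iphone",
    "realmeRMX2001",
    "XiaomiRedmi Note 8 Pro",
    "A-android",
    "Davide's iphone",
    "REALME_TON",
    "12348475androidgiroud",
]
kept = [d for d in sample_devices if not re.search(PATTERN, d)]
print(kept)
```

With a real dataframe the call would be `df_output = drop_mobile_devices(df)`. Note that rows where 'devicename' is null would also be dropped, because the filter condition evaluates to null for them; if those rows should be kept, an explicit `isNull()` check would have to be OR-ed in.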
