Home > Net >  Search across all columns using a wildcard
Search across all columns using a wildcard

Time:01-05

I have data with a variable number of columns. I want to be able to exclude all rows that contain any number ending "9999".

Sample input:

    Customer Reference,Profile Name,Score,Band Text,Result1,Result2,Result3,Result4,Result5,Result6,Result7,Result8,Result9,Result10,Result11,Result12,Result13,Result14,Result15,Result16,Result17,Result18,Result19,Result20,Result21,Result22,Result23,Result24,Result25,Result26,Result27,Result28,Result29,Result30,Result31,Result32,Result33,Result34,Result35,Result36,Result37,Result38,Result39,Result40,Result41,Result42,Result43,Result44,Result45,Result46,Result47,Result48,Result49
038ff126-1ed5-4a96-bb34-3f4b595228d3,UK,1200,PASS - 2 2,155261,155101,155151,155161,155271,155251,1551001,1551101,1557031,1559500,1558101,395102,3953500,3952260,3952100,3952101,3952111,3953800,3953760,3953512,3953522,3956611,3959600,3959601,3963505,3963513,3963531,3963507,1062,2452,1332,,,,,,,,,,,,,,,,,,
87529660,Germany,1111,APPROVED,2289528,401126,401102,401423,401424,401426,4012100,4012101,4012111,4012260,4013500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
37a52968-8093-41e5-8a2e-6bd251d0666d,UK,2200,PASS - 2 2,155261,155101,155151,155161,1551001,1551101,1551111,1551121,1551071,1551072,1553520,1556518,1557031,395102,3953500,3952260,3952100,3952101,3952111,3953512,3953521,3956760,3956600,3956601,3956611,3963504,3963508,3963512,3963526,3963529,3963500,3964510,1062,2452,1332,,,,,,,,,,,,,,
138629690,Germany,1111,APPROVED,2283524,2283525,2282111,2282100,2282101,2282263,2282264,2282266,2282260,4012,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1d45f78b-01c5-4007-8f8c-a9fb845cba1f,UK,1300,PASS - 2 2,155261,155101,155151,155161,155131,1551001,1551011,1551021,1553508,1551101,1551111,1551121,1551071,1551072,1553522,3952,3962,1062,2452,1332,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
a56b590b-b8bd-4e56-987e-f801a37e487d,UK,1300,PASS - 2 2,155261,155101,155151,155161,155131,1551001,1551011,1551021,1553508,1559999,1551111,1551121,1551071,1551072,1553522,1556514,3952,3962,1062,2452,1332,,,,,,,,,,,,,,,,,,,,,,,,,,,,
138888892,Germany,2221,APPROVED,2283525,2282111,2282100,2282101,2284530,2284266,2288263,2288264,2288260,401126,401102,401423,401424,401426,4012100,4012101,4012111,4012260,4013500,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Please note that different files have different numbers of "Results" columns.

The only method I've been able to find that works is like this:

filter1 = data.Result1.astype(str).str.contains('9999')
filter2 = data.Result2.astype(str).str.contains('9999')
filter3 = data.Result3.astype(str).str.contains('9999')
filter4 = data.Result4.astype(str).str.contains('9999')

Then applying the filters like this

data.where(filter1 | filter2 | filter3 | filter4 ... etc

Obviously this is painful and I need to change the number of filters per file.

I'm sure there's probably a way to do this but I've not managed to find one. Can anyone help?

CodePudding user response:

I think you can use a combination of replace and dropna.
Partial answer from here:

  1. First you replace all values ending by 9999 by np.nan's
  2. You drop the NaN's

It should look something like that:

df.replace('.9999',np.nan,regex=True).dropna(axis = 0, how = 'any')

CodePudding user response:

The following approach considers only Result-labeled columns and filters out the rows where those columns values end with 9999:

df[~df.filter(like='Result').astype(str).apply(lambda x: x.str[-4:] == '9999').any(1)]

                     Customer Reference Profile Name  Score   Band Text  ...  Result46  Result47  Result48  Result49
0  038ff126-1ed5-4a96-bb34-3f4b595228d3           UK   1200  PASS - 2 2  ...       NaN       NaN       NaN       NaN
1                              87529660      Germany   1111    APPROVED  ...       NaN       NaN       NaN       NaN
2  37a52968-8093-41e5-8a2e-6bd251d0666d           UK   2200  PASS - 2 2  ...       NaN       NaN       NaN       NaN
3                             138629690      Germany   1111    APPROVED  ...       NaN       NaN       NaN       NaN
4  1d45f78b-01c5-4007-8f8c-a9fb845cba1f           UK   1300  PASS - 2 2  ...       NaN       NaN       NaN       NaN
6                             138888892      Germany   2221    APPROVED  ...       NaN       NaN       NaN       NaN
  • Related