Check if character within string; pass if True, do stuff if False

I am writing code to process a list of URLs. Some of the URLs have issues, and I need to skip them in my for loop. I've tried this:

x_data = []
y_data = []
for item in drop['URL']:
    if re.search("J", str(item)) == True:
        pass
    else:
        print(item)
        var = urllib.request.urlopen(item)
        hdul = ft.open(var)
        data = hdul[0].data
        start = hdul[0].header['WMIN']
        finish = hdul[0].header['WMAX']
        start_log = np.log10(start)
        finish_log = np.log10(finish)
        redshift = hdul[0].header['Z']
        length = len(data[0])

        xaxis = np.linspace(start, finish, length)
        #calculating emitted wavelength from observed and redshift
        x_axis_nr = [xaxis[j]/(redshift + 1) for j in range(len(xaxis))]
        gauss_kernel = Gaussian1DKernel(5/3)
        flux = np.convolve(data[0], gauss_kernel)
        wavelength = np.convolve(x_axis_nr, gauss_kernel)
        x_data.append(x_axis_nr)
        y_data.append(data[0])

where drop is a previously defined pandas DataFrame. Previous questions on this topic suggested regex might be the way to go, so I tried it to filter out any URL containing the letter J (only the bad URLs contain it).

I get this:

http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0581.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0582.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0584.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0587.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0589.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0592.fit
http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3 001155a.fit

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-2a3083a3a6d7> in <module>
     14         finish_log = np.log10(finish)
     15         redshift = hdul[0].header['Z']
---> 16         length = len(data[0])
     17
     18         xaxis = np.linspace(start, finish, length)

TypeError: object of type 'numpy.float32' has no len()

which is the same kind of error I was getting before I tried to filter out the J URLs, so clearly my regex check is not working. I would appreciate some advice on how to filter these, and am happy to provide more information as required.

CodePudding user response：

There's no need to compare the result of re.search with True. From the documentation you can see that search returns a match object when a match is found:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

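So comparing a match object with True evaluates to False, and your else branch is executed even for URLs that contain J. Test the result directly instead (if re.search("J", str(item)):), since a match object is truthy and None is falsy.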

In [35]: re.search('J', 'http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3 001155a.fit') == True
Out[35]: False
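
A minimal sketch of the corrected check, assuming the goal is only to skip any URL containing the letter J. The helper name should_skip and the two-item urls list are hypothetical stand-ins for the loop over drop['URL'] in the question; the download and FITS processing are left out.

```python
import re


def should_skip(url):
    """Return True when the URL contains the letter J (the bad URLs in the question)."""
    # re.search returns a match object or None, so compare against None
    # (or rely on truthiness) instead of comparing with True.
    return re.search("J", str(url)) is not None


# Hypothetical sample standing in for drop['URL'] from the question.
urls = [
    "http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0581.fit",
    "http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3 001155a.fit",
]

kept = []
for item in urls:
    if should_skip(item):
        continue  # skip the bad URL and move on to the next one
    kept.append(item)  # the urlopen / FITS processing would go here

print(kept)
```

For a fixed single character like this, the plain membership test "J" in str(item) should behave the same way without needing re at all.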