Check if character within string; pass if True, do stuff if False

Time:03-28

I am writing code to process a list of URLs. Some of the URLs have issues, and I need to skip them in my for loop. I've tried this:

x_data = []
y_data = []
for item in drop['URL']:
    if re.search("J", str(item)) == True:
        pass
    else:
        print(item)
        var = urllib.request.urlopen(item)
        hdul = ft.open(var)
        data = hdul[0].data
        start = hdul[0].header['WMIN']
        finish = hdul[0].header['WMAX']
        start_log = np.log10(start)
        finish_log = np.log10(finish)
        redshift = hdul[0].header['Z']
        length = len(data[0])

        xaxis = np.linspace(start, finish, length)
        #calculating emitted wavelength from observed and redshift
        x_axis_nr = [xaxis[j]/(redshift + 1) for j in range(len(xaxis))]
        gauss_kernel = Gaussian1DKernel(5/3)
        flux = np.convolve(data[0], gauss_kernel)
        wavelength = np.convolve(x_axis_nr, gauss_kernel)
        x_data.append(x_axis_nr)
        y_data.append(data[0])

where drop is a previously defined pandas DataFrame. Previous questions on this topic suggested that a regex might be the way to go, so I tried one to filter out any URL containing the letter J (only the bad URLs contain it).

I get this:

http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0581.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0582.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0584.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0587.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0589.fit
http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0592.fit
http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3 001155a.fit

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-8-2a3083a3a6d7> in <module>
     14         finish_log = np.log10(finish)
     15         redshift = hdul[0].header['Z']
---> 16         length = len(data[0])
     17
     18         xaxis = np.linspace(start, finish, length)

TypeError: object of type 'numpy.float32' has no len()

This is the same kind of error I was getting before I tried to exclude the URLs containing J, so clearly my regex is not working. I would appreciate some advice on how to filter these out, and I am happy to provide more information as required.

CodePudding user response:

There's no need to compare the result of re.search with True. From the documentation you can see that search returns a match object when a match is found:

Scan through string looking for the first location where the regular expression pattern produces a match, and return a corresponding match object. Return None if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string.

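A match object is never equal to True, so the comparison evaluates to False even when a match is found, and your else branch is executed for every URL. Use the result directly as the condition instead (if re.search("J", str(item)):), since a match object is truthy and None is falsy.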

In [35]: re.search('J', 'http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3 001155a.fit') == True
Out[35]: False
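
For reference, here is a minimal sketch of how the check could be rewritten. The urls list is a hypothetical stand-in for drop['URL'], and contains_j is a helper name I made up for illustration; the download and processing steps from your loop are left out and would go where the comment indicates.

```python
import re

# Hypothetical sample standing in for drop['URL'].
urls = [
    "http://www.gama-survey.org/dr3/data/spectra/sdss/spec-0915-52443-0581.fit",
    "http://www.gama-survey.org/dr3/data/spectra/2qz/J113606.3 001155a.fit",
]


def contains_j(item):
    """Return True if the string form of item contains the letter J."""
    # re.search returns a match object or None, so test against None
    # (or rely on truthiness) instead of comparing with True.
    return re.search("J", str(item)) is not None


kept = []
for item in urls:
    if contains_j(item):
        continue  # skip the problematic URL and move to the next one
    kept.append(item)  # the urlopen/processing steps would go here

print(kept)
```

Since you are only looking for a single literal character, the plain membership test "J" in str(item) should give the same result without needing a regex at all.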