I am searching several documents for parameter values so that I can build a separate database for each parameter. groups["BRICK"] contains all the documents, which are in text format.
a_dict = ['RHO','CE','LAMBDA','THETA_POR','THETA_EFF','THETA_CAP','THETA_80','AW','MEW','KLEFF']
Brick_par = []
for bricks in groups["BRICK"]:
    for par in a_dict:
        file = open(bricks, 'r', encoding='latin-1')
        file_txt = file.read()  # read the file
        regex = '((' + (par) + ') +)\s+=\s+([0-9]+.?[0-9]+)'
        searched = re.search(regex, file_txt)  # find the line to modify
        Brick_par.append(searched[3])
Brick_par = pd.DataFrame({str(par):Brick_par})
If, instead of looping, I search for a single parameter individually (e.g. CE), the script works. The loop fails because some documents do not contain certain parameters, so re.search finds nothing for them.
I would like to know if there is a way to "ignore" every parameter for which the regex finds no match in a document. That would probably solve the problem.
Also, my goal would be to create a single dataframe with all the parameters found. But that's a later step.
The error I get is:
TypeError: 'NoneType' object is not subscriptable
As suggested by diggusbickus:
a_dict = ['RHO','CE','LAMBDA','THETA_POR','THETA_EFF','THETA_CAP','THETA_80','AW','MEW','KLEFF']
Brick_par = []
for bricks in groups["BRICK"]:
    for par in a_dict:
        file = open(bricks, 'r', encoding='latin-1')
        file_txt = file.read()  # read the file
        regex = '((' + (par) + ') +)\s+=\s+([0-9]+.?[0-9]+)'
        searched = re.search(regex, file_txt)
        if not searched: continue
        Brick_par.append(searched[3])
        file.close()
Brick_par = pd.DataFrame({str(par):Brick_par})
My goal is to create a dataframe with all the results for each parameter. Thank you for your help.
CodePudding user response:
You should make brick_par a dict in the first place, because that's what you want to give to pandas:
import re

import pandas as pd

a_dict = ['RHO','CE','LAMBDA','THETA_POR','THETA_EFF','THETA_CAP',
          'THETA_80','AW','MEW','KLEFF']
# One list per parameter; each list gets exactly one entry per document.
brick_par = {k: [] for k in a_dict}
for bricks in groups["BRICK"]:
    # Read each file once, not once per parameter.
    with open(bricks, 'r', encoding='latin-1') as f:
        file_txt = f.read()
    for par in a_dict:
        regex = '((' + par + r') +)\s+=\s+([0-9]+.?[0-9]+)'
        searched = re.search(regex, file_txt)
        if not searched:
            # Parameter missing in this document: keep the columns aligned.
            brick_par[par].append(None)
        else:
            brick_par[par].append(searched[3])
brick_par = pd.DataFrame(brick_par)
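To try the idea without the original files, here is a minimal self-contained sketch. The sample texts, the file names, and the helper extract_params are made up for illustration, and the pattern is a simplified variant (one capture group, escaped decimal point), not the regex from the question. Missing parameters are stored as NaN, so every row has the same columns.

```python
import math
import re

import pandas as pd

PARAMS = ['RHO', 'CE', 'LAMBDA']  # shortened parameter list for the demo


def extract_params(text, params):
    """Return {parameter: float value, or NaN if not found} for one document."""
    row = {}
    for par in params:
        # re.escape protects against regex metacharacters in the name;
        # \b keeps 'CE' from matching inside a longer identifier.
        pattern = r'\b' + re.escape(par) + r'\s*=\s*([0-9]+(?:\.[0-9]+)?)'
        match = re.search(pattern, text)
        # re.search returns None when nothing matches, so test before indexing.
        row[par] = float(match.group(1)) if match else math.nan
    return row


# Hypothetical stand-ins for the text files listed in groups["BRICK"].
docs = {
    'brick_a.txt': 'RHO = 1800.5\nCE = 850\nLAMBDA = 0.87\n',
    'brick_b.txt': 'RHO = 1650\nLAMBDA = 0.6\n',  # no CE in this document
}

rows = [extract_params(text, PARAMS) for text in docs.values()]
df = pd.DataFrame(rows, index=list(docs))
print(df)
```

Building one dict per document keeps each row tied to its file, and the NaN entries can later be dropped or filled with the usual pandas tools such as dropna or fillna.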