Extracting information from a text file based on a condition in Python

Time:10-15

I have some text files with the following structure:

[406.7758007117438, 450.3589222165735, 496.23589222165737, 545.9359430604982, 587.2252160650737, 630.8083375699034],0,160p30
[489.35434672089474, 675.9206914082359, 836.4900864260295, 1013.8810371123539, 1188.978139298424, 1384.719877986782],0,360p30
[834.9608540925267, 1013.1164209456024, 1186.6842907981697, 1548.3477376715812, 1910.0111845449924, 2330.5500762582615],0,480p30
[1619.225806451613, 1818.5967741935483, 2554.6774193548385, 2743.435483870968, 3390.8225806451615, 3929.8064516129034],0,720p60
[3697.0806451612902, 4369.4838709677415, 5295.080645161291, 6249.4838709677415, 7689.048387096775, 8188.612903225807],0,1080p60

For all the files, I need to extract the six values inside the brackets (without the brackets and commas) from the line that contains 160p30. For this, I tried the following code:

a_list = []
tmp = np.zeros(len(psnr_array))
with open(psnr_path) as f:
    a = f.readlines()
    # if len(a)<120:
    #     return tmp
    pattern = r',160p30'
    for line in a:
        a_list.append(float(re.search(pattern, line)[1]))

but it produces this error:

Traceback (most recent call last):
  File "e:\ugc\untitled0.py", line 74, in <module>
    tmp=PSNR_bitrate(all_sub_dir_names[i]+'\\',current_path)
  File "e:\ugc\untitled0.py", line 55, in PSNR_bitrate
    psnr_array=PSNR_Extraction(fnames_psnr_tmp[i], psnr_array, j)
  File "e:\ugc\untitled0.py", line 24, in PSNR_Extraction
    a_list.append(float(re.search(pattern, line)[1]))
IndexError: no such group
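The error can be reproduced on a single sample line. re.search(pattern, line)[1] asks the match object for capturing group 1, but the pattern ,160p30 contains no parentheses, so the match only has group 0 (the whole matched text). Separately, on lines without 160p30, re.search returns None, and indexing None would raise a TypeError instead. A minimal sketch, assuming one line of the file is already held in a string; the corrected pattern is an illustration, not something from the question or the answer:

```python
import re

# One line of the file, held in a string for this demonstration
line = "[406.7758007117438, 450.3589222165735, 496.23589222165737, 545.9359430604982, 587.2252160650737, 630.8083375699034],0,160p30"

match = re.search(r",160p30", line)
print(match[0])  # group 0 is the whole match; prints ",160p30"

try:
    match[1]  # the pattern has no capturing group, so group 1 does not exist
except IndexError as err:
    print("IndexError:", err)  # prints "IndexError: no such group"

# Capture the text between the brackets, only on lines ending in ,160p30.
# Assumes the middle field (here "0") contains no comma.
fixed = re.search(r"^\[(.*)\],[^,]*,160p30$", line)
values = [float(v) for v in fixed[1].split(",")]
print(*values)
```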

What is the problem, and how can I extract this information from the text file with this condition? The final output I need is:

406.7758007117438 450.3589222165735 496.23589222165737 545.9359430604982 587.2252160650737 630.8083375699034

Answer:

You can read the lines, keep only those that contain 160p30, split off the trailing ,0,160p30 fields with rsplit, and convert the remaining bracketed text to a list with literal_eval:

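from ast import literal_eval

with open('file.txt') as f:
    lines = f.read().splitlines()

# Keep only the lines tagged 160p30. rsplit(',', 2) splits off the last two
# fields (0 and 160p30), and [0] is the "[...]" text that literal_eval turns
# into a list of floats.
data = [literal_eval(i.rsplit(',', 2)[0]) for i in lines if '160p30' in i]

# Print each extracted list as space-separated values
for line in data:
    print(' '.join(str(i) for i in line))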

prints:

406.7758007117438 450.3589222165735 496.23589222165737 545.9359430604982 587.2252160650737 630.8083375699034
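The question asks for this across all the files. A sketch applying the same rsplit and literal_eval approach to every file in one directory; the function names extract_160p30 and extract_from_directory and the *.txt file pattern are hypothetical:

```python
from ast import literal_eval
from pathlib import Path


def extract_160p30(lines):
    """Return the bracketed value lists from lines tagged 160p30."""
    # rsplit(",", 2) splits off the last two fields (",0" and ",160p30");
    # element [0] is the "[...]" text, which literal_eval turns into a list.
    return [literal_eval(line.rsplit(",", 2)[0])
            for line in lines
            if line.endswith(",160p30")]


def extract_from_directory(directory, pattern="*.txt"):
    """Map each matching file name to its list of 160p30 value lists."""
    results = {}
    # Hypothetical layout: all text files sit directly in one directory
    for path in sorted(Path(directory).glob(pattern)):
        results[path.name] = extract_160p30(path.read_text().splitlines())
    return results
```

For example, extract_from_directory("psnr_files") would return a dict keyed by file name, where "psnr_files" is a placeholder for the real folder. Checking line.endswith(",160p30") instead of '160p30' in line is a slightly stricter filter: it only accepts the tag when it is the last field of the line.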