pandas/numpy selection of values of indices

I have a Python question. Below is a formatted table (the stars only highlight the rows I care about; they are not really in the table):

   Step  Time          Apple_price         fluctuation 
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43   *-6442.899477        5.8484*
      Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    *-6443.954663        5.7485*
      Step     Time      Apple_price         fluctuation
BFGS:    0 18:27:00    -6440.611426       24.6802
BFGS:    1 18:27:00    -6441.602767       21.3009
BFGS:    2 18:27:00    -6442.446886       15.6698
BFGS:    3 18:27:01    -6443.084822       11.6312
BFGS:    4 18:27:01    -6443.582671        8.6795
BFGS:    5 18:27:01    -6444.019236        7.4906
BFGS:    6 18:27:01    -6444.389951        6.7435
BFGS:    7 18:27:02   *-6444.732455        6.5221*

I would like to extract the Apple_price and fluctuation values from the marked rows (the last row of each block), as follows:

    -6442.899477        5.8484
    -6443.954663        5.7485
    -6444.732455        6.5221

My code is as follows:

import pandas as pd
import numpy as np


all_lines = []                                   
file_name = input("What's the file name with extension?: ")
with open (f'{file_name}', 'r') as file:                     
    for each_line in file:
        all_lines.append(each_line.strip())
        

#print(all_lines)

for j in all_lines:
    if j == 0:
        j = j + 1
        if 'fluctuation' in i:
            all_lines.index(j-1)
print(j)

Unfortunately, the output is only the first line of the expected result:

-6442.899477 5.8484

How can I extract the values at these indices from the list?
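For reference, the loop in the code above never enters its if-branch: each j is a line (a string), so j == 0 is always False, and i is never defined. Below is a minimal sketch of the same "look at the line before the header" idea that does work. It assumes every block after the first begins with a header line containing 'fluctuation', and that the wanted row is the last data row of each block (the line just before the next header, plus the final line). The function name last_rows and the SAMPLE string (the first two blocks of the table, stars removed) are illustrative only.

```python
# Sketch, not the asker's original code: collect the last data row of every
# block, assuming each block after the first starts with a header line that
# contains 'fluctuation'. last_rows is a hypothetical helper name.
def last_rows(lines):
    # Drop blank lines and surrounding whitespace.
    lines = [line.strip() for line in lines if line.strip()]
    # Positions of the header lines.
    headers = [i for i, line in enumerate(lines) if 'fluctuation' in line]
    result = []
    # The line just before each later header ends a block, and the very
    # last line ends the final block.
    for end in headers[1:] + [len(lines)]:
        price, fluct = lines[end - 1].split()[-2:]
        result.append((float(price), float(fluct)))
    return result


SAMPLE = """   Step  Time          Apple_price         fluctuation
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43    -6442.899477        5.8484
      Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    -6443.954663        5.7485"""

for price, fluct in last_rows(SAMPLE.splitlines()):
    print(price, fluct)
# -6442.899477 5.8484
# -6443.954663 5.7485
```

To read from the file instead, pass the open file object, e.g. with open(file_name) as file: print(last_rows(file)).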

Answer:

Import the regular expression module:

import re

Prepare the data (the sample table as a string):

text = """   Step  Time          Apple_price         fluctuation 
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43   *-6442.899477        5.8484*
      Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    *-6443.954663        5.7485*
      Step     Time      Apple_price         fluctuation
BFGS:    0 18:27:00    -6440.611426       24.6802
BFGS:    1 18:27:00    -6441.602767       21.3009
BFGS:    2 18:27:00    -6442.446886       15.6698
BFGS:    3 18:27:01    -6443.084822       11.6312
BFGS:    4 18:27:01    -6443.582671        8.6795
BFGS:    5 18:27:01    -6444.019236        7.4906
BFGS:    6 18:27:01    -6444.389951        6.7435
BFGS:    7 18:27:02   *-6444.732455        6.5221*"""

Define a regular expression for a pair of asterisks with only minus signs, spaces, digits, and dots between them:

p = re.compile(r'\*[- 0-9.]*\*')
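A quick check of what this pattern does on single lines: the character class [- 0-9.] allows minus signs, spaces, digits, and dots, and since the asterisks are part of the pattern (there is no capture group), they are part of every match. The two sample lines are taken from the table, one with and one without the stars; the names marked and unmarked are illustrative.

```python
import re

p = re.compile(r'\*[- 0-9.]*\*')

marked = "BFGS:    1 18:21:43   *-6442.899477        5.8484*"
unmarked = "BFGS:    1 18:21:43    -6442.899477        5.8484"

m = p.search(marked)
# The asterisks are part of the pattern, so they are part of the match.
print(m.group().startswith('*'), m.group().endswith('*'))  # True True
# Without asterisks in the data, the pattern has nothing to anchor on.
print(p.search(unmarked))  # None
```

This also shows why the approach depends on the stars actually being present in the data.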

Find all matches in the text:

a = p.findall(text)

a is a list of the matched strings (each still including its asterisks). enumerate yields each index together with its match:

for k, v in enumerate(a):
    print(k, v)

Output:

0 *-6442.899477        5.8484*
1 *-6443.954663        5.7485*
2 *-6444.732455        6.5221*
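Because each match is the full string including the asterisks, a small follow-up step is needed to get numbers. One way to turn such matches into pairs of floats, using two match strings copied from the sample data for illustration:

```python
# Two matches in the form p.findall(text) returns for the sample text.
a = ['*-6442.899477        5.8484*', '*-6443.954663        5.7485*']

# Strip the surrounding asterisks, split on whitespace, convert each field.
pairs = [tuple(float(x) for x in match.strip('*').split()) for match in a]
print(pairs)  # [(-6442.899477, 5.8484), (-6443.954663, 5.7485)]
```

An alternative is a capture group, r'\*([- 0-9.]*)\*', so that findall returns only the text between the asterisks.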

Follow-up from the asker:

Sorry, I did not explain this well. The stars are not in the table; I added them only to show which data I would like to print. Could you please help with a version that works without the stars? Best regards.
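Without the stars, one option is to key on the table's structure instead: each block starts with a header row, so the wanted row is the last data row of each block. Below is a minimal pandas sketch. It assumes header lines contain the word 'fluctuation' and that data rows end with the Apple_price and fluctuation fields, as in the question. The function name last_row_per_block and the TABLE string (the question's table with the stars removed) are illustrative, not an established API.

```python
import pandas as pd


def last_row_per_block(text):
    """Return Apple_price and fluctuation from the last data row of each block.

    Hypothetical helper; assumes header lines contain 'fluctuation' and data
    rows end with the Apple_price and fluctuation fields.
    """
    rows = []
    block = -1
    for line in text.splitlines():
        if 'fluctuation' in line:
            block += 1  # a header line starts a new block
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines
        rows.append({'block': block,
                     'Apple_price': float(fields[-2]),
                     'fluctuation': float(fields[-1])})
    df = pd.DataFrame(rows)
    # tail(1) keeps the last row of each block, in the original order.
    last = df.groupby('block').tail(1)
    return last[['Apple_price', 'fluctuation']].reset_index(drop=True)


TABLE = """   Step  Time          Apple_price         fluctuation
BFGS:    0 18:21:43    -6442.333161        7.4744
BFGS:    1 18:21:43    -6442.899477        5.8484
      Step     Time       Apple_price         fluctuation
BFGS:    0 18:21:53    -6441.911200       16.3190
BFGS:    1 18:21:53    -6442.540975       10.6048
BFGS:    2 18:21:53    -6443.107163        7.6685
BFGS:    3 18:21:53    -6443.565044        6.2186
BFGS:    4 18:21:54    -6443.954663        5.7485
      Step     Time      Apple_price         fluctuation
BFGS:    0 18:27:00    -6440.611426       24.6802
BFGS:    1 18:27:00    -6441.602767       21.3009
BFGS:    2 18:27:00    -6442.446886       15.6698
BFGS:    3 18:27:01    -6443.084822       11.6312
BFGS:    4 18:27:01    -6443.582671        8.6795
BFGS:    5 18:27:01    -6444.019236        7.4906
BFGS:    6 18:27:01    -6444.389951        6.7435
BFGS:    7 18:27:02    -6444.732455        6.5221"""

print(last_row_per_block(TABLE).to_string(index=False))
```

For a file, pass its contents, e.g. with open(file_name) as file: print(last_row_per_block(file.read())). pandas is not strictly required here; groupby(...).tail(1) is simply a convenient way to pick the last row per block.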