Selecting names of variables from .txt file using the terminal and regular expressions

I'm trying to make a file that contains just the names of the variables.

I'm using regular expressions in the bash terminal. The main .txt file has the following content:

" 1. symboling:                -3, -2, -1, 0, 1, 2, 3.

  2. normalized-losses:        continuous from 65 to 256.
  3. make:                     alfa-romero, audi, bmw, chevrolet, dodge, honda,
  4. fuel-type:                diesel, gas.
  5. aspiration:               std, turbo.
  6. num-of-doors:             four, two.
  7. body-style:               hardtop, wagon, sedan, hatchback, convertible.
  8. drive-wheels:             4wd, fwd, rwd.
  9. engine-location:          front, rear.
 10. wheel-base:               continuous from 86.6 120.9.
 11. length:                   continuous from 141.1 to 208.1.
 12. width:                    continuous from 60.3 to 72.3.
 13. height:                   continuous from 47.8 to 59.8.
 14. curb-weight:              continuous from 1488 to 4066.
 15. engine-type:              dohc, dohcv, l, ohc, ohcf, ohcv, rotor.
 16. num-of-cylinders:         eight, five, four, six, three, twelve, two.
 17. engine-size:              continuous from 61 to 326.
 18. fuel-system:              1bbl, 2bbl, 4bbl, idi, mfi, mpfi, spdi, spfi.
 19. bore:                     continuous from 2.54 to 3.94.
 20. stroke:                   continuous from 2.07 to 4.17.
 21. compression-ratio:        continuous from 7 to 23.
 22. horsepower:               continuous from 48 to 288.
 23. peak-rpm:                 continuous from 4150 to 6600.
 24. city-mpg:                 continuous from 13 to 49.
 25. highway-mpg:              continuous from 16 to 54.
 26. price:                    continuous from 5118 to 45400."

I would like a file like:

  "symboling               
   normalized-losses       
   make
   fuel-type
   .
   .
   .
    "

My try:

I know that the regular expression that selects the right information (though still with the leading number) is:

([0-9]+\.\s([a-z]+-[a-z]+-[a-z]+))|([0-9]+\.\s[a-z]+-[a-z]+)|([0-9]+\.\s[a-z]+)

Then I tried the following command in bash:

egrep "([0-9]+\.\s([a-z]+-[a-z]+-[a-z]+))|([0-9]+\.\s[a-z]+-[a-z]+)|([0-9]+\.\s[a-z]+)" file.txt > names_col.txt

But it is not working as I expected. Any suggestions would be great!

CodePudding user response：

Using sed

$ sed '/^$/d;s/ [^[:alpha:]]*\([^:]*\)[^"]*/\1/' input_file
"symboling
normalized-losses
make
fuel-type
aspiration
num-of-doors
body-style
drive-wheels
engine-location
wheel-base
length
width
height
curb-weight
engine-type
num-of-cylinders
engine-size
fuel-system
bore
stroke
compression-ratio
horsepower
peak-rpm
city-mpg
highway-mpg
price"
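
Note that the output above still carries the opening and closing quote from the input. If those should go as well, one possible variant is to print only the lines where the substitution succeeds and to capture nothing but the name in front of the colon. This is a minimal sketch: the file name sample_vars.txt is made up for illustration and holds only a few lines in the question's layout, and it assumes every name consists of letters and hyphens. The character classes are POSIX ones, so it should not rely on GNU-only escapes such as \s.

```shell
# Hypothetical sample file: a few lines copied from the question's data,
# including the surrounding quotes and the blank line.
cat > sample_vars.txt <<'EOF'
" 1. symboling:                -3, -2, -1, 0, 1, 2, 3.

  2. normalized-losses:        continuous from 65 to 256.
 10. wheel-base:               continuous from 86.6 120.9.
 26. price:                    continuous from 5118 to 45400."
EOF

# ^[^[:alpha:]]*        skips the quote, spaces, number and dot
# \([[:alpha:]-]*\):    captures the name (letters and hyphens) up to the colon
# .*$                   swallows the rest of the line, closing quote included
# -n with the p flag prints only lines where the substitution matched,
# so the blank line is dropped without a separate /^$/d step.
names=$(sed -n 's/^[^[:alpha:]]*\([[:alpha:]-]*\):.*$/\1/p' sample_vars.txt)
printf '%s\n' "$names"
```

On the real file, redirecting the sed output to names_col.txt instead of a variable would give the file the question asks for.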

CodePudding user response：

 sed -En "s/^(.*[0-9].\s)([a-z\-]*)(:.*$)/\2/p" file.txt > names_col.txt
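
As for why the egrep attempt in the question returns whole lines: grep prints every line that contains a match unless -o is given, which limits the output to the matched text itself. A minimal sketch along those lines follows. The file name sample_vars.txt is made up for illustration and contains only a few lines from the question; the three alternatives of the original pattern are folded into [a-z]+(-[a-z]+)*, and a second grep drops the leading number. The -o option is available in GNU and BSD grep but is not part of strict POSIX, so treat that as an assumption about the environment.

```shell
# Hypothetical sample file: a few lines copied from the question's data.
cat > sample_vars.txt <<'EOF'
" 1. symboling:                -3, -2, -1, 0, 1, 2, 3.

  2. normalized-losses:        continuous from 65 to 256.
 10. wheel-base:               continuous from 86.6 120.9.
 26. price:                    continuous from 5118 to 45400."
EOF

# First grep: -o keeps only "<number>. <name>" instead of the whole line.
# Second grep: keeps only the trailing name, discarding the number and dot.
names=$(grep -oE '[0-9]+\.[[:space:]]+[a-z]+(-[a-z]+)*' sample_vars.txt |
        grep -oE '[a-z]+(-[a-z]+)*$')

# Write the names to the output file used in the question and show them.
printf '%s\n' "$names" > names_col.txt
cat names_col.txt
```

The sed answers do the same job in a single command; the two-step grep is mainly useful for seeing what the original pattern was actually matching.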