Home > Back-end >  Regex for printing pattern from string
Regex for printing pattern from string

Time:10-28

i have a file with below content. i need to separate the content into 2 files o/p1 should have content everything within first braces () and ` removed and only 1&2 columns printed. o/p2 should have location with its value

$ cat dt.txt
CREATE EXTERNAL TABLE `rte.fteff_ft`(
  `dt` date,
  `wk_id` int,
  `yq_id` int(10,00),
  `te_ind` string,
  `yw_dt` date,
  `em_dt` date comment dfdsf sdfsdf)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\u0007'
  LINES TERMINATED BY '\n'
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://dfdf/data/ffff/ODE/TdddfT/'
TBLPROPERTIES (
  'last_modified_by'='asdas',
  'last_modified_time'='1639551681',
  'numFiles'='1',
  'totalSize'='2848434',
  'transient_lastDdlTime'='1639551681')

i need output from the above in two files.

o/p1: a.txt
  dt date,
  wk_id int,
  yq_id int(10,00),
  te_ind string,
  yw_dt date,
  em_dt date

o/p2: b.txt

LOCATION
  'hdfs://dfdf/data/ffff/ODE/TdddfT/'

CodePudding user response:

First, use sed to run a couple of commands, to operate on the range of lines between 'CREATE EXTERNAL' and 'ROW DELIMITED FORMAT' where they occur at the start of the line, not including those lines. Then replace grave accent marks with nothing, then keep only the first 2 words.

sed -E '/CREATE EXTERNAL/,/ROW FORMAT DELIMITED/!d;//d;s/`//g; s/(([^ ]  ){2}).*/\1/' dt.txt > a.txt

Second, use the -A option to grep to match the word 'LOCATION' on a line by itself plus the following 1 line.

grep -A 1 '^LOCATION$' dt.txt > b.txt
  • Related