Filter data in a text file and load it into PostgreSQL

Time:09-29

I have a text file with the below format:

Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234

Text: htpps:/yyy
Expiry: ddmm/yyyy
object_id: 01
object: NNN
auth: 222
RequestID: 3456    

and so on
...
  1. I want to delete all lines except those with the prefix "Expiry:", "object:", or "object_id:".
  2. Then I want to load the result into a table in PostgreSQL.

Would really appreciate your help on the above two.

Thanks, Nick

CodePudding user response:

I'm sure there are other methods, but I found an approach that works if every object has the same format:

Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234

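Then you can transform the above with

awk '{ print $2 }' test.txt | tr '\n' ',' | sed 's/,,/\n/g' | sed '$ s/,$//'

For your example, it generates the entries in CSV format:

htpps:/xxx,ddmm/yyyy,00,ABC,333,1234
htpps:/yyy,ddmm/yyyy,01,NNN,222,3456

The above code does the following:

  • awk '{ print $2 }' test.txt: prints only the second field of each line (an empty line for each blank separator line)
  • tr '\n' ',': turns every newline into a comma, so the whole file becomes a single line
  • sed 's/,,/\n/g': turns each double comma (left behind by a blank line) back into a newline, giving one row per object; the g flag is needed once there are more than two objects, and \n in the replacement assumes GNU sed
  • sed '$ s/,$//': removes the trailing comma from the last line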

Of course, this is probably an oversimplified example, but you could use it as a basis. Once the file is in CSV format, you can load it with psql.
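Since the question asks to keep only the "Expiry:", "object_id:" and "object:" lines, here is a minimal sketch of one way to do that filtering in a single awk call. It assumes objects are separated by blank lines, as in the sample. The output file name filtered.csv, the database name mydb, and the table name objects are placeholders, not anything from the original post.

```shell
# Hypothetical sample input, mirroring the format shown in the question.
cat > test.txt <<'EOF'
Text: htpps:/xxx
Expiry: ddmm/yyyy
object_id: 00
object: ABC
auth: 333
RequestID: 1234

Text: htpps:/yyy
Expiry: ddmm/yyyy
object_id: 01
object: NNN
auth: 222
RequestID: 3456
EOF

# RS="" (paragraph mode) makes each blank-line-separated block one record;
# FS="\n" makes every line of that block one field.
awk 'BEGIN { RS = ""; FS = "\n" }
{
    expiry = id = obj = ""
    for (i = 1; i <= NF; i++) {
        pos = index($i, ": ")          # split "key: value" at the first ": "
        if (pos == 0) continue
        key = substr($i, 1, pos - 1)
        val = substr($i, pos + 2)
        sub(/[ \t]+$/, "", val)        # drop trailing whitespace
        if (key == "Expiry")         expiry = val
        else if (key == "object_id") id = val
        else if (key == "object")    obj = val
    }
    print expiry "," id "," obj        # one CSV row per object
}' test.txt > filtered.csv

# Assumed follow-up, with placeholder database and table names:
# psql -d mydb -c "\copy objects (expiry, object_id, object) FROM 'filtered.csv' WITH (FORMAT csv)"
```

The commented psql line uses the \copy meta-command, which reads the file on the client side; the target table would need three matching columns, and a text column for the expiry value unless the real data holds parseable dates.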

CodePudding user response:

So using another example, here is a sample text file:

store: xxx
Delete: xxx
Expires: Sat, 30 Oct 02021 13:01:57  0100
store: xxx
Delete: xxx
Expires: Sat, 30 Oct 02021 13:01:57  0100
store: abc
store: sdf
Expires: Sat, 30 Oct 02021 13:01:57  0100
  • I want all three fields (store, Delete, Expires) in CSV format.
  • If there is no Delete or Expires line, the field should appear as 'null' or empty, still separated by a comma.
  • The date field should be trimmed to only DD Mon YYYY, e.g. "30 Oct 2021".

So far, with the help of Ftisiot, we have the command below, but it does not work as expected.

Any help would be much appreciated.

cat list.txt | grep -E "Expires|X-Delete-At| Object:" | awk '{ printf "%s\n", $2 }' | tr  '\n' ',' | sed 's/,,/\n/' | sed '$ s/.$//'
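One possible way to meet the three requirements above is sketched here with awk. It rests on two assumptions that the post does not confirm: every record starts with a "store:" line, and the leading zero in the year ("02021") should be dropped. The output file name stores.csv and the helper function emit are made up for illustration.

```shell
# Sample input copied from the post.
cat > list.txt <<'EOF'
store: xxx
Delete: xxx
Expires: Sat, 30 Oct 02021 13:01:57  0100
store: xxx
Delete: xxx
Expires: Sat, 30 Oct 02021 13:01:57  0100
store: abc
store: sdf
Expires: Sat, 30 Oct 02021 13:01:57  0100
EOF

awk '
# Print the record collected so far (if any), then reset the fields.
function emit() {
    if (have) print store "," del "," expires
    store = del = expires = ""
    have = 0
}
# Assumption: a "store:" line always opens a new record.
$1 == "store:"   { emit(); store = $2; have = 1 }
$1 == "Delete:"  { del = $2 }
$1 == "Expires:" {
    year = $5
    sub(/^0+/, "", year)               # "02021" -> "2021"
    expires = $3 " " $4 " " year       # keep only DD Mon YYYY
}
END { emit() }                         # flush the last record
' list.txt > stores.csv
```

A missing Delete or Expires line simply leaves its field empty, so the third record comes out as "abc,,". Collecting fields per record, rather than joining every line with commas, is what keeps the columns aligned when a line is absent.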