Execute script only for first line of file

Time:04-11

I have a number of .txt files arbitrarily named A001.txt, A002.txt, etc.
Files have the following structure:

<sps id="303544" url="https://.xyz.edu/=303544" title="Lawrence Bragg"></sps>

Lawrence Bragg

Sir William Lawrence Bragg, (31 March 1890 – 1 July 1971) was an Australian-born British physicist and X-ray crystallographer, discoverer (1912) of Bragg's law of X-ray diffraction, which is basic for the determination of crystal structure.

He was joint recipient (with his father, William Henry Bragg) of the Nobel Prize in Physics in 1915, "For their services in the analysis of crystal structure by means of X-rays";

I am trying to rename each file based on the value of the title attribute. In the example above, I want to rename the file to Lawrence Bragg.txt.
I run:

find . -maxdepth 1 -name '*.txt' -exec ~/scr/rename.sh {} \;

Where rename.sh:

#!/bin/bash

title=$(xmllint --xpath '//sps/@title' "$1" | sed -r 's/[^"]+"([^"]+).*/\1/')
mv -v "$1" "$title.txt"

The rename works only if the file consists solely of the first line, i.e., the file starts and ends with the <sps> tag. If there are additional lines it does not work, of course.


How do I run this script solely for the first line of each *.txt file? I.e. ignore all the lines after the first one?
I've tried head -1 but can't seem to figure it out.
Or modify sed?

Answer 1:

Parsing HTML with sed is easy; parsing HTML with sed in a foolproof way is difficult. That said, I suggest:

sed -n '1{s/.*title="\(.*\)".*/\1/;p;}'
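
One way to wire this into the question's rename.sh is to give sed the file as input and print only when the substitution matched. The following is a minimal sketch rather than a tested drop-in: the function name title_from_first_line is made up for illustration, and the pattern uses [^"]* instead of .* so that any attributes following title would not be swallowed.

```shell
#!/bin/bash
# Sketch of rename.sh that looks at line 1 only.
# title_from_first_line is a hypothetical helper name.

title_from_first_line() {
    # -n suppresses default output; the p flag prints line 1 only when the
    # substitution matched; q stops sed after the first line.
    sed -n '1{s/.*title="\([^"]*\)".*/\1/p;q;}' "$1"
}

# Only rename when a file argument was passed to the script.
if [ "$#" -gt 0 ]; then
    title=$(title_from_first_line "$1")
    # Skip files whose first line carries no title attribute.
    if [ -n "$title" ]; then
        mv -v -- "$1" "$title.txt"
    fi
fi
```

It could then be called per file as in the question: find . -maxdepth 1 -name '*.txt' -exec ~/scr/rename.sh {} \;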

Answer 2:

Using sed

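#!/usr/bin/env bash

# Glob instead of $(find ...): names with spaces are not split, and
# no path contains a "/" that would break the s/// command below.
for filename in *.txt; do
    # Print (do not run) an mv command built from the title on line 1.
    sed -n "1s/.*title=\"\([^\"]*\).*/mv '$filename' '\1.txt'/p" < "$filename"
done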

This dry run should give output like

mv 'ABC.txt' 'Lawrence Bragg.txt'

If the output looks as expected, add the e flag (a GNU sed extension) to the substitution so that each generated mv command is executed:

#!/usr/bin/env bash

for filename in *.txt; do
    # The e flag (GNU sed only) executes the generated mv command.
    sed -n "1s/.*title=\"\([^\"]*\).*/mv '$filename' '\1.txt'/pe" < "$filename"
done

$ cat Lawrence\ Bragg.txt
<sps id="303544" url="https://.xyz.edu/=303544" title="Lawrence Bragg"></sps>

Lawrence Bragg

Sir William Lawrence Bragg, (31 March 1890 – 1 July 1971) was an Australian-born British physicist and X-ray crystallographer, discoverer (1912) of Bragg's law of X-ray diffraction, which is basic for the determination of crystal structure.

He was joint recipient (with his father, William Henry Bragg) of the Nobel Prize in Physics in 1915, "For their services in the analysis of crystal structure by means of X-rays";
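
Because the e flag runs whatever text the substitution produces, a file name or title containing a single quote would break the generated mv command. A sketch that avoids generating commands and uses only bash built-ins might look like this. The function name rename_by_title is made up for illustration, and the sketch assumes the title attribute sits on line 1 in double quotes.

```shell
#!/usr/bin/env bash
# rename_by_title is a hypothetical helper, not part of the answer above.

rename_by_title() {
    local file=$1 first title
    local re='title="([^"]*)"'
    # Read only the first line; accept a last line without a trailing newline.
    IFS= read -r first < "$file" || [ -n "$first" ] || return 1
    # Pull the title attribute out of that line with a bash regex.
    [[ $first =~ $re ]] || return 1
    title=${BASH_REMATCH[1]}
    [ -n "$title" ] || return 1
    # -n: do not overwrite an existing file that already has this title.
    mv -n -- "$file" "$title.txt"
}
```

From the directory holding the files it could be driven with: for f in *.txt; do rename_by_title "$f"; done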