Bash: how to print the files that contain a given word at a given position in any line?

Is it possible to write a script that takes 2 arguments, word (the word to search for in the files of a directory) and n (the position that word must have in a line of the file), and prints only the files that contain that word at that position in at least one line?

My code until now is:

word=$1
for files in .
do
    grep -rl "$word" 
done

This only prints the files with that word in them, however, and I'm not sure how to implement the rest.

CodePudding user response：

Your loop seems a bit strange: you'd want to loop through all files, not through the set of things you provided (which is a set with a single entry, .). That's not really useful at all.

So, for file in * would make much more sense. Then, use that ${file} variable! Your loop body never refers to its loop variable, so the grep -rl call (with GNU grep, at least) just searches the current directory recursively, no matter what the loop iterates over.
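
To make that concrete, here is a minimal sketch of a loop that actually uses its loop variable. It only reproduces what the question's code already does (list the files containing the word), not the position check. The function name list_files_with_word is made up for illustration, and the -F flag (treat the word as a literal string) is my assumption about what is wanted.

```shell
# list_files_with_word is a hypothetical helper name, not from the question.
# Prints the name of every regular file in the current directory that
# contains the given word anywhere.
list_files_with_word() {
    word=$1
    for file in *; do
        [ -f "$file" ] || continue      # skip directories and other non-files
        # -q: print nothing, only set the exit status
        # -F: assumption: treat the word as a literal string, not a regex
        if grep -qF -- "$word" "$file"; then
            printf '%s\n' "$file"
        fi
    done
}
```

Called as list_files_with_word mustard, it prints one matching file name per line.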

For example, you could

use read to get lines from the file (thousands of examples on how to read lines from files using bash)
use cut to select the position
use [[ / ]] to test for the string in that position, and if successful
print the name of the file and skip ahead to the next file.
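
The steps above could be sketched roughly like this. It assumes words are separated by exactly one space (that is what cut -d ' ' expects), uses the plain [ ... ] test instead of [[ ... ]] so it also runs under sh, and the function name files_with_word_at is made up.

```shell
# files_with_word_at is a hypothetical name. Assumption: words in a line
# are separated by exactly one space character.
files_with_word_at() {
    word=$1
    pos=$2
    for file in *; do
        [ -f "$file" ] || continue
        # "|| [ -n "$line" ]" also handles a last line without a trailing newline
        while IFS= read -r line || [ -n "$line" ]; do
            # A space is appended so every line contains the delimiter;
            # otherwise cut would echo a one-word line for any field number.
            field=$(printf '%s \n' "$line" | cut -d ' ' -f "$pos")
            if [ "$field" = "$word" ]; then
                printf '%s\n' "$file"
                break               # one hit is enough, go to the next file
            fi
        done < "$file"
    done
}
```

For example, files_with_word_at mustard 5 lists the files in which "mustard" is the fifth word of some line. Starting a cut process for every line is slow on big files, so treat this as an illustration of the steps rather than something to run on a large tree.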

Alternatively, teach yourself a bit of regex. I don't know which version of grep you have, but "from the beginning of the line, find N repetitions of the scheme 'any run of non-delimiter characters followed by one word delimiter', followed by the word I'm looking for" isn't hard.

Something like this, to look for "mustard" as the fifth word:

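words_before=4
word="mustard"
# idea is to get the GNU sed expression '/^\([^ ]\+ \)\{words_before\}word/!{q100}':
# a line that does not match makes sed quit with exit status 100
sedtemplate_start='/^\([^ ]\+ \)'
sedtemplate_end='/!{q100}'
sedtemplate="${sedtemplate_start}\\{${words_before}\\}${word}${sedtemplate_end}"
for file in *; do
  [ -f "${file}" ] || continue
  while IFS= read -r this_line; do
    # sed exits 0 only if the line matched; print the file once, then move on
    if printf '%s\n' "${this_line}" | sed -n "${sedtemplate}"; then echo "${file}"; break; fi
  done < "${file}"
done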

CodePudding user response：

Supposing that you want to search for a literal word at a given position, the following script generates the corresponding ERE regexp and uses it with grep:

#!/bin/bash
# bash is needed for the <<< here-string
grep -Erl "^ *([^ ]+ +){$(($2 - 1))}$(sed 's/[][\.|$(){}?+*^]/\\&/g' <<< "$1")" .

The ERE regexp has 3 parts:

^ * matches the start of the line followed by any number of space characters
([^ ]+ +){N} matches "a non-space word followed by one or more space characters", N times
sed 's/[][\.|$(){}?+*^]/\\&/g' escapes your word literal (this ERE escaping command is taken from here)
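
To see what the three parts add up to, here is a small sketch that only prints the generated regexp. The helper name build_ere is made up, printf piped into sed replaces the here-string so it also runs under plain sh, and the trailing ( |$) is my own addition (not part of the script above) so that "mustard" does not also match "mustardy".

```shell
# build_ere is a hypothetical helper: prints the ERE that matches
# "$1 as the $2-th space-separated word of a line".
build_ere() {
    # Escape every character that is special in an ERE
    escaped=$(printf '%s\n' "$1" | sed 's/[][\.|$(){}?+*^]/\\&/g')
    # Assumption: ( |$) is added so the word must end at a space or at end of line
    printf '^ *([^ ]+ +){%d}%s( |$)\n' "$(($2 - 1))" "$escaped"
}

# build_ere mustard 5   prints   ^ *([^ ]+ +){4}mustard( |$)
```

The result can then be handed to grep, for example grep -Erl "$(build_ere mustard 5)" . (the -r option is available in GNU and BSD grep, but is not required by POSIX).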

You should learn a little about regexps ;-)

CodePudding user response：

If I understand what you're trying to do correctly then the right approach is this (untested):

#!/usr/bin/env bash

awk -v word="$1" -v pos="$2" -v ORS='\0' '$pos == word { print FILENAME; nextfile }' * |
xargs -0 cat

That assumes your awk supports nextfile and printing of \0 (e.g. GNU awk).
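
If only the file names are wanted and your awk has no nextfile, a rough equivalent is sketched below. It assumes file names without newlines (names are printed one per line instead of NUL-separated), and the function name names_with_word_at is made up.

```shell
# names_with_word_at is a hypothetical name.
# Usage: names_with_word_at WORD POSITION FILE...
names_with_word_at() {
    word=$1
    pos=$2
    shift 2                             # the remaining arguments are the files
    awk -v word="$word" -v pos="$pos" '
        FNR == 1 { seen = 0 }           # first line of a new file: reset the flag
        !seen && $pos == word {         # field number pos equals the word
            print FILENAME
            seen = 1                    # report each file only once
        }
    ' "$@"
}
```

For example, names_with_word_at mustard 5 * checks every file in the current directory. Unlike nextfile, this still reads each file to the end, so it is slower on large files.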