How to single out the first bit and retain the last bit of the filename using grep (find)

Time:10-05

Greeting

I am writing a bash script that converts a decimal number in a file name to binary (e.g. 023-124.grf). I only need to convert the last 3 digits of the name (the 124 in 124.grf) without touching the first bit. I have already tried cut, but it only seems to work on text files, and I am still figuring out how to use grep since I am relatively new to bash. Is there a way to single out the first bit of the filename and keep the last bit?

CodePudding user response:

Well, I'm not sure you completely specified your problem, but luckily, even a very general variation of it can be solved fairly easily, considering that grep allows you to match both digit and non-digit characters.

So to match "the last 3 consecutive digits that are not followed by a digit" in any text (even if it looks like "234_blablabla_lololol_343123_blablabla_abc.ext" or "blabla_987123", rather than "555-123.ext"), you could literally translate the quoted definition to a regular expression, and get "123", by using [0-9] to match a digit and [^0-9] to match a non-digit. The latter narrows your digits down to the last ones present in the text, by stating that only non-digits may (optionally) follow them.

E.g. (the -o flag makes grep print only the matched part instead of the whole line):

echo 234_blablabla_lololol_343999_blablabla_abc.txt | grep -o '[0-9][0-9][0-9][^0-9]*$' | grep -o '^...'

999

Of course, there are many other ways to do this. For instance, GNU grep has a -P flag that enables the most powerful regular expression syntax it supports, namely Perl-compatible regex (PCRE). With this, you can avoid a lot of redundant code.

E.g. you can shorten repeats of the same regex unit ("atom") with an interval, which also works in extended regex (grep -E):

[0-9][0-9][0-9] -> [0-9]{3}

Perl regex even provides shorthands for common character classes. One of these is "decimal digit", a shorthand for [0-9], denoted as \d:

[0-9]{3} -> \d{3}

You could also use lookaheads and lookbehinds to fetch your 3 digits in one pass, removing the need to grep for the first 3 characters afterwards (the grep '^...' part), but I can't be bothered to look up the particular syntax for that in grep right now.
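
For what it's worth, a one-pass version with a lookahead might look like the sketch below. It assumes GNU grep built with PCRE support (the -P flag is not available in BSD/macOS grep), and the variable names are only for illustration:

```shell
# Assumes GNU grep with PCRE support (-P).
name='234_blablabla_lololol_343999_blablabla_abc.txt'

# -o          print only the matched text
# \d{3}       three digits
# (?=\D*$)    lookahead: only non-digits may follow, up to the end of the line
last3=$(printf '%s\n' "$name" | grep -oP '\d{3}(?=\D*$)')

echo "$last3"   # prints 999
```

The lookahead is checked but not included in the match, so no second grep is needed to trim the result.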

Now sadly, I would have to think a lot about how to generalize the above definition of "the last 3 consecutive digits that are not followed by a digit" into just "the last 3 consecutive digits". The above regular expression does not match file names where the last run of 3 digits is followed by a digit anywhere later in the file name, such as "blabla_12_blabla_123_blabla_56.ext", but I am optimistic that your naming convention does not allow that.
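
One possible way to handle that more general case is to lean on greedy matching: a leading .* consumes as much as it can, so the capture group ends up on the rightmost place where three digits occur. A minimal sketch with sed -E (the helper name last3 is hypothetical, not part of any tool):

```shell
# Hypothetical helper: print the last run of 3 consecutive digits in a string.
last3() {
    # The leading .* is greedy, so the group captures the rightmost
    # position at which three digits occur. -n together with the p flag
    # suppresses output for input that contains no such run.
    printf '%s\n' "$1" | sed -nE 's/.*([0-9]{3}).*/\1/p'
}

last3 'blabla_12_blabla_123_blabla_56.ext'   # prints 123
last3 '023-124.grf'                          # prints 124
```

Note that for a longer run such as 1234 this picks the final three digits (234), which may or may not be what your naming convention needs.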

CodePudding user response:

You can use bash primitives to separate out the desired portion of the name. There's probably a slicker way to get the binary conversion of the decimal number, but I like dc:

$ name=023-124.grf
$ base=${name%.*}
$ echo "$base"
023-124
$ suffix=${base##*-}
$ echo "$suffix"
124
$ echo "$suffix" 2 o p | dc
1111100
$ new_name="${base%%-*}-$(echo "$suffix" 2 o p | dc).${name##*.}"
$ echo "$new_name"
023-1111100.grf
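
If dc is not installed, the conversion can probably also be done with bash arithmetic alone. The following is only a sketch: to_binary is a hypothetical helper name, and it assumes bash (not plain sh) and a non-negative decimal input:

```shell
# Hypothetical helper: convert a non-negative decimal string to binary
# using only bash arithmetic (no dc or bc needed).
to_binary() {
    local n=$((10#$1))   # 10# forces base 10, so "023" is not read as octal
    local bits=''
    if (( n == 0 )); then
        echo 0
        return
    fi
    while (( n > 0 )); do
        bits=$(( n % 2 ))$bits   # prepend the lowest remaining bit
        n=$(( n / 2 ))
    done
    echo "$bits"
}

name=023-124.grf
base=${name%.*}
new_name="${base%%-*}-$(to_binary "${base##*-}").${name##*.}"
echo "$new_name"   # prints 023-1111100.grf
```

The 10# prefix matters for names like this one: without it, bash arithmetic treats a number with a leading zero as octal.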