Dollar symbol should be around entities and values- Perl

Time:12-07

My code is supposed to remove the dollar signs between the numbers (there can be several values) and put dollar signs around the whole expression instead, but I cannot get it to work.

For example, 10$x$10$x$10$x$10 should become $10x10x10x10$ (the number of values is not fixed).

My code:

use strict;
use warnings;

my $tmp = do { local $/; $_ = <DATA>; };
my @allines = split /\n/, $tmp;
for(@allines)
{
    my $lines = $_;

    my ($pre,$matches,$posts) = "";

    $lines=~s/(\d+)(\$*)\\times\$(\d+)/$1$2\\times$3\$/g;

    print $lines;
}

Input:

__DATA__
where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A 10$\times$10$\times$10 supercell was first built, based on the unit cell model Sample paragraph testing 10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues.... obtained.


Required Output:

where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A $10\times10\times10$ supercell was first built, based on the unit cell model Sample paragraph testing $10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues.... obtained.

CodePudding user response:

If you simply want to blindly transform 10$x$10$x$10$x$10 into $10x10x10x10$ without taking into account anything about the surrounding text, then this should be enough.

$lines=~s/(\d+)\$/\$$1/g;

If your requirements are more complex than that, you need to update the question with the details.

[UPDATE]

Just looking again at the input and expected output, I see there is a complication -- some of the input looks like this times$10$ with the expected output times$10. That means we have an optional leading $ that needs to be taken into account.

To deal with that we can add \$? to the start of the regex to match the optional $, like this

$lines=~s/\$?(\d+)\$/\$$1/g;
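
To see what the optional `\$?` changes, here is a minimal sketch that applies both substitutions to one sample string. The helper names `without_optional` and `with_optional` are made up for this illustration and are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub without_optional {
    my ($s) = @_;
    $s =~ s/(\d+)\$/\$$1/g;       # digits followed by a dollar
    return $s;
}

sub with_optional {
    my ($s) = @_;
    $s =~ s/\$?(\d+)\$/\$$1/g;    # also consume a dollar in front of the digits
    return $s;
}

my $sample = '10$\times$10$\times$10';

print without_optional($sample), "\n";   # $10\times$$10\times$10
print with_optional($sample), "\n";      # $10\times$10\times$10
```

Without the optional match, the dollar that already sits in front of a middle value is kept and a second one is added, which gives the doubled `$$`.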

Below is a rewrite of your code that also removes some of the unnecessary splitting

use strict;
use warnings;

while (<DATA>)
{
    s/\$?(\d+)\$/\$$1/g;

    print ;
}

__DATA__
Sample paragraph testing 10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....

Output is

Sample paragraph testing $10\times$10\times$10 text continues....
Sample paragraph testing $10\times$10\times$10\times$10 text continues....
Sample paragraph testing $10\times$10\times$10\times$10\times$10\times$10 text continues....

[UPDATE 2]

Assuming the actual requirements are

  1. change the first occurrence of, say, 123$ into $123
  2. for the last occurrence of $123, change it to 123$
  3. for the intermediate digit-dollar sequences, remove the dollars.

use strict;
use warnings;

while (<DATA>)
{
    # replace the first occurrence only
    s/\$?(\d+)\$/\$$1/;

    # remove $ from all but the last digit-dollar
    # uses lookahead to prevent matching the last digit-dollar
    s/times\$?(\d+)\$?(?=\\t)/times$1/g;

    # rework the last occurrence of digit-dollar
    s/times\$(\d+)/times$1\$/;

    print ;
}


Input:

__DATA__
Sample paragraph testing 10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....

output is

Sample paragraph testing $10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues....

[UPDATE 3]

New requirement -- there can be multiple digit-dollar sequences in a single line.
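
To illustrate why the UPDATE 2 code is not enough here, the sketch below wraps its three substitutions in a function and feeds it a line containing two runs. The name `fix_single_run` and the sample line are invented for this example.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the three substitutions from UPDATE 2.
sub fix_single_run {
    my ($line) = @_;
    $line =~ s/\$?(\d+)\$/\$$1/;                    # first number of the run
    $line =~ s/times\$?(\d+)\$?(?=\\t)/times$1/g;   # numbers in the middle
    $line =~ s/times\$(\d+)/times$1\$/;             # last number of the run
    return $line;
}

print fix_single_run('a 10$\times$10 b 10$\times$10 c'), "\n";
# a $10\times10$ b 10$\times$10 c
```

Because the first and last substitutions are not global, only the first run on the line is rewritten and the second one is left as it was.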

This complicates the code a bit, but not much.

use strict;
use warnings;

while (<DATA>)
{
    # walk the string looking for strings of the form "10$\times$10$\times$10$\times$10"

    while (s/(.*?)((\$?\d+\$?\\times)+\$?\d+\$?)//)
    {
        # output any data that preceded the digit-dollar sequence
        print $1;

        my $block = $2;

        # Remove all dollars
        $block =~ s/\$+//g;

        # put back the initial dollar
        $block =~ s/^(\d+)/\$$1/;

        # and the terminating dollar
        $block =~ s/$/\$/;

        # output the modified digit-dollar sequence
        print $block;
    }

    # output trailing text
    print;

}


Input:

__DATA__
where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A 10$\times$10$\times$10 supercell was first built, based on the unit cell model Sample paragraph testing 10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues.... Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues.... obtained.

Sample paragraph testing 10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....

output is

where $Q=k-k^{\prime}$ is the scattering vector of length $4\pi \sin{\theta} /{\lambda}$ for a neutron of wavelength ${\lambda}$ scattered at an angle $2{\theta}$, and k and k' are X-ray absorption spectroscopy. Thus, RMC trials were performed for several samples assuming either A $10\times10\times10$ supercell was first built, based on the unit cell model Sample paragraph testing $10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10$ text continues.... Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues.... obtained.

Sample paragraph testing $10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10$ text continues....
Sample paragraph testing $10\times10\times10\times10\times10\times10$ text continues....
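
The same block-wise idea can probably also be written as a single substitution with the `/e` modifier, so the line does not have to be consumed piece by piece. This is only a sketch, assuming every run consists of numbers joined by `$\times$`; the function name `wrap_times_runs` is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrites every run such as 10$\times$10$\times$10
# to $10\times10\times10$ and leaves all other text alone.
sub wrap_times_runs {
    my ($text) = @_;

    $text =~ s{
        ( (?: \$? \d+ \$? \\times )+   # one or more "number \times" pieces
          \$? \d+ \$? )                # the final number of the run
    }{
        my $block = $1;
        $block =~ tr/$//d;             # drop every dollar inside the run
        '$' . $block . '$';            # wrap the whole run once
    }gex;

    return $text;
}

print wrap_times_runs('A 10$\times$10$\times$10 supercell'), "\n";
# A $10\times10\times10$ supercell
```

Existing math such as `$4\pi$` is not touched, because a run only matches when a number is followed by `\times`.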

CodePudding user response:

Breaking the text into separate lines seems to complicate things unnecessarily.

use strict;
use warnings;
use feature 'say';

my $text = do { local $/; $_ = <DATA>; };

$text =~ s/\$?(\d+)\$/\$$1/g;

say $text;

__DATA__
Sample paragraph testing 10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10 text continues....
Sample paragraph testing 10$\times$10$\times$10$\times$10$\times$10$\times$10 text continues....

Output:

Sample paragraph testing $10\times$10\times$10 text continues....
Sample paragraph testing $10\times$10\times$10\times$10 text continues....
Sample paragraph testing $10\times$10\times$10\times$10\times$10\times$10 text continues....