Split() on newline AND space characters?

Time:11-17

I want to split() a string on both newlines and space characters:

#!/usr/bin/perl
use warnings;
use strict;

my $str = "aa bb cc\ndd ee ff";
my @arr = split(/\s\n/, $str);     # Split on ' ' and '\n'
print join("\n", @arr);            # Print array, one element per line

Output is this:

aa bb cc
dd ee ff

But, what I want is this:

aa
bb
cc
dd
ee
ff

So my code is splitting on the newline (good) but not the spaces. According to perldoc, whitespace is matched by the \s character class, and I would have assumed that a space counts as whitespace. Am I missing something?

CodePudding user response:

You are splitting on a whitespace character followed by a line feed. To split when either one is encountered, use a character class:

split /[\s\n]/, $str

But \s includes \n, so this can be simplified.

split /\s/, $str

But what if you have two spaces in a row? You could split when a sequence of whitespace is encountered.

split /\s+/, $str

There's a special case: passing a single-space string as the pattern does the same thing, except that it also ignores leading whitespace.

split ' ', $str

So,

use v5.14;
use warnings;

my $str = "aa bb cc\ndd ee ff";
my @arr = split ' ', $str;
say for @arr;
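
The leading-whitespace difference between the two forms can be seen with a small sketch. The input string here is a hypothetical variant of the one in the question, with two spaces added at the front:

```perl
use strict;
use warnings;

# Hypothetical input: same idea as the question's string, but with leading spaces.
my $str = "  aa bb\ncc";

my @by_regex   = split /\s+/, $str;   # leading whitespace produces an empty first field
my @by_special = split ' ',   $str;   # leading whitespace is skipped

print scalar(@by_regex),   "\n";      # prints 4
print scalar(@by_special), "\n";      # prints 3
```

With /\s+/ the first element is an empty string, so code that loops over the result may need to skip it; split ' ' avoids that.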

CodePudding user response:

my code is splitting on the newline (good)

Your code is not splitting on newline; it only seems that way due to how you are printing things. Your array contains one element, not two. The element has a newline in the middle of it, and you are simply printing aa bb cc\ndd ee ff.

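\s\n means: any whitespace character followed by a newline, where whitespace itself includes \n. In your string the character before the newline is c, which is not whitespace, so the pattern never matches.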
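
As a rough illustration of when /\s\n/ does and does not match, the sketch below compares the question's string with a hypothetical variant that has a space just before the newline:

```perl
use strict;
use warnings;

my $original = "aa bb cc\ndd ee ff";    # string from the question
my $variant  = "aa bb cc \ndd ee ff";   # hypothetical: space added before the newline

my @from_original = split /\s\n/, $original;   # no whitespace before \n, so no split
my @from_variant  = split /\s\n/, $variant;    # " \n" matches once

print scalar(@from_original), "\n";   # prints 1
print scalar(@from_variant),  "\n";   # prints 2
```

Only the variant is split, and only at the one place where a whitespace character is immediately followed by a newline.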

Change:

my @arr = split(/\s\n/, $str);

to:

my @arr = split(/\s/, $str);

Using Data::Dumper makes it clear that the array now has 6 elements:

use warnings;
use strict;
use Data::Dumper; 

my $str = "aa bb cc\ndd ee ff";
my @arr = split(/\s/, $str);
print Dumper(\@arr);

Prints:

$VAR1 = [
          'aa',
          'bb',
          'cc',
          'dd',
          'ee',
          'ff'
        ];

The above code works on the input string you provided. It is also common to split on runs of consecutive whitespace using:

my @arr = split(/\s+/, $str);
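
To see why the + matters, here is a minimal sketch using a hypothetical input that contains consecutive whitespace characters (two spaces, then a space followed by a newline):

```perl
use strict;
use warnings;

# Hypothetical input: two spaces after "aa", and a space plus newline after "bb".
my $str = "aa  bb \ncc";

my @single = split /\s/,  $str;   # each whitespace character is its own separator
my @runs   = split /\s+/, $str;   # a run of whitespace counts as one separator

print join('|', @single), "\n";   # prints aa||bb||cc
print join('|', @runs),   "\n";   # prints aa|bb|cc
```

The empty fields in the first result come from the zero-length gaps between adjacent whitespace characters.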

CodePudding user response:

Your question comes from an incorrect analysis of the outcome of your code. You think you have split on newline, when you have not actually split anything at all and are in fact just printing a newline.

If you want to avoid this mistake in the future, and know exactly what your variables contain, you can use the core module Data::Dumper:

use strict;
use warnings;
use Data::Dumper;

my $str = "aa bb cc\ndd ee ff";
my @arr = split(/\s\n/, $str);     # split on whitespace followed by newline
$Data::Dumper::Useqq = 1;          # show exactly what is printed
print Dumper \@arr;                # using Data::Dumper

Output:

$VAR1 = [
          "aa bb cc\ndd ee ff"
        ];

As you would easily be able to tell, you are not printing an array at all, just a single scalar value (inside an array, because you put it there). Data::Dumper is an excellent tool for debugging your data, and a valuable tool for you to learn.
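
As a small supplementary sketch, the effect of $Data::Dumper::Useqq can be compared directly, along with a plain element count. The variable names are only for illustration:

```perl
use strict;
use warnings;
use Data::Dumper;

my @arr = split /\s\n/, "aa bb cc\ndd ee ff";   # the question's pattern: nothing is split

my $count = scalar @arr;   # number of elements actually in the array

my ($plain, $quoted);
{
    local $Data::Dumper::Useqq = 0;
    $plain = Dumper(\@arr);    # the newline is emitted as a real line break
}
{
    local $Data::Dumper::Useqq = 1;
    $quoted = Dumper(\@arr);   # the newline is shown as the escape sequence \n
}

print "elements: $count\n";    # prints elements: 1
print $plain, $quoted;
```

Without Useqq the dump spans an extra line and can look like two values at a glance; with it, the embedded newline is visible as \n inside a single string.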
