I'm reading a file using the "array of lines" mode of Dyalog's ⎕nget:
lines _ _ ← ⎕nget '/usr/share/dict/words' 1
And it appears to work:
lines[1]
 10th
But the individual elements don't appear to be character arrays:
line ← lines[1]
line
 10th
≢ line
1
⍴ line
Here we see that the first line has a tally of 1 and a shape of the empty array. I can't index into it any further; lines[1][1] or line[1] is a RANK ERROR. If I use ⊂ on the RHS I can assign the value to multiple variables at once and get the same behavior for each variable. But if I do a multiple assignment without the left shoe, I get this:
word rest ← line
word
10th
≢ word
4
⍴ word
4
At last we have the character array I expected! Yet it was evidently not separated from anything else hidden in line; the other variable is identical:
rest
10th
≢ rest
4
⍴ rest
4
word ≡ rest
1
Significantly, when I look at word it has no leading space, unlike line. So it seems that the individual elements of the content array returned by ⎕nget are further wrapped in something that doesn't show up in shape or tally, and can't be indexed into, but when I use a destructuring assignment it unwraps them. It feels rather like the multiple-values stuff in Common Lisp.
If someone could explain what's going on here, I'd appreciate it. I feel like I'm missing something incredibly basic.
CodePudding user response:
The result of reading a file with "array of lines" mode is a nested array. It is specifically a nested vector of character vectors where each character vector is a line from your text file.
For example, take this file, \tmp\test.txt:
my text file
has 3
lines
If we read this in, we can inspect the contents:
(content encoding newline) ← ⎕nget'\tmp\test.txt' 1
≢ content ⍝ How many lines?
3
≢¨content ⍝ How long is each line?
12 5 5
content[2] ⍝ Indexing returns a scalar (non-simple)
┌─────┐
│has 3│
└─────┘
2⊃content ⍝ Use pick to get the contents of the 2nd scalar
has 3
⊃content[2] ⍝ Disclose the non-simple scalar
has 3
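The same distinction seems to account for the multiple assignment in the question. Here is a minimal sketch, assuming the default ⎕IO←1 and ⎕ML←1 and using a hand-built vector as a stand-in for the ⎕NGET result (the names lines, line, word and rest are only illustrative). Bracket indexing with a single index yields an enclosed scalar, and assigning a scalar to several names extends it, so each name receives the scalar's content rather than the enclosure:

```apl
      lines ← 'my text file' 'has 3' 'lines'   ⍝ stand-in for a ⎕NGET result
      line ← lines[2]      ⍝ bracket indexing keeps the enclosure
      ⍴⍴line               ⍝ rank 0: line is a scalar, so line[1] is a RANK ERROR
0
      ≡line                ⍝ depth 2: a scalar enclosing a simple vector
2
      word rest ← line     ⍝ the scalar is extended; each name gets its content
      ≢word
5
      word ≡ 'has 3'
1
      word ≡ ⊃line         ⍝ same result as disclosing explicitly
1
      word ≡ rest
1
```

So nothing extra is hidden inside line; the "wrapper" is the enclosure itself, and word and rest are two copies of the one character vector it holds.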
As you probably read in the online documentation, the default behaviour of ⎕NGET is to bring in a simple (non-nested) character vector with embedded newline characters. The line endings used in the file are typically operating-system dependent and are reported in the last element of the result.
(content encoding newline) ← ⎕nget'\tmp\test.txt'
newline ⍝ Unicode code points for line endings in this file (Microsoft Windows)
13 10
content
my text file
has 3
lines
content ∊ ⎕ucs 10 13
0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1
But with "array of lines" mode, you get a nested result.
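To relate the two modes, the following sketch splits a simple character vector into a nested vector of lines by hand. It assumes a Dyalog version that provides partition (⊆) and the default ⎕ML←1; the names nl, text and lines are made up for the illustration, and text is built in the session rather than read from a file. Note that partitioning this way drops empty lines, so it is only an approximation of what "array of lines" mode returns:

```apl
      nl ← ⎕UCS 10                                ⍝ line feed character
      text ← 'my text file',nl,'has 3',nl,'lines',nl
      ⍴text                                       ⍝ one simple vector, newlines included
25
      lines ← nl(≠⊆⊢)text                         ⍝ partition at the newlines
      ≢lines
3
      ≢¨lines
12 5 5
      lines ≡ 'my text file' 'has 3' 'lines'
1
```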
For a quick introduction to nested arrays and the array model, see Stefan Kruger's LearnAPL book.
CodePudding user response:
If you turn boxing on, it's easier to see what's happening. Each element is an enclosed character vector. Use pick ⊃ instead of bracket index [] to get the actual item.
words ← ⊃⎕nget'/usr/share/dict/words'1
]box on -s=max
⍴words
┌→─────┐
│235886│
└~─────┘
words[10]
┌─────────┐
│ ┌→────┐ │
│ │Aaron│ │
│ └─────┘ │
└∊────────┘
10⊃words ⍝ use pick
┌→────┐
│Aaron│
└─────┘
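For getting at individual characters, which is what lines[1][1] in the question was after, a few options are sketched below. This assumes the default ⎕IO←1 and uses a small hand-built vector (the name lines is illustrative) in place of the dictionary file:

```apl
      lines ← 'my text file' 'has 3' 'lines'   ⍝ stand-in for a ⎕NGET result
      (2⊃lines)[1]       ⍝ pick the line, then bracket-index into it
h
      2 1⊃lines          ⍝ pick along a path: 2nd item, then its 1st item
h
      ⊃¨lines            ⍝ first character of every line
mhl
      lines[2 3] ≡ 'has 3' 'lines'   ⍝ bracket indexing still suits selecting several lines
1
```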