TLDR: I am looking for a datastructure with the properties of a string buffer that I can access by index in dart.
Long version: I need the properties of Stringbuffer (very efficient at adding characters/bytes) but with the added capability of accessing characters/bytes at specific indices or ranges. I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.
Fyi I am trying to build a piece table in dart. I do not think that simply calling stringbuffer.toString().substring(start, end)
is an option, as this probably copies the whole string, which is the opposite of the efficiency I want.
CodePudding user response:
I am very surprised that something as simple as that does not seem to exist, neither in the dart core nor in custom libraries.
I would not expect that to be something that many people need to do, so I am not surprised.
Here is a quick implementation to get you started. The basic principle is to keep track of starting (and/or ending) indices of each string component. When doing lookups, do a binary-search on those indices to find the relevant components. lowerBound
from package:collection
is useful for this.
This implementation operates in terms of UTF-16 code units (which is what Dart's [String
] class uses), but it should not be too hard to make it use Runes
/code-points instead if you want.
import 'package:collection/collection.dart';
/// Part of a [IndexableStringBuffer].
class _Part implements Comparable<_Part> {
const _Part(this.startIndexCodeUnits, [this.string = '']);
final String string;
final int startIndexCodeUnits;
int get endIndexCodeUnits => startIndexCodeUnits length;
int get length => string.length;
@override
String toString() => string;
@override
int compareTo(_Part other) =>
endIndexCodeUnits.compareTo(other.endIndexCodeUnits);
}
/// Similar to [StringBuffer] but allows extracting a substring.
class IndexableStringBuffer {
final _parts = <_Part>[];
int _totalLengthCodeUnits = 0;
IndexableStringBuffer();
int get totalLengthCodeUnits => _totalLengthCodeUnits;
/// Adds a new [String].
void add(String string) {
_parts.add(_Part(_totalLengthCodeUnits, string));
_totalLengthCodeUnits = string.length;
}
/// Returns a substring in the range \[start, end), where [start] and
/// [end] are indices in UTF-16 code units.
String substring(int start, int end) {
assert(start >= 0);
assert(start <= totalLengthCodeUnits);
assert(end >= 0);
assert(end <= totalLengthCodeUnits);
// Inclusive start and end indices of the relevant substrings from [_parts].
var startPartIndex = lowerBound<_Part>(_parts, _Part(start));
var endPartIndex = lowerBound<_Part>(_parts, _Part(end));
if (startPartIndex == _parts.length) {
return '';
}
var startPart = _parts[startPartIndex];
var startOffset = start - startPart.startIndexCodeUnits;
assert(startOffset >= 0);
var substring = _parts.getRange(startPartIndex, endPartIndex 1).join();
return substring.substring(startOffset, startOffset end - start);
}
}
// And some quick tests to kick the tires:
void main() {
var buffer = IndexableStringBuffer();
print(buffer.substring(0, 0));
buffer
..add('hello')
..add('world')
..add('goodbye')
..add('friend');
print(buffer.substring(0, 3)); // hel
print(buffer.substring(1, 4)); // ell
print(buffer.substring(1, 5)); // ello
print(buffer.substring(5, 7)); // wo
print(buffer.substring(0, 10)); // helloworld
print(buffer.substring(3, 9)); // loworl
print(buffer.substring(10, 12)); // go
print(buffer.substring(9, 12)); // dgo
print(buffer.substring(0, 23)); // helloworldgoodbyefriend
print(buffer.substring(0, 0)); //
print(buffer.substring(23, 23)); //
}
CodePudding user response:
It seems you are right, my best guess is dart trying to separate StringBuffer to contain building logic only. Is performance critical in your application? You could always use .toString()
.