pub struct Unit(_);

Unit represents a single unit of haystack for DFA based regex engines.

Consumers of this crate are not expected to need this type unless they are implementing their own DFA. And even then, it’s not required: implementors may use other techniques to handle haystack units.

Typically, a single unit of haystack for a DFA would be a single byte. However, for the DFAs in this crate, matches are delayed by a single byte in order to handle look-ahead assertions (\b, $ and \z). Thus, once we have consumed the haystack, we must run the DFA through one additional transition using a unit that indicates the haystack has ended.

A u8 cannot represent a sentinel, since all 256 of its values may be valid haystack units to a DFA. This type therefore explicitly adds room for a sentinel value.
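As a rough illustration of why the extra value is needed, the sketch below models a haystack unit as either a byte or a sentinel and produces the sequence a search would feed to a DFA: every byte, then exactly one end-of-input unit. `SketchUnit` and `units` are hypothetical names invented for this example; they are not part of this crate and do not reflect how Unit is implemented internally.

```rust
/// Hypothetical stand-in for a haystack unit (not this crate's type).
/// It needs 256 byte values plus one sentinel, which is one more value
/// than a `u8` can hold.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchUnit {
    /// An ordinary byte from the haystack.
    Byte(u8),
    /// The marker fed to the DFA once the haystack is exhausted.
    Eoi,
}

/// Returns every byte of the haystack followed by exactly one sentinel.
fn units(haystack: &[u8]) -> Vec<SketchUnit> {
    let mut out: Vec<SketchUnit> =
        haystack.iter().map(|&b| SketchUnit::Byte(b)).collect();
    // The additional transition described above is driven by this unit.
    out.push(SketchUnit::Eoi);
    out
}
```

Even an empty haystack yields one unit under this model, which is what allows assertions such as $ to be checked at the very end.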

The sentinel EOI value is always its own equivalence class and is ultimately represented by adding 1 to the maximum equivalence class value. So for example, the regex ^[a-z]+$ might be split into the following equivalence classes:

0 => [\x00-`]
1 => [a-z]
2 => [{-\xFF]
3 => [EOI]

Where EOI is the special sentinel value that is always in its own singleton equivalence class.
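To make the role of the EOI class concrete, here is a toy, fully anchored DFA for ^[a-z]+$ that uses the four classes listed above. The state numbering, the table layout and every name in it (`byte_class`, `TABLE`, `is_match`) are assumptions made for this sketch; the DFAs in this crate are built and laid out differently.

```rust
/// Equivalence class of a byte for `^[a-z]+$`, following the listing above.
fn byte_class(byte: u8) -> usize {
    match byte {
        0x00..=b'`' => 0,
        b'a'..=b'z' => 1,
        b'{'..=0xFF => 2,
    }
}

/// The sentinel class is the maximum byte class (2) plus 1.
const EOI_CLASS: usize = 3;
/// Alphabet length: three byte classes plus the sentinel class.
const ALPHABET_LEN: usize = 4;

// Hypothetical state identifiers for this sketch.
const DEAD: usize = 0;
const START: usize = 1;
const LETTERS: usize = 2;
const MATCH: usize = 3;

/// Transition table indexed as TABLE[state][class].
const TABLE: [[usize; ALPHABET_LEN]; 4] = [
    // class 0, class 1, class 2, EOI
    [DEAD, DEAD, DEAD, DEAD],     // DEAD
    [DEAD, LETTERS, DEAD, DEAD],  // START
    [DEAD, LETTERS, DEAD, MATCH], // LETTERS
    [DEAD, DEAD, DEAD, DEAD],     // MATCH
];

/// Runs the toy DFA over the whole haystack, then takes one extra
/// transition on the sentinel class. The match state is only reachable
/// through that final transition, mirroring the delayed match.
fn is_match(haystack: &[u8]) -> bool {
    let mut state = START;
    for &b in haystack {
        state = TABLE[state][byte_class(b)];
    }
    state = TABLE[state][EOI_CLASS];
    state == MATCH
}
```

Because each row has only four columns instead of 257, the table stays small while still distinguishing "another letter", "anything else" and "end of input".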

Implementations

Create a new haystack unit from a byte value.

All possible byte values are legal. However, when creating a haystack unit for a specific DFA, one should be careful to only construct units that are in that DFA’s alphabet. Namely, one way to compact a DFA’s in-memory representation is to collapse its transitions to a set of equivalence classes instead of a set of all possible byte values. If a DFA uses equivalence classes instead of byte values, then the byte given here should be the equivalence class identifier, not the raw haystack byte.
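The following sketch shows one way a byte-to-class lookup table could be built and how much a dense transition table shrinks as a result. The helpers `build_classes`, `num_byte_classes` and `table_len`, and the "last byte of each class" encoding, are assumptions for illustration only; they are not this crate's API.

```rust
/// Builds a byte-to-class lookup table. Each byte in `last_bytes` is
/// treated as the final byte of its class (an encoding chosen for this
/// sketch only).
fn build_classes(last_bytes: &[u8]) -> [u8; 256] {
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 0..=255u8 {
        classes[b as usize] = class;
        // Start a new class after each boundary. Byte 255 always ends the
        // last class, so it never opens another one.
        if b != 255 && last_bytes.contains(&b) {
            class += 1;
        }
    }
    classes
}

/// Number of byte classes, not counting the end-of-input sentinel.
fn num_byte_classes(classes: &[u8; 256]) -> usize {
    classes[255] as usize + 1
}

/// Length of a dense transition table with one column per byte class
/// plus one column for the sentinel.
fn table_len(num_states: usize, num_byte_classes: usize) -> usize {
    num_states * (num_byte_classes + 1)
}
```

With boundaries at the bytes ` and z, the table maps every byte to class 0, 1 or 2. A unit for such a DFA would then be created from `classes[byte as usize]` rather than from `byte` itself.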

Create a new “end of input” haystack unit.

The value given is the sentinel value used by this unit to represent the “end of input.” The value should be the total number of equivalence classes in the corresponding alphabet. Its maximum value is 256, which occurs when every byte is its own equivalence class.

Panics

This panics when num_byte_equiv_classes is greater than 256.
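A minimal sketch of the documented contract, using a hypothetical `SketchUnit` type and `sketch_eoi` function rather than this crate's own code. Storing the sentinel in a `u16` is an assumption made here so that the maximum value of 256 fits.

```rust
/// Hypothetical stand-in for a haystack unit (not this crate's type).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchUnit {
    Byte(u8),
    /// Holds the number of byte equivalence classes, at most 256.
    Eoi(u16),
}

/// Creates an "end of input" unit whose value is the total number of
/// byte equivalence classes in the alphabet.
///
/// Panics when `num_byte_equiv_classes` is greater than 256.
fn sketch_eoi(num_byte_equiv_classes: usize) -> SketchUnit {
    assert!(
        num_byte_equiv_classes <= 256,
        "at most 256 byte equivalence classes are possible, but got {}",
        num_byte_equiv_classes,
    );
    // The cast cannot truncate because the value was just checked.
    SketchUnit::Eoi(num_byte_equiv_classes as u16)
}
```

For the ^[a-z]+$ alphabet with three byte classes, the sentinel would be `sketch_eoi(3)`. When every byte is its own class, it would be `sketch_eoi(256)`.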

If this unit is not an “end of input” sentinel, then return its underlying byte value. Otherwise return None.

If this unit is an “end of input” sentinel, then return the underlying sentinel value that was given to Unit::eoi. Otherwise return None.

Return this unit as a usize, regardless of whether it is a byte value or an “end of input” sentinel. In the latter case, the underlying sentinel value given to Unit::eoi is returned.
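The three accessors described above can be sketched on a hypothetical two-variant enum. The type `SketchUnit`, the method names and the `u16` sentinel type are assumptions for this example and are not taken from this crate's source.

```rust
/// Hypothetical stand-in for a haystack unit (not this crate's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchUnit {
    Byte(u8),
    Eoi(u16),
}

impl SketchUnit {
    /// The byte value, or `None` for the sentinel.
    fn as_u8(self) -> Option<u8> {
        match self {
            SketchUnit::Byte(b) => Some(b),
            SketchUnit::Eoi(_) => None,
        }
    }

    /// The sentinel value, or `None` for a byte.
    fn as_eoi(self) -> Option<u16> {
        match self {
            SketchUnit::Byte(_) => None,
            SketchUnit::Eoi(n) => Some(n),
        }
    }

    /// The unit as a table index, whichever variant it is.
    fn as_usize(self) -> usize {
        match self {
            SketchUnit::Byte(b) => usize::from(b),
            SketchUnit::Eoi(n) => usize::from(n),
        }
    }
}
```

The `usize` form is the one that is convenient for indexing a row of a transition table, since bytes (or byte classes) and the sentinel then share a single index space.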

Returns true if and only if this unit is a byte value equivalent to the byte given. This always returns false when this is an “end of input” sentinel.

Returns true when this unit represents an “end of input” sentinel.

Returns true when this unit corresponds to an ASCII word byte.

This always returns false when this unit represents an “end of input” sentinel.
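The predicates above can be sketched as follows on a hypothetical `SketchUnit` enum. The sketch assumes that an "ASCII word byte" means one of [0-9A-Za-z_], the usual ASCII definition of \w. None of this code is taken from this crate.

```rust
/// Hypothetical stand-in for a haystack unit (not this crate's type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SketchUnit {
    Byte(u8),
    Eoi(u16),
}

impl SketchUnit {
    /// True only for a byte unit holding exactly `byte`.
    fn is_byte(self, byte: u8) -> bool {
        self == SketchUnit::Byte(byte)
    }

    /// True only for the sentinel.
    fn is_eoi(self) -> bool {
        matches!(self, SketchUnit::Eoi(_))
    }

    /// True for byte units in [0-9A-Za-z_] (assumed definition of an
    /// ASCII word byte). The sentinel is never a word byte.
    fn is_word_byte(self) -> bool {
        match self {
            SketchUnit::Byte(b) => b.is_ascii_alphanumeric() || b == b'_',
            SketchUnit::Eoi(_) => false,
        }
    }
}
```

Note that a sentinel whose numeric value happens to equal a byte, such as `SketchUnit::Eoi(97)`, still compares unequal to the byte `a` under this model.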

Trait Implementations

Returns a copy of the value.
Performs copy-assignment from source.
Formats the value using the given formatter.
This method returns an Ordering between self and other.
Compares and returns the maximum of two values.
Compares and returns the minimum of two values.
Restrict a value to a certain interval.
This method tests for self and other values to be equal, and is used by ==.
This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
This method returns an ordering between self and other values if one exists.
This method tests less than (for self and other) and is used by the < operator.
This method tests less than or equal to (for self and other) and is used by the <= operator.
This method tests greater than (for self and other) and is used by the > operator.
This method tests greater than or equal to (for self and other) and is used by the >= operator.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.
Immutably borrows from an owned value.
Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The resulting type after obtaining ownership.
Creates owned data from borrowed data, usually by cloning.
Uses borrowed data to replace owned data, usually by cloning.
The type returned in the event of a conversion error.
Performs the conversion.
The type returned in the event of a conversion error.
Performs the conversion.