Class TrieBuilder

java.lang.Object
com.ibm.icu.impl.TrieBuilder
Direct Known Subclasses:
IntTrieBuilder

public class TrieBuilder extends Object
Builder class to manipulate and generate a trie. This is useful for ICU data in primitive types. Provides a compact way to store information that is indexed by Unicode values, such as character properties, types, keyboard values, etc. This is very useful when you have a block of Unicode data that contains significant values while the rest of the Unicode data is unused in the application or when you have a lot of redundance, such as where all 21,000 Han ideographs have the same value. However, lookup is much faster than a hash table. A trie of any primitive data type serves two purposes:
  • Fast access of the indexed values.
  • Smaller memory footprint.
This is a direct port from the ICU4C version
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static interface 
    Character data in com.ibm.impl.Trie have different user-specified format for different purposes.
  • Field Summary

    Fields
    Modifier and Type
    Field
    Description
    protected static final int
    Length of the BMP portion of the index (stage 1) array.
    static final int
    Number of data values in a stage 2 (data array) block.
    protected static final int
    The alignment size of a stage 2 data block.
    protected static final int
    Shift size for shifting left the index array values.
    protected int
     
    protected int
     
    protected int[]
    Index values at build-time are 32 bits wide for easier processing.
    protected int
     
    protected boolean
     
    protected boolean
     
    protected int[]
    Map of adjusted indexes, used in utrie_compact().
    protected static final int
    Mask for getting the lower bits from the input index.
    private static final int
    Maximum length of the build-time data (stage 2) array.
    protected static final int
    Maximum length of the runtime data (stage 2) array.
    protected static final int
    Length of the index (stage 1) array before folding.
    protected static final int
    If set, then the data (stage 2) array is 32 bits wide.
    protected static final int
    Shifting to position the index value in options
    protected static final int
    If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.
    protected static final int
    Shift size for shifting right the input index.
    protected static final int
    Number of index (stage 1) entries per lead surrogate.
  • Constructor Summary

    Constructors
    Modifier
    Constructor
    Description
    protected
     
    protected
     
  • Method Summary

    Modifier and Type
    Method
    Description
    protected static final boolean
    equal_int(int[] array, int start1, int start2, int length)
    Compare two sections of an array for equality.
    protected static final int
    findSameIndexBlock(int[] index, int indexLength, int otherBlock)
    Finds the same index block as the otherBlock
    protected void
    Set a value in the trie index map to indicate which data block is referenced and which one is not.
    boolean
    isInZeroBlock(int ch)
    Checks if the character belongs to a zero block in the trie

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Field Details

    • DATA_BLOCK_LENGTH

      public static final int DATA_BLOCK_LENGTH
      Number of data values in a stage 2 (data array) block. 2, 4, 8, .., 0x200
      See Also:
    • m_index_

      protected int[] m_index_
      Index values at build-time are 32 bits wide for easier processing. Bit 31 is set if the data block is used by multiple index values (from setRange()).
    • m_indexLength_

      protected int m_indexLength_
    • m_dataCapacity_

      protected int m_dataCapacity_
    • m_dataLength_

      protected int m_dataLength_
    • m_isLatin1Linear_

      protected boolean m_isLatin1Linear_
    • m_isCompacted_

      protected boolean m_isCompacted_
    • m_map_

      protected int[] m_map_
      Map of adjusted indexes, used in utrie_compact(). Maps from original indexes to new ones.
    • SHIFT_

      protected static final int SHIFT_
      Shift size for shifting right the input index. 1..9
      See Also:
    • MAX_INDEX_LENGTH_

      protected static final int MAX_INDEX_LENGTH_
      Length of the index (stage 1) array before folding. Maximum number of Unicode code points (0x110000) shifted right by SHIFT.
      See Also:
    • BMP_INDEX_LENGTH_

      protected static final int BMP_INDEX_LENGTH_
      Length of the BMP portion of the index (stage 1) array.
      See Also:
    • SURROGATE_BLOCK_COUNT_

      protected static final int SURROGATE_BLOCK_COUNT_
      Number of index (stage 1) entries per lead surrogate. Same as number of indexe entries for 1024 trail surrogates, ==0x400>>UTRIE_SHIFT 10 - SHIFT == Number of bits of a trail surrogate that are used in index table lookups.
      See Also:
    • MASK_

      protected static final int MASK_
      Mask for getting the lower bits from the input index. DATA_BLOCK_LENGTH - 1.
      See Also:
    • INDEX_SHIFT_

      protected static final int INDEX_SHIFT_
      Shift size for shifting left the index array values. Increases possible data size with 16-bit index values at the cost of compactability. This requires blocks of stage 2 data to be aligned by UTRIE_DATA_GRANULARITY. 0..UTRIE_SHIFT
      See Also:
    • MAX_DATA_LENGTH_

      protected static final int MAX_DATA_LENGTH_
      Maximum length of the runtime data (stage 2) array. Limited by 16-bit index values that are left-shifted by INDEX_SHIFT_.
      See Also:
    • OPTIONS_INDEX_SHIFT_

      protected static final int OPTIONS_INDEX_SHIFT_
      Shifting to position the index value in options
      See Also:
    • OPTIONS_DATA_IS_32_BIT_

      protected static final int OPTIONS_DATA_IS_32_BIT_
      If set, then the data (stage 2) array is 32 bits wide.
      See Also:
    • OPTIONS_LATIN1_IS_LINEAR_

      protected static final int OPTIONS_LATIN1_IS_LINEAR_
      If set, then Latin-1 data (for U+0000..U+00ff) is stored in the data (stage 2) array as a simple, linear array at data + DATA_BLOCK_LENGTH.
      See Also:
    • DATA_GRANULARITY_

      protected static final int DATA_GRANULARITY_
      The alignment size of a stage 2 data block. Also the granularity for compaction.
      See Also:
    • MAX_BUILD_TIME_DATA_LENGTH_

      private static final int MAX_BUILD_TIME_DATA_LENGTH_
      Maximum length of the build-time data (stage 2) array. The maximum length is 0x110000 + DATA_BLOCK_LENGTH + 0x400. (Number of Unicode code points + one all-initial-value block + possible duplicate entries for 1024 lead surrogates.)
      See Also:
  • Constructor Details

    • TrieBuilder

      protected TrieBuilder()
    • TrieBuilder

      protected TrieBuilder(TrieBuilder table)
  • Method Details

    • isInZeroBlock

      public boolean isInZeroBlock(int ch)
      Checks if the character belongs to a zero block in the trie
      Parameters:
      ch - codepoint which data is to be retrieved
      Returns:
      true if ch is in the zero block
    • equal_int

      protected static final boolean equal_int(int[] array, int start1, int start2, int length)
      Compare two sections of an array for equality.
    • findUnusedBlocks

      protected void findUnusedBlocks()
      Set a value in the trie index map to indicate which data block is referenced and which one is not. utrie_compact() will remove data blocks that are not used at all. Set - 0 if it is used - -1 if it is not used
    • findSameIndexBlock

      protected static final int findSameIndexBlock(int[] index, int indexLength, int otherBlock)
      Finds the same index block as the otherBlock
      Parameters:
      index - array
      indexLength - size of index
      otherBlock -
      Returns:
      same index block