Class TextFormat.Tokenizer

java.lang.Object
com.google.protobuf.TextFormat.Tokenizer
Enclosing class:
TextFormat

private static final class TextFormat.Tokenizer extends Object
Represents a stream of tokens parsed from a String.

The Java standard library provides many classes that you might think would be useful for implementing this, but aren't. For example:

  • java.io.StreamTokenizer: This almost does what we want, or at least something close to it, except for one fatal flaw: it automatically un-escapes strings using Java escape sequences, which do not include all the escape sequences we need to support (e.g. '\x').
  • java.util.Scanner: This seems like a great way at least to parse regular expressions out of a stream (so we wouldn't have to load the entire input into a single string before parsing). Sadly, Scanner requires that tokens be delimited with some delimiter. Thus, although the text "foo:" should parse to two tokens ("foo" and ":"), Scanner would recognize it only as a single token. Furthermore, Scanner provides no way to inspect the contents of delimiters, making it impossible to keep track of line and column numbers.
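
To make the two objections concrete, this small sketch runs both standard-library classes on inputs the Tokenizer must handle. The names StdlibTokenizerPitfalls, firstScannerToken and firstQuotedString are illustrative only. The behavior shown comes from the JDK classes themselves.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Scanner;

final class StdlibTokenizerPitfalls {
  /** First token from java.util.Scanner with its default whitespace delimiter. */
  static String firstScannerToken(String text) {
    try (Scanner scanner = new Scanner(text)) {
      return scanner.next();
    }
  }

  /** Value StreamTokenizer reports for the first double-quoted string in the input. */
  static String firstQuotedString(String text) {
    StreamTokenizer st = new StreamTokenizer(new StringReader(text));
    try {
      while (st.nextToken() != StreamTokenizer.TT_EOF) {
        if (st.ttype == '"') {
          return st.sval; // already un-escaped using StreamTokenizer's own rules
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e); // not expected for a StringReader
    }
    return null;
  }

  public static void main(String[] args) {
    // Prints "foo:": the colon stays glued to the identifier because ':' is not a delimiter.
    System.out.println(firstScannerToken("foo: 1"));
    // Prints "x41": the unrecognized escape "\x" silently loses its backslash.
    System.out.println(firstQuotedString("\"\\x41\""));
  }
}
```

StreamTokenizer drops the backslash of any escape it does not recognize, so the information needed to interpret "\x41" as the byte 0x41 is gone before the caller ever sees the string.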
  • Field Details

    • text

      private final CharSequence text
    • currentToken

      private String currentToken
    • pos

      private int pos
    • line

      private int line
    • column

      private int column
    • lineInfoTrackingPos

      private int lineInfoTrackingPos
    • previousLine

      private int previousLine
    • previousColumn

      private int previousColumn
    • containsSilentMarkerAfterCurrentToken

      private boolean containsSilentMarkerAfterCurrentToken
      containsSilentMarkerAfterCurrentToken indicates whether there is a silent marker after the current token. This value is moved to containsSilentMarkerAfterPrevToken every time the next token is parsed.
    • containsSilentMarkerAfterPrevToken

      private boolean containsSilentMarkerAfterPrevToken
  • Constructor Details

    • Tokenizer

      private Tokenizer(CharSequence text)
      Construct a tokenizer that parses tokens from the given text.
  • Method Details

    • getPreviousLine

      int getPreviousLine()
    • getPreviousColumn

      int getPreviousColumn()
    • getLine

      int getLine()
    • getColumn

      int getColumn()
    • getContainsSilentMarkerAfterCurrentToken

      boolean getContainsSilentMarkerAfterCurrentToken()
    • getContainsSilentMarkerAfterPrevToken

      boolean getContainsSilentMarkerAfterPrevToken()
    • atEnd

      boolean atEnd()
      Are we at the end of the input?
    • nextToken

      void nextToken()
      Advance to the next token.
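
      Here is a minimal sketch of how nextToken() could maintain previousLine/previousColumn and shift containsSilentMarkerAfterCurrentToken into containsSilentMarkerAfterPrevToken, as the field description states. It assumes a hypothetical list of pre-lexed tokens. TokenBookkeepingSketch and Tok are invented names, and the sketch does not cover how a silent marker is actually detected.

```java
import java.util.List;

/** Hypothetical model of the Tokenizer's "current vs. previous token" bookkeeping. */
final class TokenBookkeepingSketch {
  /** A pre-lexed token; silent-marker detection is assumed to happen elsewhere. */
  static final class Tok {
    final String text;
    final int line;
    final int column;
    final boolean silentMarkerAfter;

    Tok(String text, int line, int column, boolean silentMarkerAfter) {
      this.text = text;
      this.line = line;
      this.column = column;
      this.silentMarkerAfter = silentMarkerAfter;
    }
  }

  private final List<Tok> tokens;
  private int index = 0;
  String currentToken;
  int line;
  int column;
  int previousLine;
  int previousColumn;
  boolean containsSilentMarkerAfterCurrentToken;
  boolean containsSilentMarkerAfterPrevToken;

  TokenBookkeepingSketch(List<Tok> tokens) {
    this.tokens = tokens;
    loadCurrent();
  }

  /** In this sketch, an empty current token marks the end of the input. */
  boolean atEnd() {
    return currentToken.isEmpty();
  }

  void nextToken() {
    // Remember where the token being consumed started.
    previousLine = line;
    previousColumn = column;
    // The flag for the token we are leaving becomes the "previous token" flag.
    containsSilentMarkerAfterPrevToken = containsSilentMarkerAfterCurrentToken;
    index++;
    loadCurrent();
  }

  private void loadCurrent() {
    if (index < tokens.size()) {
      Tok t = tokens.get(index);
      currentToken = t.text;
      line = t.line;
      column = t.column;
      containsSilentMarkerAfterCurrentToken = t.silentMarkerAfter;
    } else {
      currentToken = "";
      containsSilentMarkerAfterCurrentToken = false;
    }
  }
}
```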
    • nextTokenInternal

      private String nextTokenInternal()
    • isAlphaUnder

      private static boolean isAlphaUnder(char c)
    • isDigitPlusMinus

      private static boolean isDigitPlusMinus(char c)
    • isWhitespace

      private static boolean isWhitespace(char c)
    • nextTokenSingleChar

      private String nextTokenSingleChar()
      Produce a token for the single char at the current position.

      We hardcode the expected single-char tokens to avoid allocating a new string every time, which would add garbage-collection pressure. String literals are always loaded from the class constant pool, so returning one allocates nothing.

      This method must not be called if the current position is after the end-of-text.
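
      A switch that returns string literals illustrates the allocation-free idea. The class name and the set of characters are assumptions for illustration, not the real Tokenizer's list. String literals are interned, so every call for '{' returns the same String instance.

```java
/** Hypothetical sketch: map single-char tokens to interned literals instead of new Strings. */
final class SingleCharTokens {
  static String singleCharToken(char c) {
    switch (c) {
      // Each case returns a literal from the constant pool: no allocation per call.
      case '{': return "{";
      case '}': return "}";
      case '[': return "[";
      case ']': return "]";
      case '<': return "<";
      case '>': return ">";
      case ':': return ":";
      case ',': return ",";
      case ';': return ";";
      default:
        // Characters outside the assumed set fall back to allocating a String.
        return String.valueOf(c);
    }
  }
}
```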

    • skipWhitespace

      private void skipWhitespace()
      Skip over any whitespace so that the matcher region starts at the next token.
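
      This minimal sketch of whitespace skipping also keeps the line and column counters current. It assumes 0-based counters and uses Character.isWhitespace as a stand-in for the private isWhitespace(char), whose exact character set this page does not give. WhitespaceSkipper is a hypothetical name.

```java
/** Hypothetical cursor: skips whitespace while tracking 0-based line and column (assumed). */
final class WhitespaceSkipper {
  private final CharSequence text;
  int pos = 0;
  int line = 0;
  int column = 0;

  WhitespaceSkipper(CharSequence text) {
    this.text = text;
  }

  /** Afterwards, pos is at the start of the next token or at the end of the text. */
  void skipWhitespace() {
    while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
      if (text.charAt(pos) == '\n') {
        line++;     // a newline starts a new line...
        column = 0; // ...at column zero
      } else {
        column++;
      }
      pos++;
    }
  }
}
```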
    • tryConsume

      boolean tryConsume(String token)
      If the next token exactly matches token, consume it and return true. Otherwise, return false without doing anything.
    • consume

      void consume(String token) throws TextFormat.ParseException
      If the next token exactly matches token, consume it. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
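
      The relationship between tryConsume and consume can be sketched over a hypothetical pre-split token list: the throwing variant wraps the non-throwing one. ConsumeSketch and its nested ParseException (a stand-in for TextFormat.ParseException) are invented to keep the example self-contained. The error message text is an assumption.

```java
import java.util.List;

/** Hypothetical token cursor illustrating the tryConsume/consume pair. */
final class ConsumeSketch {
  /** Stand-in for TextFormat.ParseException. */
  static final class ParseException extends Exception {
    ParseException(String message) {
      super(message);
    }
  }

  private final List<String> tokens;
  private int index = 0;

  ConsumeSketch(List<String> tokens) {
    this.tokens = tokens;
  }

  String current() {
    return index < tokens.size() ? tokens.get(index) : "";
  }

  /** Consumes only on an exact match; otherwise the position is left untouched. */
  boolean tryConsume(String token) {
    if (current().equals(token)) {
      index++;
      return true;
    }
    return false;
  }

  /** Throwing variant layered on tryConsume; the message wording is assumed. */
  void consume(String token) throws ParseException {
    if (!tryConsume(token)) {
      throw new ParseException("Expected \"" + token + "\".");
    }
  }
}
```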
    • lookingAtInteger

      boolean lookingAtInteger()
      Returns true if the next token is an integer, but does not consume it.
    • lookingAt

      boolean lookingAt(String text)
      Returns true if the current token's text is equal to that specified.
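
      Both lookahead methods inspect the current token without advancing. The sketch below assumes that lookingAtInteger() only checks the first character for a digit or a sign and leaves full validation to the consume methods. That assumption and the LookaheadSketch class are hypothetical.

```java
/** Hypothetical holder for the current token, used to show side-effect-free lookahead. */
final class LookaheadSketch {
  private final String currentToken;

  LookaheadSketch(String currentToken) {
    this.currentToken = currentToken;
  }

  /** True if the current token's text equals the argument; nothing is consumed. */
  boolean lookingAt(String text) {
    return currentToken.equals(text);
  }

  /** Assumed cheap check: the token starts with a digit or a sign; nothing is consumed. */
  boolean lookingAtInteger() {
    if (currentToken.isEmpty()) {
      return false;
    }
    char c = currentToken.charAt(0);
    return ('0' <= c && c <= '9') || c == '-' || c == '+';
  }
}
```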
    • consumeIdentifier

      String consumeIdentifier() throws TextFormat.ParseException
      If the next token is an identifier, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • tryConsumeIdentifier

      boolean tryConsumeIdentifier()
      If the next token is an identifier, consume it and return true. Otherwise, return false without doing anything.
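
      This sketch shows an identifier test that consumeIdentifier() and tryConsumeIdentifier() could share. The rule used here is an assumption inferred from the helper's name, not taken from this page: a letter or '_' first, then letters, digits, '_' or '.'. The same applies to the meaning given to isAlphaUnder.

```java
/** Hypothetical identifier predicate; the accepted character set is an assumption. */
final class IdentifierSketch {
  /** Assumed meaning of isAlphaUnder: ASCII letter or underscore. */
  static boolean isAlphaUnder(char c) {
    return ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z') || c == '_';
  }

  static boolean isIdentifier(String token) {
    if (token.isEmpty() || !isAlphaUnder(token.charAt(0))) {
      return false;
    }
    for (int i = 1; i < token.length(); i++) {
      char c = token.charAt(i);
      // Later characters may also be digits or '.', e.g. a qualified name like "pkg.Msg".
      if (!isAlphaUnder(c) && !('0' <= c && c <= '9') && c != '.') {
        return false;
      }
    }
    return true;
  }
}
```

      Under this rule, tryConsumeIdentifier() would consume the token exactly when isIdentifier returns true, and consumeIdentifier() would throw otherwise.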
    • consumeInt32

      int consumeInt32() throws TextFormat.ParseException
      If the next token is a 32-bit signed integer, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • consumeUInt32

      int consumeUInt32() throws TextFormat.ParseException
      If the next token is a 32-bit unsigned integer, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
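
      Java has no unsigned int, so an int-typed method such as consumeUInt32() can only return a 32-bit unsigned value as its bit pattern: values above Integer.MAX_VALUE come back negative. The decimal-only sketch below uses JDK parsing methods. Int32Sketch is a hypothetical name, and the range check is one possible way to handle the signed case.

```java
/** Hypothetical decimal-only parsing of signed and unsigned 32-bit tokens. */
final class Int32Sketch {
  /** Signed: parse wide, then reject anything outside the int range. */
  static int parseInt32(String text) {
    long v = Long.parseLong(text);
    if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE) {
      throw new NumberFormatException("Number out of range for 32-bit signed integer: " + text);
    }
    return (int) v;
  }

  /** Unsigned: 0..4294967295 is stored in an int; e.g. 4294967295 becomes -1 (same bits). */
  static int parseUInt32(String text) {
    return Integer.parseUnsignedInt(text); // rejects a leading '-' and values >= 2^32
  }
}
```

      Callers can recover the unsigned value with Integer.toUnsignedLong or Integer.toUnsignedString.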
    • consumeInt64

      long consumeInt64() throws TextFormat.ParseException
      If the next token is a 64-bit signed integer, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • tryConsumeInt64

      boolean tryConsumeInt64()
      If the next token is a 64-bit signed integer, consume it and return true. Otherwise, return false without doing anything.
    • consumeUInt64

      long consumeUInt64() throws TextFormat.ParseException
      If the next token is a 64-bit unsigned integer, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • tryConsumeUInt64

      public boolean tryConsumeUInt64()
      If the next token is a 64-bit unsigned integer, consume it and return true. Otherwise, return false without doing anything.
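
      The try- variants can be layered on the throwing ones by catching the exception. The token advances only after a successful parse, so a failed attempt leaves the position unchanged. The sketch uses a hypothetical single-token holder and a stand-in ParseException, and Long.parseUnsignedLong does the uint64 parsing. The message text is an assumption.

```java
/** Hypothetical single-token holder illustrating consumeUInt64 / tryConsumeUInt64. */
final class UInt64Sketch {
  /** Stand-in for TextFormat.ParseException. */
  static final class ParseException extends Exception {
    ParseException(String message) {
      super(message);
    }
  }

  private String currentToken;

  UInt64Sketch(String token) {
    this.currentToken = token;
  }

  /** Returns the uint64 bit pattern in a long; values above Long.MAX_VALUE come back negative. */
  long consumeUInt64() throws ParseException {
    try {
      long value = Long.parseUnsignedLong(currentToken);
      currentToken = ""; // "advance" only after a successful parse
      return value;
    } catch (NumberFormatException e) {
      throw new ParseException("Couldn't parse integer: " + e.getMessage());
    }
  }

  /** Maps failure to false; the token is untouched because consumeUInt64 threw before advancing. */
  boolean tryConsumeUInt64() {
    try {
      consumeUInt64();
      return true;
    } catch (ParseException e) {
      return false;
    }
  }
}
```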
    • consumeDouble

      public double consumeDouble() throws TextFormat.ParseException
      If the next token is a double, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • tryConsumeDouble

      public boolean tryConsumeDouble()
      If the next token is a double, consume it and return true. Otherwise, return false without doing anything.
    • consumeFloat

      public float consumeFloat() throws TextFormat.ParseException
      If the next token is a float, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • tryConsumeFloat

      public boolean tryConsumeFloat()
      If the next token is a float, consume it and return true. Otherwise, return false without doing anything.
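
      Java's own parsers accept only the spellings "Infinity" and "NaN", so lowercase forms such as "inf" or "nan" need explicit handling if the text format accepts them. The sketch below works under that assumption. The accepted spellings and FloatTokenSketch are hypothetical. It calls Float.parseFloat directly for floats rather than narrowing a double, so the value is rounded only once.

```java
import java.util.Locale;

/** Hypothetical sketch of floating-point token parsing with extra special spellings. */
final class FloatTokenSketch {
  /** Value for an assumed special spelling (inf, infinity, nan; any case), or null otherwise. */
  private static Double special(String token) {
    String t = token.toLowerCase(Locale.ROOT);
    boolean negative = t.startsWith("-");
    String body = negative ? t.substring(1) : t;
    if (body.equals("inf") || body.equals("infinity")) {
      return negative ? Double.NEGATIVE_INFINITY : Double.POSITIVE_INFINITY;
    }
    if (t.equals("nan")) {
      return Double.NaN;
    }
    return null;
  }

  static double parseDouble(String token) {
    Double s = special(token);
    return s != null ? s : Double.parseDouble(token); // NumberFormatException on bad input
  }

  static float parseFloat(String token) {
    Double s = special(token);
    // Parse floats directly so the value is rounded once, not double-then-float.
    return s != null ? s.floatValue() : Float.parseFloat(token);
  }
}
```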
    • consumeBoolean

      public boolean consumeBoolean() throws TextFormat.ParseException
      If the next token is a boolean, consume it and return its value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
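
      This sketch shows matching logic consumeBoolean() could use. The accepted spellings below are an assumption, not taken from this page, and only the exact tokens listed are recognized. IllegalArgumentException stands in for TextFormat.ParseException, and BoolTokenSketch is a hypothetical name.

```java
/** Hypothetical boolean token matcher; the accepted spellings are assumed. */
final class BoolTokenSketch {
  static boolean parseBool(String token) {
    switch (token) {
      case "true": case "True": case "t": case "1":
        return true;
      case "false": case "False": case "f": case "0":
        return false;
      default:
        // Stand-in for throwing TextFormat.ParseException.
        throw new IllegalArgumentException(
            "Expected \"true\" or \"false\". Found \"" + token + "\".");
    }
  }
}
```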
    • consumeString

      public String consumeString() throws TextFormat.ParseException
      If the next token is a string, consume it and return its (unescaped) value. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • consumeByteString

      ByteString consumeByteString() throws TextFormat.ParseException
      If the next token is a string, consume it, unescape it as a ByteString, and return it. Otherwise, throw a TextFormat.ParseException.
      Throws:
      TextFormat.ParseException
    • tryConsumeByteString

      boolean tryConsumeByteString()
      If the next token is a string, consume it and return true. Otherwise, return false.
    • consumeByteString

      private void consumeByteString(List<ByteString> list) throws TextFormat.ParseException
      Like consumeByteString() but adds each token of the string to the given list. String literals (whether bytes or text) may come in multiple adjacent tokens which are automatically concatenated, like in C or Python.
      Throws:
      TextFormat.ParseException
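
      Adjacent-literal concatenation can be sketched as collecting one piece per quoted token and joining the pieces afterwards. This simplified version skips unescaping entirely (tokens are assumed to contain no backslashes) and uses byte[] in place of ByteString. AdjacentStringsSketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of concatenating adjacent string literals. */
final class AdjacentStringsSketch {
  /** One piece per leading quoted token ("..." or '...'); stops at the first other token. */
  static List<byte[]> collectPieces(List<String> tokens) {
    List<byte[]> pieces = new ArrayList<>();
    for (String tok : tokens) {
      if (!isQuoted(tok)) {
        break;
      }
      // Simplification: strip the quotes without unescaping (assumes no backslashes).
      pieces.add(tok.substring(1, tok.length() - 1).getBytes(StandardCharsets.UTF_8));
    }
    return pieces;
  }

  /** Joins the pieces as if the literals had been written as one. */
  static byte[] concat(List<byte[]> pieces) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (byte[] piece : pieces) {
      out.write(piece, 0, piece.length);
    }
    return out.toByteArray();
  }

  private static boolean isQuoted(String tok) {
    if (tok.length() < 2) {
      return false;
    }
    char quote = tok.charAt(0);
    return (quote == '"' || quote == '\'') && tok.charAt(tok.length() - 1) == quote;
  }
}
```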
    • parseException

      TextFormat.ParseException parseException(String description)
      Returns a TextFormat.ParseException with the current line and column numbers in the description, suitable for throwing.
    • parseExceptionPreviousToken

      TextFormat.ParseException parseExceptionPreviousToken(String description)
      Returns a TextFormat.ParseException with the line and column numbers of the previous token in the description, suitable for throwing.
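
      Returning the exception instead of throwing it lets callers write throw tokenizer.parseException(...), so the compiler sees the throw. The sketch below assumes 0-based counters that are reported 1-based in a "line:column: description" message. The class names and the message format are illustrative assumptions.

```java
/** Hypothetical sketch of building position-annotated parse errors. */
final class ParseErrorSketch {
  /** Stand-in for TextFormat.ParseException, carrying 1-based line and column. */
  static final class ParseException extends Exception {
    final int line;
    final int column;

    ParseException(int line, int column, String description) {
      super(line + ":" + column + ": " + description);
      this.line = line;
      this.column = column;
    }
  }

  // 0-based positions, as the scanner would track them (assumed).
  int line;
  int column;
  int previousLine;
  int previousColumn;

  ParseException parseException(String description) {
    return new ParseException(line + 1, column + 1, description); // report 1-based
  }

  ParseException parseExceptionPreviousToken(String description) {
    return new ParseException(previousLine + 1, previousColumn + 1, description);
  }
}
```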
    • integerParseException

      private TextFormat.ParseException integerParseException(NumberFormatException e)
      Constructs an appropriate TextFormat.ParseException for the given NumberFormatException when trying to parse an integer.
    • floatParseException

      private TextFormat.ParseException floatParseException(NumberFormatException e)
      Constructs an appropriate TextFormat.ParseException for the given NumberFormatException when trying to parse a float or double.
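
      This sketch shows how the two helpers might translate an unchecked NumberFormatException into the parser's checked exception, with different prefixes for integer and floating-point failures. The message prefixes, NumberErrorSketch, and its nested ParseException stand-in are hypothetical.

```java
/** Hypothetical translation of NumberFormatException into a parser exception. */
final class NumberErrorSketch {
  /** Stand-in for TextFormat.ParseException. */
  static final class ParseException extends Exception {
    ParseException(String message) {
      super(message);
    }
  }

  static ParseException integerParseException(NumberFormatException e) {
    return new ParseException("Couldn't parse integer: " + e.getMessage());
  }

  static ParseException floatParseException(NumberFormatException e) {
    return new ParseException("Couldn't parse number: " + e.getMessage());
  }

  /** Example caller: rethrow the JDK's exception as the parser's own checked type. */
  static long consumeInt64(String token) throws ParseException {
    try {
      return Long.parseLong(token);
    } catch (NumberFormatException e) {
      throw integerParseException(e);
    }
  }
}
```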