Tokenizer
Learn about language model tokenization.
Large language models process text using tokens, which are common sequences of characters found in a body of text (often whole words, word fragments, or punctuation). These models learn the statistical relationships between tokens and use them to predict the next token in a sequence.
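To make this concrete, here is a minimal sketch of greedy longest-match tokenization against a tiny hand-picked vocabulary. Production tokenizers instead use byte-pair encoding (BPE) with vocabularies of tens of thousands of tokens learned from data; the vocabulary and function below are purely illustrative.

```python
# Hypothetical toy vocabulary; real BPE vocabularies are learned, not hand-written.
VOCAB = ["token", "tokens", "ize", "izer", " ", "lang", "uage", "model", "s"]

def tokenize(text: str) -> list[str]:
    """Greedily split text into the longest matching vocabulary entries."""
    tokens = []
    i = 0
    while i < len(text):
        # Prefer the longest vocabulary entry matching at position i;
        # fall back to a single character if nothing matches.
        match = max(
            (v for v in VOCAB if text.startswith(v, i)),
            key=len,
            default=text[i],
        )
        tokens.append(match)
        i += len(match)
    return tokens

print(tokenize("language models tokenize"))
# → ['lang', 'uage', ' ', 'model', 's', ' ', 'token', 'ize']
```

Note how "models" splits into "model" + "s" and "tokenize" into "token" + "ize": token boundaries follow the vocabulary, not word boundaries. For the actual encodings used by OpenAI models, the open-source tiktoken library exposes them programmatically (e.g. `tiktoken.get_encoding("cl100k_base")`).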
Use the tool below to understand how a piece of text might be tokenized by a language model, and to see the total number of characters and tokens in real time.