Unsupervised text tokenizer for Neural Network-based text generation
