Pretraining Data and Tokenizer for Indic LLM | Synapse