Safety Lens: White-Box Behavioral Alignment Detection in Language Models via Persona Vector Extraction | Synapse