Across generations, sizes, and types, large language models poorly report self-confidence in gastroenterology clinical reasoning tasks | Synapse