Few-Shot Prompting for STEM: Claude vs Gemini Across 128 Experiments
We ran 128 experiments comparing Claude Sonnet 4 and Gemini 2.0 Flash on STEM problems with zero-shot through 5-shot prompting. The results challenge common assumptions: accuracy stayed flat while format consistency and token efficiency improved dramatically.
January 21, 2026 · 15 min read · #few-shot prompting #STEM education #prompt engineering