
Agent simulation: WebArena-Infinity and virtual testing
The shift from hand-crafted benchmarks to auto-generated simulation environments is collapsing the cost of agent evaluation and exposing how far even the strongest models still lag behind humans.