Run
7 emulators · 655 tests
Results as of this run. The change shown after each score is the target's movement since the previous run in which it was tested.
The suite grew from 625 to 655 tests this run, and all 30 new tests were measured against every target. A drop here is as likely to reflect the stricter suite as a real regression.
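To see how a larger suite alone can pull a score down, here is a minimal Python sketch. It assumes a score is simply passed tests divided by total tests, which may not match how this dashboard weights tiers or unsupported tests. The pass counts are made up for illustration and do not come from any target above.

```python
# Hypothetical illustration: assumes score = passed / total.
# The dashboard's real scoring (tiers, unsupported tests) may differ.

def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    return 100.0 * passed / total


def delta_pp(before: float, after: float) -> float:
    """Change between two percentages, in percentage points."""
    return after - before


# Made-up counts: the target passes 610 of the old 625 tests,
# still passes all 610 of them, and passes 15 of the 30 new tests.
old_total, new_total = 625, 655
old_passed = 610
new_passed = old_passed + 15

before = pass_rate(old_passed, old_total)
after = pass_rate(new_passed, new_total)
print(f"{before:.1f}% to {after:.1f}% ({delta_pp(before, after):+.1f}pp)")
```

No previously passing test fails in this example, yet the score drops by about 2.2 percentage points (97.6% to 95.4%), purely because half of the new tests fail.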
live (AWS) · full coverage · 100% (ground truth) · Tier 1 100% · Tier 2 100% · Tier 3 100%
0.9.13 · full coverage · 96.2% (-1.7pp) · Tier 1 96.1% · Tier 2 94.5% · Tier 3 97.2%
e981a5afd790 · full coverage · 94.8% (-1.7pp) · Tier 1 95.5% · Tier 2 92.7% · Tier 3 94.8%
v0.1.0 · 27 unsupported · 94.7% (-2.1pp) · Tier 1 94.3% · Tier 2 95.2% · Tier 3 95.3%
2026.5.0 · full coverage · 87.9% (-0.4pp) · Tier 1 98.5% · Tier 2 92.7% · Tier 3 68.7%
d89f8fcc6b1a · full coverage · 87.2% (-0.3pp) · Tier 1 98.5% · Tier 2 88.2% · Tier 3 68.7%
4.0.0 · 45 unsupported · 84.1% (-0.6pp) · Tier 1 97.9% · Tier 2 16.9% · Tier 3 82.9%
dcce0eaa8bff · full coverage · 65.5% (-0.6pp) · Tier 1 82.0% · Tier 2 72.7% · Tier 3 35.5%