Run

7 emulators · 655 tests

Results as of this run. The arrow shows each target's movement since the previous run it was tested in. The suite grew this run, so a downward arrow can be the new tests biting rather than a target regressing.

Suite grew from 625 to 655 tests this run.

That's 30 new tests measured against every target. Movement below compares to the previous run, so a fall here is as likely to be the stricter suite as a real regression.
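To make the arithmetic concrete, here is a minimal sketch of how a larger suite can pull a score down without any regression. It assumes the headline percentage is a plain pass fraction (passed tests divided by total tests), which this page does not state. The target and its pass counts (`old_passed`, `new_tests_passed`) are hypothetical; only the 625 and 655 totals come from this run.

```python
def pass_rate(passed: int, total: int) -> float:
    """Score as a percentage, assuming a simple pass fraction."""
    return 100.0 * passed / total


def delta_pp(prev_score: float, curr_score: float) -> float:
    """Movement between two percentage scores, in percentage points."""
    return curr_score - prev_score


# Suite sizes reported for this run.
OLD_TOTAL = 625
NEW_TOTAL = 655
NEW_TESTS = NEW_TOTAL - OLD_TOTAL  # the 30 tests added this run

# Hypothetical target: it still passes every test it passed before,
# but passes only some of the newly added tests.
old_passed = 600
new_tests_passed = 20

prev_score = pass_rate(old_passed, OLD_TOTAL)
curr_score = pass_rate(old_passed + new_tests_passed, NEW_TOTAL)
change = delta_pp(prev_score, curr_score)

print(f"{prev_score:.1f}% -> {curr_score:.1f}% ({change:+.1f}pp)")
```

In this made-up case the target loses more than a percentage point even though none of the 625 earlier tests changed outcome, which is the effect the note above warns about.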

  1. live (AWS) · full coverage
    100% ground truth
    Tier 1 100%
    Tier 2 100%
    Tier 3 100%
  2. 0.9.13 · full coverage
    96.2% (-1.7pp)
    Tier 1 96.1%
    Tier 2 94.5%
    Tier 3 97.2%
  3. e981a5afd790 · full coverage
    94.8% (-1.7pp)
    Tier 1 95.5%
    Tier 2 92.7%
    Tier 3 94.8%
  4. v0.1.0 · 27 unsupported
    94.7% (-2.1pp)
    Tier 1 94.3%
    Tier 2 95.2%
    Tier 3 95.3%
  5. 2026.5.0 · full coverage
    87.9% (-0.4pp)
    Tier 1 98.5%
    Tier 2 92.7%
    Tier 3 68.7%
  6. d89f8fcc6b1a · full coverage
    87.2% (-0.3pp)
    Tier 1 98.5%
    Tier 2 88.2%
    Tier 3 68.7%
  7. 4.0.0 · 45 unsupported
    84.1% (-0.6pp)
    Tier 1 97.9%
    Tier 2 16.9%
    Tier 3 82.9%
  8. dcce0eaa8bff · full coverage
    65.5% (-0.6pp)
    Tier 1 82.0%
    Tier 2 72.7%
    Tier 3 35.5%