ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding

Lee, Hosu; Kim, Junho; Kim, Hyunjun; Ro, Yong Man

Abstract

Aligning frame selection with model-internal utility

Recent progress in Large Multi-modal Models has enabled effective vision-language reasoning, yet video understanding remains constrained by suboptimal frame-selection strategies. Prior works rely on static heuristics or external retrieval modules, but these often fail to capture visual cues grounded in the user’s query, conflating raw visual dynamics with true semantic relevance. We introduce ReFoCUS (Reinforcement-guided Frame Optimization for Contextual UnderStanding), the first framework to integrate online policy-gradient reinforcement learning into frame-level optimization for video-LLMs. ReFoCUS learns a frame-selection policy using reward signals derived from reference models, capturing their internal scoring behavior over frame combinations that best support temporally grounded responses. To explore the large combinatorial frame space efficiently, it uses an autoregressive, query-conditional selection architecture that ensures contextual consistency while reducing complexity. The policy needs no explicit frame-level supervision — it implicitly discovers optimal, semantically consistent frame compositions. ReFoCUS consistently improves reasoning accuracy across multiple video-QA benchmarks, demonstrating the advantage of aligning frame selection with model-internal utility.

How it works

“Which frames?” reframed as a reinforcement-learning policy

Instead of optimizing the model’s output, ReFoCUS optimizes its input — learning which frames to feed by training a policy directly against the downstream model’s own confidence.

[Figure: ReFoCUS framework. The policy model (Vision Encoder + Mamba) selects N frames per query; a frozen reward model scores each subset.]

The ReFoCUS loop. A query-conditioned policy π_θ (Vision Encoder + Mamba) selects frames one at a time. A frozen reference model r_φ scores each candidate subset by a group-wise prediction margin, and the resulting advantage updates the policy — no frame-level labels required.

Query-conditioned autoregressive policy

Built on Video-Ma²mba (state-space backbone, linear complexity). Starting from a <sof> token, the policy π_θ picks frames one at a time — each conditioned on the query and previously chosen frames — assembling a coherent, duplicate-free evidence set.
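The selection loop described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `select_frames`, `score_fn`, and `toy_scores` are hypothetical names, and `score_fn` stands in for the query-conditioned policy by returning one logit per candidate frame given the frames chosen so far. Masking already-chosen indices is one simple way to obtain a duplicate-free set.

```python
import numpy as np

def select_frames(score_fn, num_candidates, num_select, rng):
    """Sample frame indices one at a time, without replacement.

    score_fn(selected) is a hypothetical stand-in for the policy: it returns
    one logit per candidate frame, given the frames chosen so far. The query
    is assumed to be baked into score_fn.
    """
    selected = []
    available = np.ones(num_candidates, dtype=bool)
    log_prob = 0.0
    for _ in range(num_select):
        logits = np.asarray(score_fn(selected), dtype=float)
        # Mask frames that were already picked so they get zero probability.
        logits = np.where(available, logits, -np.inf)
        logits = logits - logits.max()  # numerically stable softmax
        probs = np.exp(logits)
        probs /= probs.sum()
        idx = int(rng.choice(num_candidates, p=probs))
        log_prob += float(np.log(probs[idx]))  # log-prob of the whole sequence
        selected.append(idx)
        available[idx] = False
    return selected, log_prob

def toy_scores(selected):
    # Toy policy (assumption): prefer frames near index 10 of 64 candidates.
    return -np.abs(np.arange(64) - 10) / 4.0

rng = np.random.default_rng(0)
frames, logp = select_frames(toy_scores, num_candidates=64, num_select=8, rng=rng)
```

The accumulated `logp` is the log-probability of the sampled sequence, which is the quantity a policy-gradient update would weight by the advantage.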

Reward from a frozen reference model

Each candidate subset is scored by a group-relative prediction margin from a frozen r_φ (InternVL3): r = tanh((z_y* − z_ỹ)/2) — how confidently it favors the correct answer over the strongest distractor. Input-level alignment, not output-level.
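As a worked example of the margin formula above, the sketch below computes r for one frame subset. It assumes the reference model exposes one score (logit) per answer option; `margin_reward` is a hypothetical helper name, and any group-level normalization is left out. Because tanh(x/2) = 2·sigmoid(x) − 1, the reward lies in (−1, 1): positive when the correct option beats the strongest distractor, negative otherwise.

```python
import math

def margin_reward(option_logits, correct_idx):
    """r = tanh((z_correct - z_strongest_distractor) / 2) for one frame subset."""
    z_correct = option_logits[correct_idx]
    # Strongest distractor: the highest-scoring option that is not the answer.
    z_distractor = max(
        z for i, z in enumerate(option_logits) if i != correct_idx
    )
    return math.tanh((z_correct - z_distractor) / 2.0)

# Illustrative logits only (not taken from the paper).
confident = margin_reward([2.0, 0.5, -1.0, 0.0], correct_idx=0)  # margin = 1.5
mistaken = margin_reward([0.5, 2.0, -1.0, 0.0], correct_idx=0)   # margin = -1.5
```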

Frame-level policy optimization

Group-normalized advantages (GRPO) update the policy; an entropy bonus replaces KL to keep exploration alive. A search-space curriculum (4→8→16→32) stabilizes learning over a space as large as C(512, 32) ≈ 7×10⁵⁰.
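A rough sketch of the update signal and of the search-space size quoted above. The assumptions are stated plainly: `group_advantages` and `policy_loss` are hypothetical names, `entropy_coef` is an illustrative value, and the loss is a plain REINFORCE-style surrogate that omits the clipped importance ratio of standard GRPO. The last line checks the C(512, 32) figure with exact integer arithmetic.

```python
import math
import numpy as np

def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of sampled frame subsets."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def policy_loss(log_probs, rewards, entropies, entropy_coef=0.01):
    """Advantage-weighted log-likelihood loss with an entropy bonus.

    log_probs: log-probability of each sampled subset under the policy.
    entropies: policy entropy for each sample (higher = more exploration).
    entropy_coef is an illustrative value, not one reported by the source.
    """
    adv = group_advantages(rewards)
    lp = np.asarray(log_probs, dtype=float)
    ent = np.asarray(entropies, dtype=float)
    # Minimizing this raises the probability of above-average subsets
    # and rewards higher entropy.
    return float(-(adv * lp).mean() - entropy_coef * ent.mean())

adv = group_advantages([0.2, 0.6, -0.1, 0.9])
loss = policy_loss([-3.0, -2.5, -4.0, -2.0], [0.2, 0.6, -0.1, 0.9],
                   [1.0, 1.2, 0.9, 1.1])

# Number of ways to choose 32 frames out of 512, ignoring order.
search_space = math.comb(512, 32)
```

Normalizing within the group means a subset is only reinforced if it scores better than its siblings for the same video and query, so no learned value baseline is needed.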

Results

Plug in ReFoCUS → every model gets better

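As a model-agnostic frame selector, ReFoCUS lifts 13 Video-LLMs across 5 benchmarks. Each cell shows base → +ReFoCUS followed by the change in points. Almost every cell is a gain; the exceptions are GPT-4o on Video-MMMU (62.9 → 62.1) and InternVL3.5 8B on Video-MME short (77.4 → 76.2).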

Model	Video-MME short (w/o sub)	Video-MME medium (w/o sub)	Video-MME long (w/o sub)	Video-MME overall (w/o sub)	LongVideoBench val	MLVU m-avg	Video-MMMU overall	NExT-QA val (WUPS)
Closed-source
Gemini 2.5 Flash	77.6→79.3 (▲1.7)	63.7→68.3 (▲4.6)	56.8→60.8 (▲4.0)	66.0→69.5 (▲3.5)	47.9→50.9 (▲3.0)	52.8→58.0 (▲5.2)	40.6→45.6 (▲5.0)	11.7→11.9 (▲0.2)
GPT-4o	68.0→68.2 (▲0.2)	55.0→60.1 (▲5.1)	53.3→54.0 (▲0.7)	58.8→60.8 (▲2.0)	49.5→52.9 (▲3.4)	58.7→65.1 (▲6.4)	62.9→62.1 (▼0.8)	8.5→9.1 (▲0.6)
Open-source
LLaVA-OneVision 0.5B	53.7→58.3 (▲4.6)	39.9→44.6 (▲4.7)	37.0→38.3 (▲1.3)	43.5→47.1 (▲3.6)	44.7→48.7 (▲4.0)	44.8→50.3 (▲5.5)	17.3→19.4 (▲2.1)	18.1→18.9 (▲0.8)
InternVL3 1B	63.1→66.4 (▲3.3)	46.9→51.8 (▲4.9)	39.9→42.6 (▲2.7)	50.0→53.6 (▲3.6)	47.6→50.6 (▲3.0)	54.0→58.9 (▲4.9)	27.7→29.3 (▲1.6)	20.0→20.4 (▲0.4)
VideoLLaMA 3 2B	55.2→58.9 (▲3.7)	38.8→44.1 (▲5.3)	35.2→38.3 (▲3.1)	43.1→47.1 (▲4.0)	48.8→53.7 (▲4.9)	46.8→50.2 (▲3.4)	28.7→29.2 (▲0.5)	18.9→20.7 (▲1.8)
InternVL3 2B	71.0→72.2 (▲1.2)	56.4→60.2 (▲3.8)	47.8→49.7 (▲1.9)	58.4→60.7 (▲2.3)	50.9→54.9 (▲4.0)	62.7→68.0 (▲5.3)	38.3→39.3 (▲1.0)	24.4→25.0 (▲0.6)
InternVL3.5 4B	76.4→78.0 (▲1.6)	60.3→62.3 (▲2.0)	51.3→57.4 (▲6.1)	62.7→65.9 (▲3.2)	57.7→62.6 (▲4.9)	66.6→71.5 (▲4.9)	52.0→53.3 (▲1.3)	22.1→22.9 (▲0.8)
Qwen3-VL 4B	74.1→76.7 (▲2.6)	61.0→65.7 (▲4.7)	51.3→57.0 (▲5.7)	62.1→66.4 (▲4.3)	57.4→61.9 (▲4.5)	63.1→71.9 (▲8.8)	54.0→56.4 (▲2.4)	23.8→24.1 (▲0.3)
VideoLLaMA 3 7B	70.4→72.2 (▲1.8)	57.7→60.1 (▲2.4)	48.9→54.3 (▲5.4)	59.0→62.2 (▲3.2)	54.8→57.0 (▲2.2)	52.9→59.8 (▲6.9)	32.8→34.4 (▲1.6)	25.8→26.5 (▲0.7)
LLaVA-OneVision 7B	70.9→72.8 (▲1.9)	55.7→61.7 (▲6.0)	48.8→53.4 (▲4.6)	58.4→62.6 (▲4.2)	55.0→61.0 (▲6.0)	63.7→68.5 (▲4.8)	34.1→35.7 (▲1.6)	16.2→16.4 (▲0.2)
InternVL3 8B	75.1→75.8 (▲0.7)	64.4→66.8 (▲2.4)	53.4→58.3 (▲4.9)	64.3→67.0 (▲2.7)	57.8→62.0 (▲4.2)	68.1→72.7 (▲4.6)	49.3→50.6 (▲1.3)	26.6→26.8 (▲0.2)
InternVL3.5 8B	77.4→76.2 (▼1.2)	62.4→64.9 (▲2.5)	53.2→58.9 (▲5.7)	64.4→66.7 (▲2.3)	59.7→64.1 (▲4.4)	67.3→70.6 (▲3.3)	50.0→53.2 (▲3.2)	24.3→24.7 (▲0.4)
Qwen3-VL 8B	75.1→79.6 (▲4.5)	64.6→67.0 (▲2.4)	55.3→58.9 (▲3.6)	65.0→68.5 (▲3.5)	56.6→63.3 (▲6.7)	63.0→72.5 (▲9.5)	59.1→61.1 (▲2.0)	25.3→25.7 (▲0.4)

The top line at every frame budget

Against seven competitive frame selectors — Uniform, Frame-Voyager, BOLT, mDP³, TSPO, A.I.R., and K-frames — ReFoCUS is the highest curve at every budget from 4 to 64 frames, on both backbones.

[Figure: “ReFoCUS = top line everywhere”. Line plots of Video-QA performance versus number of frames (4 to 64) on LLaVA-OV-7B and Qwen2.5-VL-7B.]

Fig. 4 — Video-MME vs. frame budget. ReFoCUS (orange) sits above Uniform, Frame-Voyager, BOLT, mDP³, TSPO, A.I.R. and K-frames at every budget, on both LLaVA-OV-7B and Qwen2.5-VL-7B. More frames help everyone — ReFoCUS helps most.

[Figure: “Near-perfect recall”. Visual Needle-in-a-Haystack heatmaps for Uniform (left) and ReFoCUS (right), across needle positions and 64 to 1000 frames.]

Fig. 7 — Visual Needle-in-a-Haystack. Uniform sampling (left) turns red — it fails to retrieve the needle. ReFoCUS (right) stays green — near-perfect recall across every needle position and 64–1k frames.

Qualitative evidence

It samples the frames the question points to

Four cases where uniform sampling answers wrong, and only ReFoCUS gets it right — because it concentrates its frame budget on the moments that hold the answer.

Legend: frame shows the answer evidence · frame is possibly relevant

Video-MME · temporal ordering

In which order are the first four cars driven out of the garage?

(a) yellow · (b) black · (c) silver · (d) white

A. (a)(b)(c)(d). B. (a)(c)(b)(d). C. (b)(c)(d)(a). D. (b)(d)(a)(c).

Uniform: C. (b)(c)(d)(a) (wrong)

Frames for case 479-1 (32, evenly spaced from 0:00 to 11:04): 0:00, 0:21, 0:42, 1:04, 1:25, 1:47, 2:08, 2:30, 2:51, 3:13, 3:34, 3:55, 4:17, 4:38, 5:00, 5:21, 5:43, 6:04, 6:26, 6:47, 7:08, 7:30, 7:51, 8:13, 8:34, 8:56, 9:17, 9:39, 10:00, 10:21, 10:43, 11:04

ReFoCUS: B. (a)(c)(b)(d) (correct)

Frames for case 479-1 (32, clustered on the opening garage sequence of the 11:04 clip; * = shows the answer evidence): 0:01*, 0:03*, 0:06*, 0:07*, 0:09*, 0:10*, 0:11*, 0:14*, 0:15*, 0:16*, 0:18*, 0:19*, 0:20*, 0:22*, 0:23*, 0:24*, 0:26*, 0:27*, 0:28*, 0:29*, 0:36*, 0:55, 4:43, 5:48, 7:00, 7:13, 7:17, 7:43, 8:13, 8:20, 8:22, 8:23

The cars all leave within roughly the first 30 seconds. ReFoCUS packs about 20 of its 32 frames into the opening garage sequence and reads the order yellow → silver → black → white, while uniform sampling spreads evenly across the 11-minute clip, lands only two frames in the first 30 seconds, and gets the order wrong.
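To make the contrast concrete, here is a small sketch of how evenly spaced timestamps can be computed. It assumes a duration of exactly 11:04 (664 s) and endpoint-inclusive spacing; `uniform_timestamps` and `fmt` are hypothetical helper names. A real sampler working from the exact clip length may differ from this by a second on some frames.

```python
def uniform_timestamps(duration_s, num_frames):
    """Evenly spaced timestamps in whole seconds, both endpoints included."""
    if num_frames < 2:
        return [0]
    # Integer arithmetic avoids floating-point rounding at the endpoints.
    return [(i * duration_s) // (num_frames - 1) for i in range(num_frames)]

def fmt(seconds):
    """Format seconds as m:ss."""
    return f"{seconds // 60}:{seconds % 60:02d}"

seconds = uniform_timestamps(11 * 60 + 4, 32)
stamps = [fmt(t) for t in seconds]
# Count how many uniformly sampled frames fall in the first 30 seconds.
in_opening = sum(1 for t in seconds if t <= 30)
```

With a 32-frame budget the spacing is about 21 seconds, so only two frames land in the first half-minute, where the question's evidence is.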

Video-MME · temporal grounding

What do the expanding red lines on the map in the first few minutes of the video stand for?

A. The Yellow River. B. The Silk Road. C. Du Fu’s route to Xi’an. D. The Yangtze River.

Uniform: C. Du Fu’s route (wrong)

Frames for case 618-1 (32, evenly spaced from 0:00 to 59:00): 0:00, 1:54, 3:48, 5:42, 7:36, 9:30, 11:25, 13:19, 15:13, 17:07, 19:01, 20:56, 22:50, 24:44, 26:38, 28:32, 30:27, 32:21, 34:15, 36:09, 38:03, 39:58, 41:52, 43:46, 45:40, 47:34, 49:29, 51:23, 53:17, 55:11, 57:05, 59:00

ReFoCUS: B. The Silk Road (correct)

Frames for case 618-1 (32, concentrated on the early map segment of the 59:00 video; * = shows the answer evidence, † = possibly relevant): 0:00, 0:13, 0:20, 0:27, 0:34, 0:41, 0:48, 1:09, 1:16, 1:36, 1:43, 1:50, 1:57, 2:04, 2:11, 2:18, 2:25, 3:00*, 3:07*, 3:13, 3:48, 4:37, 4:44, 4:50, 4:57, 5:04, 5:11, 7:23, 8:11, 10:23, 12:07†, 31:38†

The evidence is in the first few minutes. ReFoCUS puts 27 of its 32 frames in the first 5:11 and reads the expanding red lines as the Silk Road, while uniform sampling spreads its frames across the hour-long video, leaves only three in the first five minutes, and misses the evidence around 3:00.

Video-MME · counting

What is the total number of people in the video?

A. 7. B. 6. C. 5. D. 8.

Uniform: B. 6 (wrong)

Frames for case 206-1 (32, evenly spaced from 0:00 to 1:12; * = shows the answer evidence): 0:00, 0:02, 0:04, 0:07, 0:09, 0:11, 0:14, 0:16, 0:18, 0:21, 0:23, 0:25, 0:28, 0:30, 0:32, 0:35, 0:37, 0:39, 0:42, 0:44, 0:46, 0:49*, 0:51*, 0:53*, 0:56*, 0:58*, 1:00, 1:03, 1:05, 1:08, 1:10, 1:12

ReFoCUS: A. 7 (correct)

Frames for case 206-1 (32, locked onto the full-group shot at about 0:47 to 0:58 of the 1:12 clip; * = shows the answer evidence): 0:22, 0:47*, 0:48*, 0:48*, 0:48*, 0:48*, 0:49*, 0:49*, 0:49*, 0:49*, 0:50*, 0:50*, 0:50*, 0:50*, 0:51*, 0:51*, 0:52*, 0:52*, 0:53*, 0:53*, 0:54*, 0:54*, 0:54*, 0:55*, 0:55*, 0:56*, 0:56*, 0:57*, 0:57*, 0:58*, 0:58*, 0:58*

Counting needs the right shot, not many shots. ReFoCUS zooms into the window where everyone is on screen together, placing 31 of its 32 frames between 0:47 and 0:58, and counts seven; uniform sampling lands only five frames in that window and answers six.

Video-MME · reading on-screen text

The video shows how long it takes to drive from the Earth to the Moon?

A. 160 days. B. 50 days. C. 180 days. D. 19 days.

Uniform: B. 50 days (wrong)

Frames for case 345-1 (32, evenly spaced from 0:00 to 11:09): 0:00, 0:21, 0:43, 1:04, 1:26, 1:47, 2:09, 2:31, 2:52, 3:14, 3:35, 3:57, 4:19, 4:40, 5:02, 5:23, 5:45, 6:06, 6:28, 6:50, 7:11, 7:33, 7:54, 8:16, 8:38, 8:59, 9:21, 9:42, 10:04, 10:25, 10:47, 11:09

ReFoCUS: A. 160 days (correct)

Frames for case 345-1 (32, focused where the figure is shown on screen in the 11:09 clip; * = shows the answer evidence): 0:36, 0:44, 0:45, 0:47, 0:51, 0:52, 0:53, 0:54*, 0:56*, 1:08, 1:09, 1:18, 1:21, 1:23, 1:34, 1:35, 1:36, 1:44, 1:46, 1:48, 1:49, 2:21, 2:31, 2:33, 2:34, 3:09, 4:48, 4:50, 4:51, 4:53, 4:54, 5:58

The answer is a number on the screen. ReFoCUS samples the exact segment where the “160 days” figure appears; uniform sampling skims past it and guesses.

ReFoCUS

One selector. Every backbone. Bigger wins.