The Math: Data Cleaner

The Data Cleaner uses statistical heuristics to identify low-quality respondents.

1. Speeders

Speeders are respondents who complete the survey too quickly. We calculate the median Length of Interview (LOI) across all respondents and flag anyone whose LOI falls below a set fraction of that median (e.g., 40% of the median).
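The speeder check can be sketched in a few lines of Python. This is an illustrative sketch, not the Data Cleaner's actual code: the function name `flag_speeders`, the `fraction` parameter, and the sample LOI values (in seconds) are all assumptions made for the example.

```python
from statistics import median

def flag_speeders(loi_seconds, fraction=0.4):
    """Return IDs of respondents whose LOI is below fraction * median LOI.

    loi_seconds: dict mapping a respondent ID to its LOI in seconds.
    fraction: hypothetical threshold; 0.4 mirrors the 40% example.
    """
    cutoff = fraction * median(loi_seconds.values())
    return [rid for rid, loi in loi_seconds.items() if loi < cutoff]

# Made-up sample data: the median LOI is 580 s, so the cutoff is 232 s.
loi = {"r1": 600, "r2": 540, "r3": 200, "r4": 660, "r5": 580}
print(flag_speeders(loi))  # ['r3']
```

The median is used rather than the mean because a few very slow respondents (for example, someone who left the survey open overnight) would otherwise inflate the threshold.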

2. Straight-Liners

Straight-liners are respondents who give the same answer to every question in a grid (e.g., all "5"s). We calculate the standard deviation (SD) of each respondent's answers within a specific grid; if SD = 0, the respondent is flagged.
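A minimal sketch of the SD = 0 rule, assuming each grid's answers arrive as a list of numeric codes. The function name `is_straight_liner` is hypothetical, and skipping grids with fewer than two answers is an assumption added here, since a single answer always has zero spread.

```python
from statistics import pstdev

def is_straight_liner(grid_answers):
    """Return True if every answer in one grid is identical (SD == 0).

    grid_answers: list of numeric answer codes for a single grid.
    """
    if len(grid_answers) < 2:
        # Assumption: too few answers to judge, so do not flag.
        return False
    return pstdev(grid_answers) == 0

print(is_straight_liner([5, 5, 5, 5]))  # True
print(is_straight_liner([5, 4, 5, 5]))  # False
```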

3. Gibberish Detection

We check open-ended responses for patterns that indicate non-human or low-effort input, such as repeated character sequences (e.g., "asdfasdf") or text that contains no vowels or no consonants.
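One rough way to express these checks in Python is shown below. It is a sketch under several assumptions, not the Data Cleaner's actual rules: the name `looks_like_gibberish`, the `min_letters` cutoff, the chunk length of 1 to 4 characters, and the English-only vowel set are all chosen for illustration.

```python
import re

VOWELS = set("aeiou")  # Assumption: English text only.

# A chunk of 1-4 characters repeated back to back, e.g. "asdfasdf".
REPEATED_CHUNK = re.compile(r"(.{1,4})\1+")

def looks_like_gibberish(text, min_letters=4):
    """Heuristically flag an open-ended response as gibberish."""
    letters = [c for c in text.lower() if c.isalpha()]
    if len(letters) < min_letters:
        # Assumption: very short answers ("ok", "n/a") are not judged here.
        return False
    vowel_count = sum(c in VOWELS for c in letters)
    if vowel_count == 0 or vowel_count == len(letters):
        return True  # No vowels at all, or no consonants at all.
    # Flag text whose letters are one short chunk repeated end to end.
    return REPEATED_CHUNK.fullmatch("".join(letters)) is not None

print(looks_like_gibberish("asdfasdf"))               # True
print(looks_like_gibberish("sdfgh"))                  # True
print(looks_like_gibberish("The checkout was easy"))  # False
```

Heuristics like this produce false positives (for instance, "hahaha" is a repeated chunk), so flagged responses are best treated as candidates for review rather than automatic removals.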