
Quality Assurance: Finding Frequently Repeated Errors

Identify potential bugs by finding which errors repeat most often in test runs or production logs. Frequent repetition often indicates underlying issues.

The Problem

When analyzing test failures or production errors:

  • Hard to spot patterns - Same errors buried in output
  • Can't prioritize fixes - Don't know which errors are most common
  • Miss systemic issues - Repeated errors indicate root cause problems
  • Waste QA time - Investigating same error multiple times

Input Data

test-failures.log
Test run 1: ERROR: test_database_connection failed: Connection refused
Test run 2: ERROR: test_api_timeout failed: Request timeout after 30s
Test run 3: ERROR: test_database_connection failed: Connection refused
Test run 4: ERROR: test_user_authentication failed: Invalid credentials
Test run 5: ERROR: test_database_connection failed: Connection refused
Test run 6: ERROR: test_api_timeout failed: Request timeout after 30s
Test run 7: ERROR: test_user_authentication failed: Invalid credentials
Test run 8: ERROR: test_database_connection failed: Connection refused
Test run 9: ERROR: test_api_timeout failed: Request timeout after 30s
Test run 10: ERROR: test_database_connection failed: Connection refused

Test failure log with 10 failures across 3 different tests:

  • test_database_connection (lines 1, 3, 5, 8, 10) - fails 5 times
  • test_api_timeout (lines 2, 6, 9) - fails 3 times
  • test_user_authentication (lines 4, 7) - fails 2 times
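As a cross-check on the tallies above, the per-test counts can be reproduced with a few lines of standard-library Python. This is only a sketch: `FAILURE_RE` and `count_failures` are hypothetical names introduced here, and the pattern assumes every failure line has the `ERROR: <test name> failed` shape shown in the sample log.

```python
import re
from collections import Counter

# Sample lines copied from test-failures.log above.
LOG_LINES = [
    "Test run 1: ERROR: test_database_connection failed: Connection refused",
    "Test run 2: ERROR: test_api_timeout failed: Request timeout after 30s",
    "Test run 3: ERROR: test_database_connection failed: Connection refused",
    "Test run 4: ERROR: test_user_authentication failed: Invalid credentials",
    "Test run 5: ERROR: test_database_connection failed: Connection refused",
    "Test run 6: ERROR: test_api_timeout failed: Request timeout after 30s",
    "Test run 7: ERROR: test_user_authentication failed: Invalid credentials",
    "Test run 8: ERROR: test_database_connection failed: Connection refused",
    "Test run 9: ERROR: test_api_timeout failed: Request timeout after 30s",
    "Test run 10: ERROR: test_database_connection failed: Connection refused",
]

# Hypothetical pattern: captures the test name between "ERROR: " and " failed".
FAILURE_RE = re.compile(r"ERROR: (\S+) failed")


def count_failures(lines):
    """Return a Counter mapping test name -> number of failure lines."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Print the tests from most to least frequent.
for name, total in count_failures(LOG_LINES).most_common():
    print(f"{total} {name}")
```

A plain count like this gives the totals but not the positions of the repeats, which is what the annotated output adds.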

Output Data

expected-annotated.log
Test run 1: ERROR: test_database_connection failed: Connection refused
Test run 2: ERROR: test_api_timeout failed: Request timeout after 30s
[DUPLICATE: Lines 3-3 matched lines 1-1 (sequence seen 2 times)]
Test run 4: ERROR: test_user_authentication failed: Invalid credentials
[DUPLICATE: Lines 5-6 matched lines 1-2 (sequence seen 2 times)]
[DUPLICATE: Lines 7-9 matched lines 3-5 (sequence seen 2 times)]
Test run 10: ERROR: test_database_connection failed: Connection refused

Annotations show repeat counts:

  • Line 5: "seen 2 times" → test_database_connection failed 3× total
  • Line 8: "seen 3 times" → test_database_connection failed 4× total
  • Line 10: Last occurrence (5× total)

Solution

uniqseq test-failures.log \
    --window-size 1 \
    --skip-chars 13 \
    --annotate \
    --quiet > annotated.log

Options:

  • --window-size 1: Deduplicate individual test failures
  • --skip-chars 13: Skip "Test run X: " prefix
  • --annotate: Add markers showing duplicate counts
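One thing worth checking when choosing a skip width is whether the prefix really has a fixed length. The sketch below measures the "Test run X: " prefix for a single-digit and a two-digit run number. It assumes that a fixed-width skip simply drops the first N characters before comparing, which is a guess based on the option name, not a statement about how uniqseq works internally. `PREFIX_RE`, `prefix_length`, and `strip_prefix` are hypothetical helpers.

```python
import re

SINGLE = "Test run 1: ERROR: test_database_connection failed: Connection refused"
DOUBLE = "Test run 10: ERROR: test_database_connection failed: Connection refused"

# Hypothetical pattern for the variable-width "Test run N: " prefix.
PREFIX_RE = re.compile(r"^Test run \d+: ")


def prefix_length(line):
    """Length of the "Test run N: " prefix, or 0 if the line has none."""
    match = PREFIX_RE.match(line)
    return len(match.group(0)) if match else 0


def strip_prefix(line):
    """Remove the prefix regardless of how many digits the run number has."""
    return PREFIX_RE.sub("", line, count=1)


print(prefix_length(SINGLE))   # 12
print(prefix_length(DOUBLE))   # 13

# Dropping a fixed 13 characters leaves different remainders for the two lines.
print(SINGLE[13:] == DOUBLE[13:])                      # False
# Stripping the prefix by pattern makes them compare equal.
print(strip_prefix(SINGLE) == strip_prefix(DOUBLE))    # True
```

If run numbers can grow past one digit, normalizing the prefix before deduplication (for example by zero-padding run numbers when the log is written) may give more consistent matches than a single fixed width.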

from uniqseq import UniqSeq

uniqseq = UniqSeq(
    window_size=1,    # Deduplicate individual lines
    skip_chars=13,    # Skip "Test run X: " prefix when comparing
    annotate=True,    # Add annotation markers for duplicates
)

with open("test-failures.log") as f, open("output.log", "w") as out:
    for line in f:
        uniqseq.process_line(line.rstrip("\n"), out)
    uniqseq.flush_to_stream(out)

How It Works

Annotations show where duplicates were skipped and how many times each pattern has been seen:

Test run 1: ERROR: test_database_connection failed...
Test run 3: ERROR: test_database_connection failed...  ← removed
  [DUPLICATE: Lines 3-3 matched lines 1-1 (sequence seen 1 times)]

Test run 5: ERROR: test_database_connection failed...  ← removed
  [DUPLICATE: Lines 5-5 matched lines 1-1 (sequence seen 2 times)]

The "seen N times" count increments each time the pattern repeats, helping identify the most frequent failures.
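To work with these markers programmatically, the default annotation text can be parsed with a regular expression. The following is a minimal sketch that assumes the annotation format is exactly the one shown in the examples above. `Annotation`, `ANNOTATION_RE`, and `parse_annotation` are hypothetical names, not part of uniqseq.

```python
import re
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    dup_start: int   # first skipped line
    dup_end: int     # last skipped line
    orig_start: int  # first line of the earlier occurrence
    orig_end: int    # last line of the earlier occurrence
    seen: int        # the "seen N times" counter


# Matches e.g. "[DUPLICATE: Lines 5-5 matched lines 1-1 (sequence seen 2 times)]"
ANNOTATION_RE = re.compile(
    r"\[DUPLICATE: Lines (\d+)-(\d+) matched lines (\d+)-(\d+) "
    r"\(sequence seen (\d+) times\)\]"
)


def parse_annotation(line: str) -> Optional[Annotation]:
    """Return the parsed annotation, or None for ordinary log lines."""
    match = ANNOTATION_RE.search(line)
    if match is None:
        return None
    return Annotation(*(int(group) for group in match.groups()))


example = "  [DUPLICATE: Lines 5-5 matched lines 1-1 (sequence seen 2 times)]"
print(parse_annotation(example))
```

Lines that are not annotations return `None`, so the same function can be used to filter a mixed stream of log lines and markers.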

Analyzing Repeat Frequency

Extract Annotation Counts

Find the highest repeat counts:

uniqseq test-failures.log --skip-chars 13 --annotate --quiet | \
    grep "DUPLICATE" | \
    grep -oE "seen [0-9]+ times" | \
    awk '{print $2}' | \
    sort -rn | \
    head -1

Output: 3 (meaning one error appeared 4 times total: original + seen 3 times)

Rank Errors by Frequency

Combine with custom annotation format to extract failure names and counts:

# Extract duplicates and their counts
uniqseq test-failures.log --skip-chars 13 --annotate \
    --annotation-format 'SKIP|{count}' --quiet | \
    grep 'SKIP|' | \
    awk -F'|' '{print $2}' | \
    sort | \
    uniq -c | \
    sort -rn

Output:

3 3    # test_database_connection appeared 4 times (3 duplicates)
2 2    # test_api_timeout appeared 3 times (2 duplicates)
1 1    # test_user_authentication appeared 2 times (1 duplicate)

Find Most Critical Bugs

Show errors ordered by frequency:

#!/bin/bash
# Extract unique errors with their total occurrence count.
# Note: each count is paired with the most recent ERROR line kept in the
# output, which is not necessarily the line the annotation refers to.

uniqseq test-failures.log --skip-chars 13 --annotate --quiet | \
    grep -E "ERROR:|DUPLICATE" | \
    while IFS= read -r line; do
        if [[ $line == *"ERROR:"* ]]; then
            ERROR=$line
        elif [[ $line == *"seen"* ]]; then
            COUNT=$(echo "$line" | grep -oE "seen [0-9]+" | awk '{print $2+1}')
            echo "$COUNT|$ERROR"
        fi
    done | \
    sort -rn -t'|' -k1,1 | \
    head -5

Shows top 5 most frequent errors with their counts.
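An alternative way to attribute counts to errors is to use the line numbers inside each annotation and look the skipped lines up in the original input. The sketch below assumes that "Lines A-B" refers to 1-based line numbers in the input file and that the annotation uses the default format. Both are assumptions, and `rank_skipped_errors` is a hypothetical helper. The annotated sample reuses the markers from the "How It Works" example.

```python
import re
from collections import Counter

ANNOTATION_RE = re.compile(
    r"\[DUPLICATE: Lines (\d+)-(\d+) matched lines (\d+)-(\d+) "
    r"\(sequence seen (\d+) times\)\]"
)
PREFIX_RE = re.compile(r"^Test run \d+: ")


def rank_skipped_errors(input_lines, annotated_lines):
    """Count skipped input lines per error text, most frequent first."""
    skipped = Counter()
    for line in annotated_lines:
        match = ANNOTATION_RE.search(line)
        if match is None:
            continue
        first, last = int(match.group(1)), int(match.group(2))
        for lineno in range(first, last + 1):
            # Ignore line numbers that fall outside the input we were given.
            if 1 <= lineno <= len(input_lines):
                key = PREFIX_RE.sub("", input_lines[lineno - 1])
                skipped[key] += 1
    return skipped.most_common()


INPUT = [
    "Test run 1: ERROR: test_database_connection failed: Connection refused",
    "Test run 2: ERROR: test_api_timeout failed: Request timeout after 30s",
    "Test run 3: ERROR: test_database_connection failed: Connection refused",
    "Test run 4: ERROR: test_user_authentication failed: Invalid credentials",
    "Test run 5: ERROR: test_database_connection failed: Connection refused",
]
ANNOTATED = [
    INPUT[0],
    INPUT[1],
    "[DUPLICATE: Lines 3-3 matched lines 1-1 (sequence seen 1 times)]",
    INPUT[3],
    "[DUPLICATE: Lines 5-5 matched lines 1-1 (sequence seen 2 times)]",
]

for text, skipped_count in rank_skipped_errors(INPUT, ANNOTATED):
    print(skipped_count, text)
```

Because the lookup goes through the annotation's own line numbers, the result does not depend on which line happens to precede the marker in the output.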

Real-World Workflows

CI/CD Integration

Identify flaky tests in CI:

#!/bin/bash
# Run in CI pipeline after tests

uniqseq test-output.log --skip-chars 20 --annotate --quiet > annotated.log

# Count duplicate annotations with a repeat count of 3 or more
FLAKY=$(grep "DUPLICATE" annotated.log | \
    grep -E "seen ([3-9]|[0-9]{2,}) times" | wc -l)

if [ "$FLAKY" -gt 0 ]; then
    echo "WARNING: $FLAKY flaky tests detected"
    grep "DUPLICATE" annotated.log | grep -E "seen ([3-9]|[0-9]{2,}) times"
    exit 1
fi
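Matching a numeric threshold with a regular expression is easy to get subtly wrong, so the same check can be sketched with an integer comparison instead. This assumes the default "seen N times" wording in the annotations. `repeats_at_or_above` is a hypothetical helper and the threshold value is only an example.

```python
import re

SEEN_RE = re.compile(r"seen (\d+) times")


def repeats_at_or_above(annotated_lines, threshold):
    """Return the DUPLICATE annotations whose repeat count is >= threshold."""
    hits = []
    for line in annotated_lines:
        if "DUPLICATE" not in line:
            continue
        match = SEEN_RE.search(line)
        if match and int(match.group(1)) >= threshold:
            hits.append(line.strip())
    return hits


sample = [
    "[DUPLICATE: Lines 3-3 matched lines 1-1 (sequence seen 2 times)]",
    "[DUPLICATE: Lines 8-8 matched lines 1-1 (sequence seen 3 times)]",
    "[DUPLICATE: Lines 40-40 matched lines 1-1 (sequence seen 12 times)]",
]
flaky = repeats_at_or_above(sample, threshold=3)
print(f"{len(flaky)} annotations at or above the threshold")
```

In a CI job, a non-empty result could be turned into a non-zero exit status with `sys.exit(1)`.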

Production Error Triage

Prioritize error investigation:

# Analyze production errors (reads the whole file; pre-filter the log to the last hour if needed)
uniqseq /var/log/app-errors.log \
    --skip-chars 20 \
    --annotate \
    --annotation-format 'REPEAT:{count}' | \
    grep 'REPEAT:' | \
    awk -F':' '{print $NF}' | \
    sort -rn | \
    head -1

Focus on the error with highest repeat count first.

Weekly QA Report

Generate report of most common failures:

#!/bin/bash
# Weekly test failure summary

echo "Most Common Test Failures (Last 7 Days)"
echo "========================================"

cat test-runs/*.log | \
    uniqseq --skip-chars 13 --annotate --quiet | \
    grep "DUPLICATE" | \
    grep -oE "seen [0-9]+ times" | \
    awk '{sum += ($2 + 1)} END {print "Total failures:", sum}'

# Top 10 most repeated
cat test-runs/*.log | \
    uniqseq --skip-chars 13 --annotate --quiet | \
    grep -B1 "DUPLICATE" | \
    grep "ERROR:" | \
    sort | \
    uniq -c | \
    sort -rn | \
    head -10

Correlation Analysis

Find errors that always occur together:

# Extract sequence patterns (window-size > 1)
uniqseq test-failures.log \
    --window-size 2 \
    --skip-chars 13 \
    --annotate | \
    grep -B2 "DUPLICATE" | \
    grep "ERROR:"

Shows which errors consistently appear together.
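To see which errors tend to follow each other without relying on annotations, adjacent pairs of failures can be counted directly. This is a rough sketch, not how uniqseq detects sequences. It assumes the `ERROR: <test name> failed` line shape from the sample log, and `error_name` and `adjacent_pairs` are hypothetical helpers.

```python
import re
from collections import Counter

FAILURE_RE = re.compile(r"ERROR: (\S+) failed")


def error_name(line):
    """Return the failing test name, or None for non-error lines."""
    match = FAILURE_RE.search(line)
    return match.group(1) if match else None


def adjacent_pairs(lines):
    """Count each (earlier, later) pair of consecutive failing tests."""
    names = [name for name in map(error_name, lines) if name]
    return Counter(zip(names, names[1:]))


# Failing tests in the order they appear in test-failures.log (runs 1 to 10).
ORDER = ["database_connection", "api_timeout", "database_connection",
         "user_authentication", "database_connection", "api_timeout",
         "user_authentication", "database_connection", "api_timeout",
         "database_connection"]
LOG_LINES = [f"Test run {i}: ERROR: test_{name} failed: ..."
             for i, name in enumerate(ORDER, start=1)]

for (first, second), total in adjacent_pairs(LOG_LINES).most_common(3):
    print(f"{total} {first} -> {second}")
```

Pairs that occur far more often than the rest are candidates for a shared root cause, though adjacency alone does not establish one.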

Advanced: Custom Metrics

Export repeat counts to monitoring:

#!/bin/bash
# Push error frequency metrics to Prometheus

uniqseq /var/log/errors.log --skip-chars 20 --annotate --quiet | \
    grep "DUPLICATE" | \
    grep -oE "seen [0-9]+" | \
    awk '{
        counts[$2]++
    } END {
        for (count in counts) {
            print "error_repeat_count{frequency=\"" count "\"} " counts[count]
        }
    }' | \
    curl --data-binary @- http://pushgateway:9091/metrics/job/error_analysis
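The aggregation step in the pipeline above can also be sketched in Python, which makes the metric text easy to inspect before anything is pushed. The metric name and label mirror the awk script. `format_metrics` is a hypothetical helper, and the sketch assumes annotations contain "seen N" as in the default format. It only builds the text and does not contact a Pushgateway.

```python
import re
from collections import Counter

SEEN_RE = re.compile(r"seen (\d+)")


def format_metrics(annotated_lines):
    """Build one metric line per distinct repeat count, in ascending order."""
    histogram = Counter()
    for line in annotated_lines:
        if "DUPLICATE" not in line:
            continue
        match = SEEN_RE.search(line)
        if match:
            histogram[int(match.group(1))] += 1
    return "".join(
        f'error_repeat_count{{frequency="{count}"}} {total}\n'
        for count, total in sorted(histogram.items())
    )


sample = [
    "Test run 1: ERROR: test_database_connection failed: Connection refused",
    "[DUPLICATE: Lines 3-3 matched lines 1-1 (sequence seen 1 times)]",
    "[DUPLICATE: Lines 5-5 matched lines 1-1 (sequence seen 2 times)]",
    "[DUPLICATE: Lines 6-6 matched lines 2-2 (sequence seen 2 times)]",
]
print(format_metrics(sample), end="")
```

Sorting the counts keeps the output stable between runs, which makes it easier to diff.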

See Also