Test System

Terminology: This specification uses RFC 2119 keywords (MUST, SHOULD, MAY, etc.) to indicate requirement levels.

This document describes the reference test system in Structyl.

Overview

Structyl provides a language-agnostic reference test system. Test data is stored in JSON format and shared across all language implementations, ensuring consistent behavior.

Non-Goals

The reference test system does NOT provide:

Perceptual/fuzzy binary comparison — Binary outputs are compared byte-for-byte exactly; no image similarity or fuzzy matching
Test mutation or fuzzing — Test cases are static JSON files; mutation testing is out of scope
Coverage measurement — Coverage is delegated to language-specific tooling
Test generation — Structyl does not mandate or provide test generation tools
Parallel test execution — Parallelism is at the target level, not individual test case level

Test Data Format

Basic Structure

Every test file has input and output:

json

{
  "input": {
    "x": [1.0, 2.0, 3.0, 4.0, 5.0]
  },
  "output": 3.0
}

Test Case Schema

Field	Required	Type	Description
`input`	Yes	object	Input parameters for the function under test
`output`	Yes	any (not null)	Expected output value; null is invalid (see Loading Failure Behavior)
`description`	No	string	Optional documentation for the test case
`skip`	No	boolean	When `true`, marks the test as skipped
`tags`	No	string[]	Optional categorization for filtering or grouping

Canonical Identifier: The canonical identifier for a test case is {suite}/{name} (e.g., math/addition). When suite is empty, the identifier is just the name. The forward slash (/) separator is used on all platforms for cross-platform consistency. In pkg/testhelper, use TestCase.ID() to obtain this identifier.

Validation Rules:

Missing input field: Load fails with test case {suite}/{name}: missing required field "input"
Missing output field: Load fails with test case {suite}/{name}: missing required field "output"
Any additional fields beyond those listed above are silently ignored (forward-compatibility)
Empty input object ({}) is valid

Tag usage: Tags have no built-in semantics in Structyl. Language implementations MAY use tags to filter test execution, group tests in output, or skip tests based on environment capabilities. Tag values are free-form strings; establish conventions per-project.

Tags validation: Tags are intentionally permissive: empty strings, duplicates, and any characters are allowed. This design avoids constraining downstream tooling. Establish per-project conventions for tag naming.

Reserved Field Names

The field names timeout, setup, and teardown are reserved for future specification versions. Users SHOULD NOT use these for custom metadata as they MAY gain normative semantics in future releases. These fields are currently ignored by all loaders.

Loading Failure Behavior

Test loading is all-or-nothing per suite:

Condition	Behavior	Exit Code
JSON parse error	Suite load fails	2
Missing required field (`input` or `output`)	Suite load fails	2
`output` field is explicit `null`	Suite load fails	2
Referenced `$file` not found	Suite load fails	2
Referenced `$file` path escapes suite directory (`../`)	Suite load fails	2

Loading failures are configuration errors (exit code 2), distinct from test execution failures (exit code 1). A loading failure prevents any tests in that suite from executing.

pkg/testhelper limitation: The public Go package uses *.json pattern (immediate directory only), not the recursive **/*.json pattern supported by Structyl's internal runner. See the Test Loader Implementation section.

API Differences: Internal vs Public

Capability	Internal Runner (`internal/tests`)	Public API (`pkg/testhelper`)
Glob patterns	`*/.json` (recursive)	`*.json` (immediate directory only)
`$file` references	Full support	`ErrFileReferenceNotSupported`
Binary data	Via file references	Embed as base64 or load separately

Users requiring recursive patterns or binary file references SHOULD use the internal package or implement project-specific loading. See pkg/testhelper Limitations for workarounds.

Error message format:

structyl: test suite "{suite}": {reason}
  file: {path}

Error Types (pkg/testhelper)

The pkg/testhelper package returns specific error types for programmatic handling:

Error Type	Sentinel	Condition
`ProjectNotFoundError`	`ErrProjectNotFound`	No `.structyl/config.json` found in ancestors
`SuiteNotFoundError`	`ErrSuiteNotFound`	Test suite directory does not exist
`TestCaseNotFoundError`	`ErrTestCaseNotFound`	Test case file does not exist
`InvalidSuiteNameError`	`ErrInvalidSuiteName`	Suite name contains `..`, `/`, `\`, or `\0`
`InvalidTestCaseNameError`	`ErrInvalidTestCaseName`	Test case name contains `..`, `/`, `\`, or `\0`

The InvalidSuiteNameError and InvalidTestCaseNameError types include a Reason field indicating why the name was rejected:

Constant	Value	Meaning
`ReasonPathTraversal`	`"path_traversal"`	Name contains `..` sequence
`ReasonPathSeparator`	`"path_separator"`	Name contains `/` or `\`
`ReasonNullByte`	`"null_byte"`	Name contains null byte (`\0`)

Additional sentinels (no struct type):

Sentinel	Condition
`ErrEmptySuiteName`	Empty suite name provided
`ErrEmptyTestCaseName`	Empty test case name provided
`ErrFileReferenceNotSupported`	`$file` reference in test case

Usage:

tc, err := testhelper.LoadTestCase(path)
if err != nil {
    if errors.Is(err, testhelper.ErrTestCaseNotFound) {
        // Handle missing file
    }
    var suiteErr *testhelper.SuiteNotFoundError
    if errors.As(err, &suiteErr) {
        // Access suiteErr.Suite for context
    }
}

Input Structure

Input MUST be a JSON object (map). The object MAY be empty ({}). Scalar values and arrays as the top-level input MUST NOT be used; the loader MUST reject such inputs with exit code 2 (Configuration Error).

Within the input object, values can be:

Scalar values: numbers, strings, booleans, null
Arrays: [1.0, 2.0, 3.0]
Nested objects: {"config": {"alpha": 0.05}}

json

{
  "input": {
    "x": [1.0, 2.0, 3.0],
    "y": [4.0, 5.0, 6.0],
    "alpha": 0.05
  },
  "output": 2.5
}

Why object-only? Test inputs represent named parameters. Objects provide named access and align with how most test frameworks structure input.

Output Types

Outputs can be:

Scalar: "output": 42
Array: "output": [1, 2, 3]
Object: "output": {"lower": -4, "upper": 0}

json

{
  "input": {
    "x": [1, 2, 3, 4, 5],
    "y": [3, 4, 5, 6, 7],
    "misrate": 0.05
  },
  "output": {
    "lower": -4,
    "upper": 0
  }
}

Binary Data References (Internal Only)

Public API Limitation

The $file reference syntax is only available in Structyl's internal test runner (internal/tests package). The public Go package pkg/testhelper does NOT support this syntax and rejects ANY $file reference with ErrFileReferenceNotSupported. External implementations MUST either embed binary data directly in JSON or use Structyl's internal package.

The validation rules in the table below apply only to the internal runner. The public testhelper package rejects all $file references unconditionally.

For projects using the internal runner, binary data can be referenced via the $file syntax:

json

{
  "input": {
    "data": { "$file": "input.bin" },
    "format": "raw"
  },
  "output": { "$file": "expected.bin" }
}

File Reference Schema

A file reference is a JSON object with exactly one key $file:

json

{ "$file": "<relative-path>" }

Validation rules (internal runner only):

The object MUST have exactly one key: $file
The value MUST be a non-empty string
Objects with $file and other keys are invalid

Example	Valid	Reason
`{"$file": "input.bin"}`	✓	Correct format
`{"$file": "data/input.bin"}`	✓	Subdirectory allowed
`{"$file": ""}`	✗	Empty path
`{"$file": "../input.bin"}`	✗	Parent reference not allowed
`{"$file": "/etc/passwd"}`	✗	Absolute paths not allowed
`{"$file": "x.bin", "extra": 1}`	✗	Extra keys not allowed
`{"FILE": "input.bin"}`	✗	Wrong key (case-sensitive)

Implementation note: The validation table above describes semantics for Structyl's internal runner. The public pkg/testhelper package rejects ANY $file reference regardless of object structure—see the warning box in Binary Data References.

Path Resolution: Paths in $file references are resolved relative to the directory containing the JSON test file.

Example:

Test file: tests/image-processing/resize-test.json
Reference: {"$file": "input.bin"}
Resolved path: tests/image-processing/input.bin

Subdirectory references are permitted:

Reference: {"$file": "data/input.bin"}
Resolved path: tests/image-processing/data/input.bin

Parent directory references (../) and absolute paths (starting with / on Unix or drive letters on Windows) are NOT permitted and will cause a load error. Only relative paths within the suite directory are valid.

Symlink handling: Symlinks are followed during resolution. However, if the resolved target path is outside the suite directory, the reference MUST be rejected.

Path separator normalization: Use forward slashes (/) in $file references for cross-platform portability. Implementations SHOULD normalize path separators internally.

Binary files are stored alongside the JSON file:

tests/
└── image-processing/
    ├── resize-test.json
    ├── input.bin
    └── expected.bin

Binary Output Comparison

Binary outputs (referenced via $file) are compared byte-for-byte exactly:

No byte order normalization (files MUST use consistent endianness)
No line ending normalization (CRLF and LF are distinct bytes)
No encoding normalization (UTF-8 BOM presence is significant)
No tolerance is applied to binary data

For outputs requiring approximate comparison (e.g., images with compression artifacts), test authors MUST either:

Use deterministic output formats (e.g., uncompressed BMP instead of JPEG)
Pre-process outputs to a canonical form before comparison
Extract comparable numeric values into the JSON output field instead

Structyl does not provide perceptual or fuzzy binary comparison.

Test Discovery

Algorithm

Find project root: Walk up from CWD until .structyl/config.json found
Locate test directory: {root}/{tests.directory}/ (default: tests/)
Discover suites: Immediate subdirectories of test directory
Load test cases: Files matching tests.pattern (default: **/*.json)

Glob Pattern Syntax

The tests.pattern field supports a simplified subset of glob syntax:

Pattern	Matches
`*`	Any sequence of non-separator characters
`*/.json`	All `.json` files recursively (simplified)

Simplified Pattern Matching

The internal test loader uses a simplified pattern matching implementation, not a full glob library. The **/*.json pattern recursively finds all .json files but does not support full globstar semantics (e.g., intermediate directory matching like foo/**/bar). For most test organization patterns, this is sufficient.

Examples:

**/*.json - All JSON files in any subdirectory (default)
*.json - JSON files matching standard glob on filename only

Directory Structure

tests/
├── center/                    # Suite: "center"
│   ├── demo-1.json           # Case: "demo-1"
│   ├── demo-2.json           # Case: "demo-2"
│   └── edge-case.json        # Case: "edge-case"
├── shift/                     # Suite: "shift"
│   └── ...
└── shift-bounds/              # Suite: "shift-bounds"
    └── ...

Naming Conventions

Suite names: lowercase, hyphens allowed (e.g., shift-bounds)
Test names: lowercase, hyphens allowed (e.g., demo-1)
No spaces: Use hyphens instead

Output Comparison

Floating Point Tolerance

Configure tolerance in .structyl/config.json:

json

{
  "tests": {
    "comparison": {
      "float_tolerance": 1e-9,
      "tolerance_mode": "relative"
    }
  }
}

Tolerance Modes

Mode	Formula	Use Case
`absolute`	\|expected − actual\| ≤ tolerance	Small values
`relative`	\|expected − actual\| / \|expected\| ≤ tolerance	General purpose
`ulp`	ULP difference ≤ tolerance	IEEE precision

Note: For relative mode, when expected is exactly 0.0, the formula changes to |actual| <= tolerance to avoid division by zero.

Numeric Type Handling

When comparing expected and actual values:

JSON-decoded numbers are float64 per Go's encoding/json semantics
Programmatically-constructed int actual values are converted to float64 before comparison
This enables test assertions with integer literals: Equal(1.0, myIntResult, opts) works even if myIntResult is int

This accommodation only applies to the actual parameter; expected values from JSON are always float64.

Array Comparison

json

{
  "tests": {
    "comparison": {
      "array_order": "strict"
    }
  }
}

Mode	Behavior
`strict`	Order matters, element-by-element comparison
`unordered`	Order doesn't matter (multiset comparison); array lengths MUST match, duplicates are counted

Special Floating Point Values

JSON cannot represent NaN or Infinity directly. Structyl uses special string values as placeholders.

Case Sensitivity

Special float strings are matched exactly. Only these exact strings trigger special handling:

"NaN" — not "nan", "NAN", or "Nan"
"Infinity" or "+Infinity" — not "infinity" or "INFINITY"
"-Infinity" — not "-infinity"

Lowercase or other variants are treated as regular strings, not special float values.

JSON representation:

Value	JSON String
Positive infinity	`"Infinity"` or `"+Infinity"`
Negative infinity	`"-Infinity"`
Not a Number	`"NaN"`

Example:

json

{
  "input": { "x": [1.0, "Infinity", "-Infinity"] },
  "output": "NaN"
}

Configuration:

json

{
  "tests": {
    "comparison": {
      "nan_equals_nan": true
    }
  }
}

Comparison behavior for IEEE 754 special values:

Comparison	Result
`NaN == NaN`	`true` (configurable via `nan_equals_nan: false`)
`+Infinity == +Infinity`	`true`
`-Infinity == -Infinity`	`true`
`+Infinity == -Infinity`	`false`
`-0.0 == +0.0`	`true`

Test Loader Implementation

Informative Section

This section is informative only. The code examples illustrate one possible implementation approach. Conforming implementations MAY use different designs, APIs, or patterns as long as they satisfy the functional requirements.

pkg/testhelper Limitations

WARNING

The public Go pkg/testhelper package has the following limitations compared to Structyl's internal test runner:

No $file references: File reference resolution is only available in the internal runner. Test cases using $file syntax SHOULD either use the internal/tests package or embed data directly in JSON.
No recursive glob patterns: LoadTestSuite uses filepath.Glob("*.json") which matches JSON files in the immediate suite directory only. The tests.pattern configuration setting (which supports ** recursive patterns) is only used by Structyl's internal runner. To load nested test files with pkg/testhelper, iterate subdirectories manually.

Alternative approaches for binary test data:

Base64 encoding: Embed binary data as base64-encoded strings in JSON and decode in your test setup
Separate loading: Implement project-specific file loading alongside pkg/testhelper that reads binary files directly from the suite directory
Pre-compute comparisons: For binary output verification, compute checksums (SHA256) in JSON expected output and compare hashes instead of raw bytes

Deprecated Functions

See Stability Policy — Current Deprecations for the complete list of deprecated pkg/testhelper functions and constants with their replacements and removal timeline.

Thread Safety

All loader and comparison functions in pkg/testhelper are safe for concurrent use:

Loader functions (LoadTestSuite, LoadTestCase, etc.) perform read-only filesystem operations and can be called concurrently.
Comparison functions (Equal, Compare, FormatComparisonResult) are pure functions with no shared state.
The TestCase type is safe to read concurrently, but callers MUST NOT modify a TestCase while other goroutines are reading it.

Copy Semantics Warning

Shallow Copy Behavior

TestCase.Clone() and all With* builder methods perform shallow copies. The Output field is NOT copied—both original and clone share the same reference. Modifying Output on a clone also modifies the original:

clone := original.Clone()
clone.Output.(map[string]interface{})["key"] = "changed"
// Surprise: original.Output["key"] is also changed!

Additionally, while Input is shallow-copied at the top level, nested values within Input are shared. Modifying nested Input values affects the original.

Use TestCase.DeepClone() when you need to modify Output or nested Input values independently.

TestCase Validation Methods

The TestCase type provides three validation methods forming a hierarchy of increasing strictness:

Method	Checks	Use Case
`Validate()`	Name, Input, Output non-nil	Basic structural validation
`ValidateStrict()`	Above + top-level Output type	Programmatic TestCase construction
`ValidateDeep()`	Above + recursive type validation for all values	Complex nested structures

When to use each method:

Validate(): Default choice for programmatically-created TestCase instances. Ensures required fields are present.
ValidateStrict(): Use when Output comes from non-JSON sources (e.g., computed values) to verify the top-level type is JSON-compatible.
ValidateDeep(): Use for deeply nested structures to ensure all values (including nested maps and arrays) contain only JSON-compatible types.

Loader functions (LoadTestCase, LoadTestSuite) already validate these requirements, so calling validation methods after loading is unnecessary.

// Programmatic TestCase creation
tc := testhelper.TestCase{
    Name:   "my-test",
    Input:  map[string]interface{}{"x": 1.0},
    Output: map[string]interface{}{"result": 2.0},
}

// Choose validation level based on needs
if err := tc.Validate(); err != nil {      // structural only
    return err
}
if err := tc.ValidateDeep(); err != nil {  // recursive type check
    return err
}

Panic Behavior

The comparison functions (Equal, Compare, FormatComparisonResult) panic on invalid CompareOptions:

Condition	Panic
`ToleranceMode` not in `{"", "relative", "absolute", "ulp"}`	Yes
`ArrayOrder` not in `{"", "strict", "unordered"}`	Yes
`FloatTolerance` < 0	Yes
`ToleranceMode == "ulp"` and `FloatTolerance > math.MaxInt64`	Yes

This design treats invalid options as programmer errors (fail-fast) rather than runtime conditions.

Stability Note: Panic message format is unstable and MAY change between versions. See stability.md for details. For user-provided options, use one of these approaches:

Validate before comparison: Call ValidateOptions(opts) first; if it returns nil, subsequent comparison calls will not panic
Use error-returning variants: EqualE, CompareE, and FormatComparisonResultE return errors instead of panicking

// Option 1: Validate upfront
if err := testhelper.ValidateOptions(opts); err != nil {
    return fmt.Errorf("invalid options: %w", err)
}
result := testhelper.Equal(expected, actual, opts)  // safe

// Option 2: Error-returning variant
result, err := testhelper.EqualE(expected, actual, opts)
if err != nil {
    return fmt.Errorf("invalid options: %w", err)
}

Each language MUST implement a test loader. Required functionality:

Locate project root via marker file traversal
Discover test suites by scanning test directory
Load JSON files and deserialize to native types
Compare outputs with appropriate tolerance

Example: Go Test Loader

Public API vs Internal Implementation

The example below is illustrative. For the actual public Go API, see the pkg/testhelper package. For internal implementation with full glob support and $file resolution, see internal/tests.

// Illustrative pseudocode - see pkg/testhelper for actual API
package testhelper

import (
    "encoding/json"
    "path/filepath"
)

type TestCase struct {
    Name   string
    Suite  string
    Input  map[string]interface{}
    Output interface{}
}

func LoadTestSuite(projectRoot, suite string) ([]TestCase, error) {
    pattern := filepath.Join(projectRoot, "tests", suite, "*.json")
    files, err := filepath.Glob(pattern)
    if err != nil {
        return nil, err
    }

    var cases []TestCase
    for _, f := range files {
        tc := loadTestCase(f)
        tc.Suite = suite
        cases = append(cases, tc)
    }
    return cases, nil
}

func Equal(expected, actual interface{}, opts CompareOptions) bool {
    // Implementation with tolerance handling
}

pkg/testhelper Field Mapping

The actual pkg/testhelper.TestCase struct uses the following JSON field mapping:

JSON Field	Go Field	Go Type	Notes
`input`	`Input`	`map[string]interface{}`	Required; must be a JSON object
`output`	`Output`	`interface{}`	Required; any non-null JSON value
`description`	`Description`	`string`	Optional
`skip`	`Skip`	`bool`	Optional; defaults to `false`
`tags`	`Tags`	`[]string`	Optional
—	`Name`	`string`	Set by loader from filename (not in JSON)
—	`Suite`	`string`	Set by loader from directory (not in JSON)

The Name and Suite fields have json:"-" tags and are populated by the loader functions, not deserialized from JSON. Use LoadTestSuite or LoadTestCaseWithSuite to automatically populate the Suite field.

Example: Python Test Loader

python

import json
from pathlib import Path

def load_test_suite(project_root: Path, suite: str) -> list[dict]:
    suite_dir = project_root / "tests" / suite
    cases = []
    for f in suite_dir.glob("*.json"):
        with open(f) as fp:
            data = json.load(fp)
            data["name"] = f.stem
            data["suite"] = suite
            cases.append(data)
    return cases

def compare_output(expected, actual, tolerance=1e-9) -> bool:
    # Implementation with tolerance handling
    pass

Configuration

json

{
  "tests": {
    "directory": "tests",
    "pattern": "**/*.json",
    "comparison": {
      "float_tolerance": 1e-9,
      "tolerance_mode": "relative",
      "array_order": "strict",
      "nan_equals_nan": true
    }
  }
}

Field	Default	Description
`directory`	`"tests"`	Test data directory
`pattern`	`"*/.json"`	Glob pattern for test files (internal runner only; `pkg/testhelper` uses `*.json`)
`comparison.float_tolerance`	`1e-9`	Numeric tolerance
`comparison.tolerance_mode`	`"relative"`	How tolerance is applied
`comparison.array_order`	`"strict"`	Array comparison mode
`comparison.nan_equals_nan`	`true`	NaN equality behavior

pkg/testhelper Limitation

See pkg/testhelper Limitations for pattern support differences between pkg/testhelper (immediate directory only) and the internal runner (recursive ** patterns).

Test Generation

Structyl does not mandate a specific test generation process. The following approach is RECOMMENDED:

Generate tests in a consistent language (e.g., the reference implementation)
Store generated JSON in tests/
Commit generated tests to version control
Re-generate when algorithms change

Example command (project-specific):

bash

structyl cs generate  # Project-specific test generation

Best Practices

Use descriptive test names: negative-values, edge-empty-array
Organize by functionality: One suite per function/feature
Include edge cases: Empty inputs, boundary values, special cases
Document expected precision: In suite README or comments
Version test data: Commit to git, review changes

Test System ​

Overview ​

Non-Goals ​

Test Data Format ​

Basic Structure ​

Test Case Schema ​

Loading Failure Behavior ​

API Differences: Internal vs Public ​

Error Types (pkg/testhelper) ​

Input Structure ​

Output Types ​

Binary Data References (Internal Only) ​

File Reference Schema ​

Binary Output Comparison ​

Test Discovery ​

Algorithm ​

Glob Pattern Syntax ​

Directory Structure ​

Naming Conventions ​

Output Comparison ​

Floating Point Tolerance ​

Tolerance Modes ​

Numeric Type Handling ​

Array Comparison ​

Special Floating Point Values ​

Test Loader Implementation ​

pkg/testhelper Limitations ​

Deprecated Functions ​

Thread Safety ​

Copy Semantics Warning ​

TestCase Validation Methods ​

Panic Behavior ​

Example: Go Test Loader ​

pkg/testhelper Field Mapping ​

Example: Python Test Loader ​

Configuration ​

Test Generation ​

Best Practices ​

Test System

Overview

Non-Goals

Test Data Format

Basic Structure

Test Case Schema

Loading Failure Behavior

API Differences: Internal vs Public

Error Types (pkg/testhelper)

Input Structure

Output Types

Binary Data References (Internal Only)

File Reference Schema

Binary Output Comparison

Test Discovery

Algorithm

Glob Pattern Syntax

Directory Structure

Naming Conventions

Output Comparison

Floating Point Tolerance

Tolerance Modes

Numeric Type Handling

Array Comparison

Special Floating Point Values

Test Loader Implementation

pkg/testhelper Limitations

Deprecated Functions

Thread Safety

Copy Semantics Warning

TestCase Validation Methods

Panic Behavior

Example: Go Test Loader

pkg/testhelper Field Mapping

Example: Python Test Loader

Configuration

Test Generation

Best Practices