241 Benchmarks

Executed in Nanoseconds

Nanosecond- to microsecond-scale execution times across 241 benchmarks, with the fastest operation at 0.3 ns. Real benchmarks from real hardware.

Last run: December 5, 2025 at 8:07 PM UTC

Fastest Op: 0.3 ns
Avg Compiler: 4.54 μs
Avg Backend: 1.76 μs
Categories: 7

Execution Performance

Average execution times across key operation categories.

Compiler Operations

Circuit canonicalization and compilation. Average: 4.5 μs (220x faster than 1 ms).

Backend Execution

ISA instructions and Unit Group operations. Average: 1.8 μs (567x faster than 1 ms).

Speed Comparison

Direct visual comparison against common reference times.

Compiler Operations

Circuit canonicalization and compilation

This benchmark: 4.5 μs
1 millisecond: 220x slower
100 microseconds: 22x slower
10 microseconds: 2x slower
1 microsecond: 4.5x faster

Backend Execution

ISA instructions and Unit Group operations

This benchmark: 1.8 μs
1 millisecond: 567x slower
100 microseconds: 57x slower
10 microseconds: 6x slower
1 microsecond: 1.8x faster
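The multipliers above are plain ratios of a reference duration to the average benchmark time. The sketch below reproduces them; the function name `times_slower` is hypothetical, and the unrounded averages are an assumption, back-calculated from the published throughput figures (220.4K and 567.2K ops/sec) rather than taken from raw benchmark output.

```rust
/// How many times longer a reference duration is than a measured one.
fn times_slower(reference_ns: f64, measured_ns: f64) -> f64 {
    reference_ns / measured_ns
}

fn main() {
    // Assumption: unrounded averages implied by the published ops/sec values.
    let averages = [
        ("Compiler Operations", 1e9 / 220_400.0),
        ("Backend Execution", 1e9 / 567_200.0),
    ];
    let references = [
        ("1 millisecond", 1_000_000.0),
        ("100 microseconds", 100_000.0),
        ("10 microseconds", 10_000.0),
    ];
    for (name, avg_ns) in averages {
        for (label, ref_ns) in references {
            // Round to whole multiples, as the page does.
            println!("{}: {} is {:.0}x slower", name, label, times_slower(ref_ns, avg_ns));
        }
    }
}
```

Using the rounded 1.76 μs backend average instead would give 568x rather than 567x for the 1 ms reference, which is why the sketch starts from the throughput figures.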

Throughput Capacity

Operations per second at average execution times. Higher throughput enables real-time processing at scale.

Compiler Operations

Circuit canonicalization and compilation

220.4K ops/sec at a 4.5 μs average execution time (chart axis: 1K to 1B ops/sec, lower to higher throughput)

Backend Execution

ISA instructions and Unit Group operations

567.2K ops/sec at a 1.8 μs average execution time (chart axis: 1K to 1B ops/sec, lower to higher throughput)
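Throughput here is the reciprocal of the average execution time. A minimal sketch of that conversion follows; `ops_per_second` is a hypothetical helper, and the inputs are the rounded averages shown on this page (4.54 μs and 1.76 μs), not the raw measurements.

```rust
/// Operations per second for a given average time per operation, in nanoseconds.
fn ops_per_second(avg_ns: f64) -> f64 {
    1e9 / avg_ns
}

fn main() {
    // Rounded averages as displayed on the page (assumption: the published
    // throughput was computed from unrounded values).
    for (name, avg_ns) in [("Compiler Operations", 4_540.0), ("Backend Execution", 1_760.0)] {
        println!("{}: {:.1}K ops/sec", name, ops_per_second(avg_ns) / 1e3);
    }
}
```

Because the inputs are rounded, the results land within about 0.2% of the published 220.4K and 567.2K figures rather than matching them exactly.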

Speed in Context

Nanoseconds are abstract. Here's what they mean.

15,489×
Faster Than Thought

Human neural processing: ~13ms. Hologram O(1) lookup: ~839.3 ns.

357,440×
Lookups Per Blink

One blink (300ms) = 357,440 Hologram lookups.

59,573+
Per Heartbeat

Hologram lookups possible in a single 50ms heartbeat interval.

252m
Light Distance

Light travels 252m during one Hologram lookup.
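Each of these comparisons is a single division or multiplication against the 839.3 ns lookup time. The sketch below reproduces the arithmetic; the helper names are hypothetical, the intervals (13 ms, 300 ms, 50 ms) are the ones quoted above, and the speed of light is the standard 299,792,458 m/s.

```rust
/// Lookup time quoted on this page, in nanoseconds.
const LOOKUP_NS: f64 = 839.3;
/// Speed of light in vacuum, in metres per second.
const SPEED_OF_LIGHT_M_PER_S: f64 = 299_792_458.0;

/// Whole lookups that fit into an interval given in milliseconds.
fn lookups_within(interval_ms: f64) -> u64 {
    (interval_ms * 1e6 / LOOKUP_NS).floor() as u64
}

/// Distance light travels during a duration given in nanoseconds.
fn light_distance_m(duration_ns: f64) -> f64 {
    SPEED_OF_LIGHT_M_PER_S * duration_ns * 1e-9
}

fn main() {
    println!("13 ms neural processing: {} lookups", lookups_within(13.0));
    println!("300 ms blink: {} lookups", lookups_within(300.0));
    println!("50 ms interval: {} lookups", lookups_within(50.0));
    println!("light distance: {:.0} m", light_distance_m(LOOKUP_NS));
}
```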

Algorithmic Advantage

Hologram O(1) lookup vs traditional O(n²)/O(n³) approaches (logarithmic scale)

Hologram: 839.3 ns
O(n²), n=10: 1.00 μs
O(n²), n=100: 100.00 μs
O(n²), n=1000: 10.00 ms
O(n³), n=100: 10.00 ms
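The O(n²) and O(n³) reference times are consistent with a simple cost model of n^k steps at a fixed cost per step. The page does not state that cost; 10 ns per step is an assumption inferred from the figures (for example, 10² steps giving 1.00 μs). The function name `modeled_ns` is hypothetical.

```rust
/// Modeled running time, in nanoseconds, of an O(n^k) algorithm
/// that performs n^k steps at a fixed cost per step.
fn modeled_ns(n: u64, exponent: u32, ns_per_step: f64) -> f64 {
    n.pow(exponent) as f64 * ns_per_step
}

fn main() {
    // Assumption: 10 ns per step, inferred from the chart values.
    let ns_per_step = 10.0;
    let hologram_ns = 839.3;
    for (n, k) in [(10u64, 2u32), (100, 2), (1000, 2), (100, 3)] {
        let t = modeled_ns(n, k, ns_per_step);
        println!(
            "O(n^{}) n={}: {:.2} us ({:.1}x the Hologram lookup)",
            k, n, t / 1e3, t / hologram_ns
        );
    }
}
```

Under this model the reference times are synthetic rather than measured, so the comparison illustrates scaling behaviour, not a head-to-head benchmark.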

What We Benchmark

Comprehensive testing across all Hologram subsystems. Each category measures specific performance characteristics.

Compiler

18 benchmarks

Measures time to normalize circuits via 96-class geometric system and compile them for execution.

Fastest: 839.3 ns

Backends

32 benchmarks

Measures Unit Group inverse/multiply operations across precision types (i8, f16, f32, f64) and ISA instruction execution.

Fastest: 156.7 ns

Core Operations

43 benchmarks

Mathematical operations (SIMD, activations)

Fastest: 0.3 ns

Buffer Operations

17 benchmarks

Buffer allocation and memory operations

Fastest: 4.5 ns

Tensor Operations

2 benchmarks

Tensor operations (reshape, transpose, slice)

Fastest: 617.55 μs

End-to-End

11 benchmarks

End-to-end workflows (neural networks, image processing)

Fastest: 10.3 ns

Other

118 benchmarks

Cache analysis, character products, orbit operations, address computation, and type dispatch benchmarks.

Fastest: 0.3 ns

Performance Visualized

Side-by-side comparison of execution times. Shorter bars = faster operations.

Compiler Operations

Circuit canonicalization and compilation via the 96-class geometric system. These operations enable O(1) lookup times for circuit transformations.

Compiler Benchmarks

Canonicalize Simple Add Circuit: 839.3 ns
Canonicalize Range Operations (4): 1.01 μs
Canonicalize Range Operations (8): 1.03 μs
Canonicalize Range Operations (16): 1.03 μs
Canonicalize H Squared Pattern: 2.39 μs
Canonicalize Complex Circuit: 2.54 μs
Compile Range Operations (8): 3.36 μs
Compile Range Operations (16): 3.37 μs
Compile Range Operations (4): 3.42 μs
Verify Canonicalization Idempotence: 4.13 μs
Compile Sequential Chain (2): 4.48 μs
Compile Simple Merge: 4.51 μs
Compile Scheduling (compile_standard): 5.04 μs
Compile Sequential Chain (4): 7.14 μs
Compile With Transforms: 7.83 μs
Compile Parallel Marks: 7.98 μs

Backend Execution

ISA instruction execution and Unit Group operations across multiple precision types (i8, f16, f32, f64). These form the computational foundation for Hologram.

Backend Benchmarks

Unit Group Inverse (i8): 156.7 ns
Unit Group Inverse (f16_i16): 156.7 ns
Unit Group Inverse (f64_i64): 156.9 ns
Unit Group Inverse (f32_i32): 158.1 ns
Unit Group Operations (inverse_f16_i16): 159.2 ns
Unit Group Operations (inverse_i8): 159.2 ns
Unit Group Operations (inverse_f64_i64): 159.3 ns
Unit Group Operations (inverse_f32_i32): 159.3 ns
Unit Group Is Unit (f16_i16): 544.1 ns
Unit Group Operations (is_unit_f16_i16): 546.8 ns
Unit Group Is Unit (i8): 548.4 ns
Unit Group Is Unit (f32_i32): 549.0 ns
Unit Group Is Unit (f64_i64): 550.4 ns
Unit Group Operations (is_unit_f64_i64): 550.7 ns
Unit Group Operations (is_unit_i8): 551.2 ns
Unit Group Operations (is_unit_f32_i32): 552.5 ns

All Benchmarks

Detailed statistics by category.

Compiler

Circuit compilation and canonicalization

Canonicalize Simple Add Circuit [compiler]: 839.3 ns
Canonicalize Range Operations (4) [compiler]: 1.01 μs
Canonicalize Range Operations (8) [compiler]: 1.03 μs
Canonicalize Range Operations (16) [compiler]: 1.03 μs
Canonicalize H Squared Pattern [compiler]: 2.39 μs
Canonicalize Complex Circuit [compiler]: 2.54 μs
Compile Range Operations (8) [compiler]: 3.36 μs
Compile Range Operations (16) [compiler]: 3.37 μs
Compile Range Operations (4) [compiler]: 3.42 μs
Verify Canonicalization Idempotence [compiler]: 4.13 μs
Compile Sequential Chain (2) [compiler]: 4.48 μs
Compile Simple Merge [compiler]: 4.51 μs
+6 more benchmarks

Backends

Backend execution (ISA, Atlas, Unit Group)

Unit Group Inverse (i8) [backends]: 156.7 ns
Unit Group Inverse (f16_i16) [backends]: 156.7 ns
Unit Group Inverse (f64_i64) [backends]: 156.9 ns
Unit Group Inverse (f32_i32) [backends]: 158.1 ns
Unit Group Operations (inverse_f16_i16) [backends]: 159.2 ns
Unit Group Operations (inverse_i8) [backends]: 159.2 ns
Unit Group Operations (inverse_f64_i64) [backends]: 159.3 ns
Unit Group Operations (inverse_f32_i32) [backends]: 159.3 ns
Unit Group Is Unit (f16_i16) [backends]: 544.1 ns
Unit Group Operations (is_unit_f16_i16) [backends]: 546.8 ns
Unit Group Is Unit (i8) [backends]: 548.4 ns
Unit Group Is Unit (f32_i32) [backends]: 549.0 ns
+20 more benchmarks

Core Operations

Mathematical operations (SIMD, activations)

Griess Vector Properties (len) [core/ops]: 0.3 ns (196.9K)
Griess Vector Properties (is_empty) [core/ops]: 0.3 ns (196.9K)
Griess Memory (as_slice) [core/ops]: 0.6 ns (1.5 MB)
Griess Vector Properties (is_zero) [core/ops]: 0.8 ns (196.9K)
Griess Vector Properties (is_near_identity) [core/ops]: 0.8 ns (196.9K)
Griess Memory (clone_arc) [core/ops]: 4.4 ns (1.5 MB)
Ops Comparison 16k (reduce_max) [core/ops]: 730.3 ns (16.4K)
Ops Comparison 16k (reduce_min) [core/ops]: 730.4 ns (16.4K)
Ops Comparison 16k (neg) [core/ops]: 1.47 μs (16.4K)
Ops Comparison 16k (abs) [core/ops]: 1.47 μs (16.4K)
Ops Comparison 16k (relu) [core/ops]: 1.47 μs (16.4K)
Ops Comparison 16k (reduce_sum) [core/ops]: 2.00 μs (16.4K)
+31 more benchmarks

Buffer Operations

Buffer allocation and memory operations

Buffer Clone (1024) [core/buffer]: 4.5 ns (1.0K)
Buffer Clone (4096) [core/buffer]: 4.5 ns (4.1K)
Buffer Copy From Slice (1024) [core/buffer]: 178.1 ns (4.0 KB)
Buffer Allocation (256) [core/buffer]: 453.4 ns (256)
Buffer Canonicalize (1024) [core/buffer]: 591.4 ns (4.0 KB)
Buffer Copy From Slice (16384) [core/buffer]: 1.61 μs (64.0 KB)
Buffer To Vec (1024) [core/buffer]: 1.73 μs (4.0 KB)
Buffer Copy To Slice (1024) [core/buffer]: 1.78 μs (4.0 KB)
Buffer Planning (without_buffer_reuse) [core/buffer]: 2.09 μs
Buffer Planning (with_buffer_reuse) [core/buffer]: 2.44 μs
Buffer Allocation (4096) [core/buffer]: 2.84 μs (4.1K)
Large Buffer Chain (16384) [core/buffer]: 5.83 μs (16.4K)
+5 more benchmarks

Tensor Operations

Tensor operations (reshape, transpose, slice)

Lazy Tensor Wrapper (lazy_tensor_chain) [core/tensor]: 617.55 μs (16.4K)
Lazy Tensor Wrapper (lazy_op_chain) [core/tensor]: 656.52 μs (16.4K)

End-to-End

End-to-end workflows (neural networks, image processing)

Dispatch Overhead (single_op) [e2e]: 10.3 ns
Neural Network Layer (64) [e2e]: 27.7 ns (4.2K)
Dispatch Overhead (three_op_chain) [e2e]: 28.6 ns
Attention Forward (64) [e2e]: 30.1 ns (4.1K)
Neural Network Layer (128) [e2e]: 42.0 ns (16.5K)
Attention Forward (128) [e2e]: 43.2 ns (16.4K)
Attention Forward (256) [e2e]: 60.5 ns (65.5K)
Neural Network Layer (256) [e2e]: 60.7 ns (65.8K)
Statistics Pipeline (1024) [e2e]: 423.6 ns (1.0K)
Statistics Pipeline (4096) [e2e]: 1.03 μs (4.1K)
Statistics Pipeline (16384) [e2e]: 3.48 μs (16.4K)

Other

Uncategorized benchmarks

Orbit Operations (single_orbit_classify) [other]: 0.3 ns
Resonance Ops (crush) [other]: 0.3 ns
Character Products (single_character_product) [other]: 1.5 ns
Address Computation (f32_coordinate_to_offset) [other]: 2.6 ns
Address Computation (f64_coordinate_to_offset) [other]: 2.7 ns
Resonance Ops (lift_single) [other]: 6.3 ns
Orbit Operations (orbit_representative_lookup) [other]: 22.5 ns
Orbit Operations (orbit_class_size) [other]: 30.6 ns
Activation Relu (1024) [other]: 41.3 ns (1.0K)
Scalar Mul (1024) [other]: 41.5 ns (1.0K)
Scalar Add (1024) [other]: 41.5 ns (1.0K)
Abs (1024) [other]: 43.5 ns (1.0K)
+106 more benchmarks

Methodology

Transparent, reproducible, verifiable. All benchmarks are run with criterion.rs, collecting 100 samples per benchmark after 10 warmup iterations that are excluded from the results.

Run date: December 5, 2025
Commit: 8443e1d
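The warmup-then-sample procedure described above can be illustrated with the standard library alone. This is a minimal sketch, not criterion.rs itself and not the project's benchmark harness: the `measure` and `mean` helpers and the summed-vector workload are hypothetical stand-ins, and only the counts (10 warmup iterations, 100 samples) come from the text.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Runs `op` for `warmup` untimed iterations, then times `samples`
/// iterations one by one and returns each duration in nanoseconds.
fn measure<F: FnMut()>(mut op: F, warmup: usize, samples: usize) -> Vec<f64> {
    for _ in 0..warmup {
        op(); // warmup runs are executed but not recorded
    }
    (0..samples)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed().as_nanos() as f64
        })
        .collect()
}

/// Arithmetic mean of the recorded samples.
fn mean(values: &[f64]) -> f64 {
    values.iter().sum::<f64>() / values.len() as f64
}

fn main() {
    // Hypothetical workload: sum 1024 floats.
    let data: Vec<f32> = (0..1024).map(|i| i as f32).collect();
    let timings = measure(
        || {
            // black_box keeps the compiler from optimizing the work away.
            black_box(black_box(&data).iter().sum::<f32>());
        },
        10,
        100,
    );
    println!("mean over {} samples: {:.1} ns", timings.len(), mean(&timings));
}
```

Timing one call at a time like this is too coarse for sub-nanosecond operations, since the clock overhead dominates; a statistical harness such as criterion.rs runs many iterations per sample for that reason.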

See For Yourself

Clone the repo. Run the benchmarks. Verify the numbers. The speed is real.