
Rust Code a Million Times Faster — How the Right Algorithm Beats Raw Language Speed

Everyone says “Rust is fast” — but that alone isn’t enough.
Raw language speed can be dwarfed by the choice of algorithm for the underlying problem.

In this post, we’ll benchmark five different Rust Fibonacci implementations —
from textbook recursion to fast doubling with big integers —
to demonstrate how algorithmic complexity shapes real-world performance.


Why Benchmark

Never trust assumptions about speed or complexity without benchmarking your actual use case. Even in a language like Rust, an optimized algorithm can run millions of times faster than a naive one.
This is the power of algorithmic thinking combined with Rust’s performance.

Note on Benchmarking Accuracy:
We use Criterion.rs for benchmarking. It wraps inputs and outputs in black_box, which stops the compiler from constant-folding the computation or discarding it as dead code, so the benchmarks measure true runtime costs.
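The same idea is available in the standard library as `std::hint::black_box` (stable since Rust 1.66). A minimal hand-rolled timing sketch using it — no substitute for Criterion's statistical analysis, and `fib_iter` here is just an illustrative stand-in — could look like:

```rust
use std::hint::black_box;
use std::time::Instant;

// Simple iterative Fibonacci used as the workload under test.
fn fib_iter(n: u64) -> u128 {
    let (mut prev, mut curr) = (0u128, 1u128);
    for _ in 0..n {
        let next = prev + curr;
        prev = curr;
        curr = next;
    }
    prev
}

fn main() {
    let start = Instant::now();
    for _ in 0..1_000 {
        // black_box keeps the compiler from folding the call away entirely.
        black_box(fib_iter(black_box(90)));
    }
    println!("1000 runs took {:?}", start.elapsed());
}
```

Without the `black_box` calls, the optimizer is free to hoist the computation out of the loop or delete it, and the loop would measure nothing.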


Problem Definition

Given an integer n, calculate the n-th Fibonacci number:

F(0) = 0, F(1) = 1
F(n) = F(n-1) + F(n-2) for n > 1

In practice, constraints drastically affect which approach is appropriate:

  • Small n (≤ 40): simple recursion may suffice.
  • Large n (≥ 10⁷): naive recursion is out of the question; iterative or more advanced methods are needed.
  • Performance goals: is correctness enough, or do we need the fastest possible solution?

Approaches

1. Recursive O(2ⁿ)

/// Naive recursion | Exponential time | O(2ⁿ)
pub fn fibo_1(n: u32) -> u32 {
    match n {
        0 => 0,
        1 => 1,
        n => fibo_1(n - 1) + fibo_1(n - 2),
    }
}

Benchmark

n    Time
10   95 ns
20   12.3 µs
30   1.40 ms
40   184.4 ms
Why this fails

Every subproblem is recomputed many times: fib(5) computes fib(3) twice and fib(2) three times, and the duplication compounds at every level below. This causes an exponential runtime explosion.
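You can make the blowup visible by instrumenting the recursion with a call counter; `fib_count` below is an illustrative variant, not code from the repo:

```rust
// Naive recursion, instrumented to count how many calls it makes.
fn fib_count(n: u32, calls: &mut u64) -> u64 {
    *calls += 1;
    match n {
        0 => 0,
        1 => 1,
        _ => fib_count(n - 1, calls) + fib_count(n - 2, calls),
    }
}

fn main() {
    for n in [10, 20, 30] {
        let mut calls = 0u64;
        fib_count(n, &mut calls);
        println!("fib({n}) made {calls} recursive calls");
    }
}
```

The call count itself follows the recurrence C(n) = C(n−1) + C(n−2) + 1, which grows like the Fibonacci numbers — exponentially.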

2. Memoized Recursion O(n)

/// Use a cache to avoid recomputation | Linear | O(n) time & space
/// `cache` must hold n + 1 entries, each initialized to the sentinel u128::MAX.
pub fn fibo_2(n: u32, cache: &mut Vec<u128>) -> u128 {
    if n < 2 {
        return n as u128; // base cases F(0) = 0, F(1) = 1
    }
    if cache[n as usize] != u128::MAX {
        return cache[n as usize];
    }
    cache[n as usize] = fibo_2(n - 1, cache) + fibo_2(n - 2, cache);
    cache[n as usize]
}

Benchmark

n    Time
10   57 ns
20   107 ns
30   222 ns
40   319 ns
100  0.99 µs

A big improvement: much larger n values are now feasible (until u128 overflows near n = 186), but some downsides remain:

  • Recursion overhead remains.
  • Cache initialization and maintenance cost exist.
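One subtlety is the cache contract: it must be pre-sized and filled with a sentinel before the first call. A sketch of a safe call site — `fib_memo` is an illustrative helper name, and `fibo_2` is repeated here with an explicit base case so the block is self-contained:

```rust
/// Memoized recursion; u128::MAX marks "not yet computed".
pub fn fibo_2(n: u32, cache: &mut Vec<u128>) -> u128 {
    if n < 2 {
        return n as u128; // base cases F(0) = 0, F(1) = 1
    }
    if cache[n as usize] != u128::MAX {
        return cache[n as usize];
    }
    cache[n as usize] = fibo_2(n - 1, cache) + fibo_2(n - 2, cache);
    cache[n as usize]
}

/// Illustrative wrapper: sizes the cache, fills the sentinel, seeds base cases.
pub fn fib_memo(n: u32) -> u128 {
    let mut cache = vec![u128::MAX; (n as usize + 1).max(2)];
    cache[0] = 0;
    cache[1] = 1;
    fibo_2(n, &mut cache)
}
```

Forgetting the sentinel fill (or sizing the cache too small) turns a cache miss into an out-of-bounds panic, which is why hiding the setup behind a wrapper is worthwhile.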

3. Iterative O(n)

/// Fast, simple iteration using u128 | Linear | O(n)
pub fn fibo_3(n: u64) -> u128 {
    if n < 2 {
        return n as u128;
    }
    let (mut prev, mut curr) = (0u128, 1u128);
    for _ in 2..=n {
        let next = prev + curr;
        prev = curr;
        curr = next;
    }
    curr
}

Benchmark

n    Time
50   30 ns
60   34 ns
70   38 ns
80   43 ns
90   48 ns
100  53 ns
Limitation

u128 overflows past n = 186 (F(186) is the largest Fibonacci number that fits in 128 bits). For larger n, BigUint or another big-integer type is necessary, at a higher per-operation cost.
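Rather than memorizing that boundary, you can probe it with checked arithmetic; a small std-only sketch (`max_fib_in_u128` is an illustrative name):

```rust
/// Find the largest n whose Fibonacci number still fits in u128.
fn max_fib_in_u128() -> u64 {
    let (mut prev, mut curr) = (0u128, 1u128);
    let mut n = 1u64; // invariant: curr == F(n)
    // checked_add returns None instead of panicking/wrapping on overflow.
    while let Some(next) = prev.checked_add(curr) {
        prev = curr;
        curr = next;
        n += 1;
    }
    n
}

fn main() {
    println!("F(n) fits in u128 up to n = {}", max_fib_in_u128()); // prints 186
}
```

The same `checked_add` pattern can be used inside `fibo_3` itself to return an `Option<u128>` instead of silently wrapping in release builds.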

4. Big Integer O(n), scalable to huge n

use num_bigint::BigUint;

/// Iterative with big integers (BigUint) | Linear | O(n)
pub fn fibo_3_scaled(n: u64) -> BigUint {
    if n < 2 {
        return n.into();
    }
    let (mut prev, mut curr) = (BigUint::from(0u8), BigUint::from(1u8));
    for _ in 2..=n {
        let next = &prev + &curr;
        prev = curr;
        curr = next;
    }
    curr
}

Benchmark

n     Time
10    100 ns
100   1.03 µs
1000  11.0 µs

While the algorithm remains O(n) in the number of additions, each addition now operates on arbitrary-precision integers. F(n) has roughly 0.209·n decimal digits, so a single addition itself costs O(n), and the total bit-level work grows roughly quadratically. This is noticeably slower than fixed-size integer arithmetic.

5. Fast Doubling O(log n)

use num_bigint::BigUint;

/// Logarithmic O(log n) using Fast Doubling
pub fn fibo_4(n: u64) -> BigUint {
    // Returns (F(k), F(k + 1)).
    fn fib_pair(k: u64) -> (BigUint, BigUint) {
        if k == 0 {
            (BigUint::ZERO, BigUint::from(1u8))
        } else {
            let (a, b) = fib_pair(k >> 1);
            let two = BigUint::from(2u8);
            let c = &a * (&b * &two - &a); // F(2m)   = F(m) * (2*F(m+1) - F(m))
            let d = &a * &a + &b * &b;     // F(2m+1) = F(m)^2 + F(m+1)^2
            if k & 1 == 0 {
                (c, d)
            } else {
                let next = &c + &d;
                (d, next) // (F(2m+1), F(2m+2))
            }
        }
    }
    fib_pair(n).0
}

This method efficiently handles massive n values (millions or billions) — the runtime grows logarithmically with n, plus the cost of big-integer multiplications.
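The same doubling identities, F(2m) = F(m)·(2F(m+1) − F(m)) and F(2m+1) = F(m)² + F(m+1)², can be sketched over plain u128 — no big-integer crate required — which makes the recursion easy to verify against the iterative version. This illustrative variant is valid roughly up to n = 185, where the F(2m+1) intermediate starts to overflow:

```rust
/// Fast doubling over u128; valid while the intermediates fit (n <= 185).
fn fib_fast_u128(n: u64) -> u128 {
    // Returns (F(k), F(k + 1)).
    fn pair(k: u64) -> (u128, u128) {
        if k == 0 {
            return (0, 1);
        }
        let (a, b) = pair(k >> 1); // a = F(m), b = F(m + 1), m = k / 2
        let c = a * (2 * b - a);   // F(2m)
        let d = a * a + b * b;     // F(2m + 1)
        if k & 1 == 0 { (c, d) } else { (d, c + d) }
    }
    pair(n).0
}
```

Only about log₂(n) halving steps occur, so even F(185) is reached in eight recursive calls — the big-integer version pays the same small number of steps, just with more expensive multiplications.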

Benchmark

n            Time
10           269 ns
100          577 ns
1000         995 ns
10_000       6.8 µs
1_000_000    8.34 ms
10_000_000   260 ms
100_000_000  7 s

Summary

Method         Complexity  n=40      n=100    n=1,000  n=1,000,000
fibo_1         O(2ⁿ)       184.4 ms  🚫       🚫       🚫
fibo_2         O(n)        319 ns    0.99 µs  🚫       🚫
fibo_3         O(n)        199 ns    53 ns    🚫       🚫
fibo_3_scaled  O(n)        100 ns    1.03 µs  11.0 µs  🚫
fibo_4         O(log n)    269 ns    577 ns   995 ns   8.34 ms

🚫 indicates impractical runtime or integer overflow.

Takeaway

  1. Don’t trust raw language speed alone: algorithm complexity dominates runtime.
  2. Recursion without memoization grows exponentially and quickly becomes unusable.
  3. Iterative methods eliminate recursion overhead and scale much better.
  4. Big integer arithmetic is costly but necessary for very large Fibonacci numbers.
  5. Fast doubling (O(log n)) is the asymptotically fastest exact method and handles massive inputs.
  6. Always benchmark with your real input sizes and constraints before choosing an approach.

Impossible → Instant

Comparing runtimes for various inputs:

  • fibo_1(40) → 184.4 ms
  • fibo_3_scaled(40) → 100 ns
  • fibo_4(1,000,000) → 8.34 ms

This means the optimized implementation (fibo_3_scaled) is roughly 1.8 million times faster than the naive recursive method on the same input (184.4 ms vs. 100 ns), and fast doubling stays in the milliseconds even on a problem 25,000 times larger.

If we compare computing n=1,000,000 using naive recursion (impossible in practice) vs. fast doubling, the theoretical speedup exceeds 10²⁰× — effectively turning a computation that would take longer than the age of the universe into milliseconds.
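A rough sketch of that arithmetic, assuming the naive call count grows on the order of φⁿ with φ = (1 + √5)/2:

```rust
fn main() {
    let n = 1_000_000_f64;
    // Naive recursion makes on the order of phi^n calls; count its digits in base 10.
    let log10_phi = ((1.0 + 5.0_f64.sqrt()) / 2.0).log10(); // ~0.2090
    let digits = (n * log10_phi).round() as u64;
    // The call count is a number with ~209,000 decimal digits,
    // so "exceeds 10^20 x" is an extremely loose lower bound.
    println!("naive call count for n = 1_000_000 ~ 10^{digits}");
}
```

Even at a billion calls per second, 10^208,988 operations dwarf the ~10^17 seconds the universe has existed, while fast doubling finishes in 8.34 ms.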

note

When scale grows, algorithmic complexity beats raw language speed, every time.

Reproduce & Explore

Want to dive deeper or run the benchmarks yourself? Check out the full source code and Criterion benchmarks here:

GitHub repo: https://github.com/0xsecaas/algs

Benchmark code: benches/fib_bench.rs

Feel free to clone, experiment, and customize!