QAOA (Quantum Approximate Optimization Algorithm) is one of the most hyped near-term quantum algorithms — and for good reason. It's a genuine candidate for quantum advantage on combinatorial optimization problems, it runs on today's NISQ hardware, and it's approachable enough to implement in an afternoon.
This guide builds the full QAOA pipeline from scratch, using Max-Cut as the example problem. By the end you'll understand what QAOA actually does, how to implement it, and when it's worth using.
What is Max-Cut?
Max-Cut is a graph partitioning problem: given a graph, divide the nodes into two groups (0 and 1) to maximize the number of edges that cross between groups. It's NP-hard in general, appears in VLSI chip design, clustering, and network analysis, and maps cleanly onto quantum hardware.
Here's a simple 4-node graph:
0 --- 1
| \ |
3 --- 2
The maximum cut of this graph is 4 (edges 0-1, 1-2, 2-3, 3-0 all cross). Our goal: find the bit string 0101 or 1010 that achieves this.
The QAOA Idea
QAOA is a variational hybrid algorithm. It alternates two parameterized operations:
- Problem unitary
U_C(γ)— encodes the cost function (edge crossings) as phase kickback - Mixer unitary
U_B(β)— mixes between solutions (X rotations)
With p layers of these alternating unitaries, QAOA prepares a quantum state whose measurement is likely a good solution. A classical optimizer tunes the 2p angles (γ, β) to maximize expected cut value.
Building QAOA with PennyLane
import pennylane as qml
import numpy as np
from scipy.optimize import minimize
# Graph edges
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n_qubits = 4
dev = qml.device("default.qubit", wires=n_qubits)
def cost_unitary(gamma):
"""Encode the Max-Cut cost into phases."""
for u, v in edges:
qml.CNOT(wires=[u, v])
qml.RZ(2 * gamma, wires=v)
qml.CNOT(wires=[u, v])
def mixer_unitary(beta):
"""Apply X-rotation mixer to all qubits."""
for i in range(n_qubits):
qml.RX(2 * beta, wires=i)
@qml.qnode(dev)
def qaoa_circuit(params, p=1):
gammas = params[:p]
betas = params[p:]
# Start in uniform superposition
for i in range(n_qubits):
qml.Hadamard(wires=i)
# Alternate p layers
for layer in range(p):
cost_unitary(gammas[layer])
mixer_unitary(betas[layer])
return qml.probs(wires=range(n_qubits))
def expected_cut(params, p=1):
"""Classical cost function: expected number of cut edges."""
probs = qaoa_circuit(params, p)
total = 0.0
for bitstring_idx, prob in enumerate(probs):
bits = format(bitstring_idx, f'0{n_qubits}b')
cut = sum(1 for u, v in edges if bits[u] != bits[v])
total += prob * cut
return -total # Minimize negative cut
# Optimize p=1 QAOA
p = 1
init_params = np.random.uniform(0, np.pi, 2 * p)
result = minimize(expected_cut, init_params, method='COBYLA',
options={'maxiter': 200})
print(f"Optimized params: {result.x}")
print(f"Expected cut: {-result.fun:.3f} / 4.0 maximum")
Running this typically achieves expected_cut ≈ 3.5 with p=1 — 87.5% of the maximum cut.
Sampling the Best Solution
After optimization, sample the circuit to find the most likely good solutions:
@qml.qnode(dev)
def sample_circuit(params, p=1, shots=1000):
gammas = params[:p]
betas = params[p:]
for i in range(n_qubits):
qml.Hadamard(wires=i)
for layer in range(p):
cost_unitary(gammas[layer])
mixer_unitary(betas[layer])
return qml.sample(wires=range(n_qubits))
samples = sample_circuit(result.x, shots=1000)
# Count cut values for each bitstring
from collections import Counter
cut_counts = Counter()
for bits in samples:
key = ''.join(str(b) for b in bits)
cut = sum(1 for u, v in edges if bits[u] != bits[v])
cut_counts[(key, cut)] += 1
# Show top 5 most frequent bitstrings
for (bits, cut), count in sorted(cut_counts.items(), key=lambda x: -x[1])[:5]:
print(f"{bits} cut={cut} frequency={count/10:.1f}%")
Expected output:
0101 cut=4 frequency=28.3%
1010 cut=4 frequency=27.1%
0110 cut=3 frequency=8.4%
...
The optimal solutions (0101 and 1010) dominate — QAOA amplified their probability.
The Same Problem with HLQuantum
HLQuantum's built-in QAOA lets you skip the circuit construction entirely:
import hlquantum as hlq
graph = [(0, 1), (1, 2), (2, 3), (3, 0)]
result = hlq.algorithms.qaoa(
graph=graph,
problem='max_cut',
p=2,
backend='pennylane',
shots=2000,
)
print(result.best_solution) # '0101' or '1010'
print(result.cut_value) # 4
print(result.approximation_ratio) # ~0.96 for p=2
Switch backends with a single flag:
# Run on Qiskit Aer
result_qiskit = hlq.algorithms.qaoa(graph=graph, problem='max_cut', p=2, backend='qiskit')
# Run on real IonQ hardware (if you have credits)
result_ionq = hlq.algorithms.qaoa(graph=graph, problem='max_cut', p=2,
backend='ionq', device='aria-1')
Increasing p for Better Results
The QAOA approximation ratio improves with depth p. At p→∞, QAOA solves the problem exactly. In practice:
| p | Typical approx. ratio (Max-Cut) | Gates required |
|---|---|---|
| 1 | ~0.75 | 2×edges + 4n |
| 2 | ~0.86 | 4×edges + 8n |
| 3 | ~0.91 | 6×edges + 12n |
| 5 | ~0.95 | 10×edges + 20n |
The tradeoff: deeper circuits accumulate more noise on real hardware. On simulators, use p=3 or higher. On NISQ QPUs, p=1 or p=2 is usually the practical limit.
When is QAOA Worth Using?
Currently useful for:
- Benchmarking NISQ devices (QAOA is a standard benchmark circuit)
- Academic research on variational quantum algorithms
- Small graphs (under 20 nodes) on simulators
Not yet useful for:
- Production optimization on real hardware at scale — classical heuristics (simulated annealing, tabu search) still outperform QAOA on practical problems
- Large graphs — circuit depth scales with graph density
The promise: On future fault-tolerant hardware with high-depth circuits, QAOA at large p could provide genuine speedups for dense NP-hard optimization. The theoretical foundations are solid; the hardware isn't there yet.
Next Steps
- PennyLane SDK guide — set up PennyLane for more variational algorithms
- VQE guide — the sister algorithm for quantum chemistry
- HLQuantum — run QAOA on any backend with one API