QAOA Explained: Solving Max-Cut Step by Step

QAOA (Quantum Approximate Optimization Algorithm) is one of the most hyped near-term quantum algorithms — and for good reason. It's a genuine candidate for quantum advantage on combinatorial optimization problems, it runs on today's NISQ hardware, and it's approachable enough to implement in an afternoon.

This guide builds the full QAOA pipeline from scratch, using Max-Cut as the example problem. By the end you'll understand what QAOA actually does, how to implement it, and when it's worth using.

What is Max-Cut?

Max-Cut is a graph partitioning problem: given a graph, divide the nodes into two groups (0 and 1) to maximize the number of edges that cross between groups. It's NP-hard in general, appears in VLSI chip design, clustering, and network analysis, and maps cleanly onto quantum hardware.

Here's a simple 4-node graph:

0 --- 1
|   \ |
3 --- 2

The maximum cut of this graph is 4 (edges 0-1, 1-2, 2-3, 3-0 all cross). Our goal: find the bit string 0101 or 1010 that achieves this.

The QAOA Idea

QAOA is a variational hybrid algorithm. It alternates two parameterized operations:

Problem unitary U_C(γ) — encodes the cost function (edge crossings) as phase kickback
Mixer unitary U_B(β) — mixes between solutions (X rotations)

With p layers of these alternating unitaries, QAOA prepares a quantum state whose measurement is likely a good solution. A classical optimizer tunes the 2p angles (γ, β) to maximize expected cut value.

Building QAOA with PennyLane

import pennylane as qml
import numpy as np
from scipy.optimize import minimize

# Graph edges
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n_qubits = 4

dev = qml.device("default.qubit", wires=n_qubits)

def cost_unitary(gamma):
    """Encode the Max-Cut cost into phases."""
    for u, v in edges:
        qml.CNOT(wires=[u, v])
        qml.RZ(2 * gamma, wires=v)
        qml.CNOT(wires=[u, v])

def mixer_unitary(beta):
    """Apply X-rotation mixer to all qubits."""
    for i in range(n_qubits):
        qml.RX(2 * beta, wires=i)

@qml.qnode(dev)
def qaoa_circuit(params, p=1):
    gammas = params[:p]
    betas = params[p:]
    # Start in uniform superposition
    for i in range(n_qubits):
        qml.Hadamard(wires=i)
    # Alternate p layers
    for layer in range(p):
        cost_unitary(gammas[layer])
        mixer_unitary(betas[layer])
    return qml.probs(wires=range(n_qubits))

def expected_cut(params, p=1):
    """Classical cost function: expected number of cut edges."""
    probs = qaoa_circuit(params, p)
    total = 0.0
    for bitstring_idx, prob in enumerate(probs):
        bits = format(bitstring_idx, f'0{n_qubits}b')
        cut = sum(1 for u, v in edges if bits[u] != bits[v])
        total += prob * cut
    return -total  # Minimize negative cut

# Optimize p=1 QAOA
p = 1
init_params = np.random.uniform(0, np.pi, 2 * p)
result = minimize(expected_cut, init_params, method='COBYLA',
                  options={'maxiter': 200})

print(f"Optimized params: {result.x}")
print(f"Expected cut: {-result.fun:.3f} / 4.0 maximum")

Running this typically achieves expected_cut ≈ 3.5 with p=1 — 87.5% of the maximum cut.

Sampling the Best Solution

After optimization, sample the circuit to find the most likely good solutions:

@qml.qnode(dev)
def sample_circuit(params, p=1, shots=1000):
    gammas = params[:p]
    betas = params[p:]
    for i in range(n_qubits):
        qml.Hadamard(wires=i)
    for layer in range(p):
        cost_unitary(gammas[layer])
        mixer_unitary(betas[layer])
    return qml.sample(wires=range(n_qubits))

samples = sample_circuit(result.x, shots=1000)

# Count cut values for each bitstring
from collections import Counter
cut_counts = Counter()
for bits in samples:
    key = ''.join(str(b) for b in bits)
    cut = sum(1 for u, v in edges if bits[u] != bits[v])
    cut_counts[(key, cut)] += 1

# Show top 5 most frequent bitstrings
for (bits, cut), count in sorted(cut_counts.items(), key=lambda x: -x[1])[:5]:
    print(f"{bits}  cut={cut}  frequency={count/10:.1f}%")

Expected output:

0101  cut=4  frequency=28.3%
1010  cut=4  frequency=27.1%
0110  cut=3  frequency=8.4%
...

The optimal solutions (0101 and 1010) dominate — QAOA amplified their probability.

The Same Problem with HLQuantum

HLQuantum's built-in QAOA lets you skip the circuit construction entirely:

import hlquantum as hlq

graph = [(0, 1), (1, 2), (2, 3), (3, 0)]

result = hlq.algorithms.qaoa(
    graph=graph,
    problem='max_cut',
    p=2,
    backend='pennylane',
    shots=2000,
)

print(result.best_solution)   # '0101' or '1010'
print(result.cut_value)       # 4
print(result.approximation_ratio)  # ~0.96 for p=2

Switch backends with a single flag:

# Run on Qiskit Aer
result_qiskit = hlq.algorithms.qaoa(graph=graph, problem='max_cut', p=2, backend='qiskit')

# Run on real IonQ hardware (if you have credits)
result_ionq = hlq.algorithms.qaoa(graph=graph, problem='max_cut', p=2,
                                   backend='ionq', device='aria-1')

Increasing p for Better Results

The QAOA approximation ratio improves with depth p. At p→∞, QAOA solves the problem exactly. In practice:

p	Typical approx. ratio (Max-Cut)	Gates required
1	~0.75	2×edges + 4n
2	~0.86	4×edges + 8n
3	~0.91	6×edges + 12n
5	~0.95	10×edges + 20n

The tradeoff: deeper circuits accumulate more noise on real hardware. On simulators, use p=3 or higher. On NISQ QPUs, p=1 or p=2 is usually the practical limit.

When is QAOA Worth Using?

Currently useful for:

Benchmarking NISQ devices (QAOA is a standard benchmark circuit)
Academic research on variational quantum algorithms
Small graphs (under 20 nodes) on simulators

Not yet useful for:

Production optimization on real hardware at scale — classical heuristics (simulated annealing, tabu search) still outperform QAOA on practical problems
Large graphs — circuit depth scales with graph density

The promise: On future fault-tolerant hardware with high-depth circuits, QAOA at large p could provide genuine speedups for dense NP-hard optimization. The theoretical foundations are solid; the hardware isn't there yet.

Next Steps

PennyLane SDK guide — set up PennyLane for more variational algorithms
VQE guide — the sister algorithm for quantum chemistry
HLQuantum — run QAOA on any backend with one API