Quantum simulators let you test circuits without waiting in hardware queues or paying per shot. But with six major simulators available for free, which should you use? Here's a practical breakdown.
The Quick Answer
| Simulator | Best for | Max qubits (CPU unless noted) | GPU? |
|---|---|---|---|
| Qiskit Aer | General purpose | ~30 (statevector) | ✅ via qiskit-aer-gpu |
| Cirq Simulator | Cirq circuits, density matrix | ~25 | ❌ |
| PennyLane default.qubit | QML, gradients | ~20 | ✅ lightning.gpu |
| PennyLane lightning.qubit | Fast CPU simulation | ~30 | ✅ lightning.gpu |
| NVIDIA CUDA-Q | Large circuits, speed | ~34 single GPU | ✅ native |
| Braket LocalSimulator | Braket circuits, free | ~25 | ❌ |
Qiskit Aer: The Workhorse
Qiskit Aer is the most feature-complete free simulator. It supports multiple simulation methods:
- statevector: Exact simulation up to ~30 qubits. Memory scales as 2ⁿ complex amplitudes; at 16 bytes per double-precision amplitude, 30 qubits need 16 GiB of RAM.
- automatic (the default, replacing the legacy qasm_simulator backend): Shot-based sampling with noise model support
- density_matrix: Simulates mixed states and open quantum systems
- matrix_product_state: Matrix Product State (MPS) method; simulates circuits with limited entanglement up to hundreds of qubits
- stabilizer: Clifford circuits only, but scales to thousands of qubits in polynomial time
from qiskit_aer import AerSimulator
# Default: automatic method selection
sim = AerSimulator()
# Force a specific method
sim_sv = AerSimulator(method='statevector')
sim_dm = AerSimulator(method='density_matrix')
sim_mps = AerSimulator(method='matrix_product_state')
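To see where the 16 GB figure for 30 qubits comes from, here is a quick back-of-the-envelope sketch. It assumes double-precision complex amplitudes (16 bytes each) and ignores any simulator overhead; the function names are illustrative and not part of Aer.

```python
def statevector_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes for a dense statevector: 2**n amplitudes (16 bytes each for complex128)."""
    return (2 ** n_qubits) * bytes_per_amplitude


def density_matrix_bytes(n_qubits: int, bytes_per_amplitude: int = 16) -> int:
    """Bytes for a dense density matrix: 2**n x 2**n = 4**n entries."""
    return (4 ** n_qubits) * bytes_per_amplitude


# Print the statevector footprint in GiB for a few qubit counts
for n in (20, 25, 30, 34):
    gib = statevector_bytes(n) / 2 ** 30
    print(f"{n} qubits: {gib:,.3f} GiB")
```

Each extra qubit doubles the statevector, and a density matrix on n qubits costs as much as a statevector on 2n qubits, which is why the density-matrix method runs out of memory so much earlier.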
When to use Aer: For any Qiskit workflow, noise modeling, or when you want to match IBM hardware behavior closely.
PennyLane: Best for Quantum ML
PennyLane shines when you need differentiable circuits. Its default.qubit simulator computes exact gradients (by backpropagation on the simulator by default, with the parameter-shift rule also available), enabling gradient-based optimization of quantum circuits:
import pennylane as qml
from pennylane import numpy as np  # PennyLane's NumPy wrapper tracks trainable arrays
dev = qml.device("default.qubit", wires=2)
@qml.qnode(dev)
def circuit(theta):
    qml.RX(theta[0], wires=0)
    qml.RY(theta[1], wires=1)
    qml.CNOT(wires=[0, 1])
    return qml.expval(qml.PauliZ(0))
# Automatic gradient computation
grad_fn = qml.grad(circuit)
theta = np.array([0.5, 1.2], requires_grad=True)  # mark theta as trainable
print(grad_fn(theta))  # exact gradient
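For intuition about what an exact gradient from the parameter-shift rule looks like, here is a NumPy-only sketch of the same two-qubit circuit. It is not how PennyLane implements differentiation internally; the helper names (`expval_z0`, `parameter_shift_grad`) are made up for illustration, and wire 0 is assumed to be the most significant bit.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
# CNOT with control on wire 0 and target on wire 1 (basis order |00>, |01>, |10>, |11>)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def rx(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -1j * s], [-1j * s, c]])


def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def expval_z0(theta):
    """<Z> on wire 0 after RX(theta[0]) on wire 0, RY(theta[1]) on wire 1, then CNOT."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0
    state = np.kron(rx(theta[0]), ry(theta[1])) @ state
    state = CNOT @ state
    return float(np.real(np.vdot(state, np.kron(Z, I2) @ state)))


def parameter_shift_grad(f, theta, shift=np.pi / 2):
    """Gradient from two shifted evaluations per parameter (valid for RX/RY/RZ-type gates)."""
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        step = np.zeros_like(theta)
        step[i] = shift
        grad[i] = (f(theta + step) - f(theta - step)) / (2 * np.sin(shift))
    return grad


theta = np.array([0.5, 1.2])
grad = parameter_shift_grad(expval_z0, theta)
print(grad)
```

For this circuit the expectation value is cos(theta[0]), so the gradient should be -sin(theta[0]) for the first parameter and zero for the second. Unlike finite differences, the shifted evaluations give that value exactly rather than approximately.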
For faster simulation, use lightning.qubit (C++ backend, ~10× faster than default.qubit); gradients are still available through adjoint differentiation (diff_method="adjoint"):
pip install pennylane-lightning
dev = qml.device("lightning.qubit", wires=20)
When to use PennyLane: Variational algorithms (VQE, QAOA), quantum ML, any workflow requiring circuit gradients.
NVIDIA CUDA-Q: When Speed Matters
CUDA-Q provides GPU-accelerated simulation that can be 100–10,000× faster than CPU simulators for large circuits. If you have an NVIDIA GPU with enough memory to hold the statevector, this is a strong choice for 25+ qubit circuits:
import cudaq
@cudaq.kernel
def large_circuit(n: int):
    qvec = cudaq.qvector(n)
    h(qvec[0])
    for i in range(n - 1):
        cx(qvec[i], qvec[i + 1])
    mz(qvec)
# Run on GPU (specify 'nvidia' target)
cudaq.set_target('nvidia')
counts = cudaq.sample(large_circuit, 30, shots_count=10000)
print(counts)
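As a sanity check on what the kernel above should return, here is a small NumPy sketch of the same H plus CNOT-chain circuit on a handful of qubits. It is a plain dense-statevector toy, not CUDA-Q code; the helper names are hypothetical and qubit 0 is treated as the most significant bit.

```python
from collections import Counter

import numpy as np


def apply_h(state, qubit, n):
    """Apply a Hadamard to one qubit of an n-qubit dense statevector."""
    psi = np.moveaxis(state.reshape([2] * n), qubit, 0)
    out = np.stack([psi[0] + psi[1], psi[0] - psi[1]]) / np.sqrt(2)
    return np.moveaxis(out, 0, qubit).reshape(-1)


def apply_cx(state, control, target, n):
    """Apply CNOT: flip the target axis inside the control = 1 half of the state."""
    psi = state.reshape([2] * n).copy()
    idx = [slice(None)] * n
    idx[control] = 1
    # Indexing the control axis away shifts later axes down by one
    t_axis = target - 1 if target > control else target
    psi[tuple(idx)] = np.flip(psi[tuple(idx)], axis=t_axis).copy()
    return psi.reshape(-1)


def sample_counts(state, shots, n, seed=0):
    """Sample bitstrings from |amplitude|^2 and tally them."""
    probs = np.abs(state) ** 2
    probs = probs / probs.sum()
    rng = np.random.default_rng(seed)
    outcomes = rng.choice(len(probs), size=shots, p=probs)
    return Counter(format(int(i), f"0{n}b") for i in outcomes)


n = 4
state = np.zeros(2 ** n, dtype=complex)
state[0] = 1.0
state = apply_h(state, 0, n)
for i in range(n - 1):
    state = apply_cx(state, i, i + 1, n)

counts = sample_counts(state, shots=1000, n=n)
print(counts)
```

Only the all-zeros and all-ones bitstrings should appear, split roughly evenly. The same pattern is what to look for in the 30-qubit `cudaq.sample` output.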
Performance benchmark (GHZ circuit, 28 qubits, 1000 shots):
- Qiskit Aer CPU: ~45 seconds
- PennyLane lightning.qubit: ~30 seconds
- CUDA-Q (A100 GPU): ~0.8 seconds
When to use CUDA-Q: Any circuit with 25+ qubits, performance-critical simulations, multi-GPU workloads.
Cirq: Noise Modeling and NISQ Research
Google Cirq includes three simulators:
import cirq
# Exact statevector simulation
sim = cirq.Simulator()
# Density matrix with noise
noise_model = cirq.ConstantQubitNoiseModel(
    cirq.depolarize(p=0.01)
)
noisy_sim = cirq.DensityMatrixSimulator(noise=noise_model)
# Clifford circuits only (but exponentially faster for stabilizer states)
clifford_sim = cirq.CliffordSimulator()
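To make the noise model less of a black box, here is a NumPy sketch of what a single-qubit depolarizing channel does to a density matrix. It assumes the convention in which each of the three Pauli errors occurs with probability p/3, which is how Cirq documents `cirq.depolarize`; the `depolarize` function below is a standalone illustration, not Cirq's implementation.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def depolarize(rho, p):
    """rho -> (1 - p) * rho + (p / 3) * (X rho X + Y rho Y + Z rho Z)."""
    noisy = (1 - p) * rho
    for pauli in (X, Y, Z):
        noisy = noisy + (p / 3) * (pauli @ rho @ pauli.conj().T)
    return noisy


# Start in |0><0| and apply the same 1% depolarizing noise as above
rho0 = np.array([[1, 0], [0, 0]], dtype=complex)
rho = depolarize(rho0, p=0.01)
print(np.real(np.diag(rho)))
```

Starting from |0⟩, the X and Y errors both move population to |1⟩ while Z leaves it alone, so the |1⟩ population ends up at 2p/3. The trace stays at 1, as it must for a valid channel.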
When to use Cirq: NISQ noise modeling, stabilizer circuit research, Google AI Quantum workflows.
Amazon Braket LocalSimulator: Isolation and Portability
The Braket SDK includes a free local simulator that mirrors the cloud API exactly:
from braket.devices import LocalSimulator
from braket.circuits import Circuit
device = LocalSimulator()
circuit = Circuit()
circuit.h(0)
circuit.cnot(0, 1)
circuit.probability()
task = device.run(circuit, shots=1000)
result = task.result()
print(result.measurement_counts)
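The counts printed above map bitstrings to how often each was measured; for this Bell circuit the ideal distribution is 50% `00` and 50% `11`. A minimal sketch, independent of the Braket SDK, for turning such counts into probabilities and measuring how far they sit from the ideal. The counts literal is made up for illustration, and the helper names are hypothetical.

```python
def counts_to_probs(counts):
    """Normalise a {bitstring: count} mapping into probabilities."""
    total = sum(counts.values())
    return {bits: c / total for bits, c in counts.items()}


def total_variation_distance(p, q):
    """0.5 * sum over outcomes of |p - q|; 0 means identical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Illustrative counts only, not real simulator output
example_counts = {"00": 510, "11": 490}
ideal = {"00": 0.5, "11": 0.5}

probs = counts_to_probs(example_counts)
tvd = total_variation_distance(probs, ideal)
print(probs, tvd)
```

The same two helpers work on count dictionaries from any of the simulators in this article, as long as the bitstring ordering conventions match.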
When to use Braket local: You're building for AWS deployment and want local dev/test that matches the cloud API.
The HLQuantum Shortcut
If you need to run the same circuit on multiple simulators for benchmarking or comparison, HLQuantum makes this trivial:
import hlquantum as hlq
import time
qc = hlq.Circuit(28)
qc.h(0)
for i in range(27):
    qc.cx(i, i + 1)
qc.measure_all()
for backend in ["qiskit", "pennylane", "cudaq"]:
    t0 = time.time()
    result = hlq.run(qc, shots=1000, backend=backend)
    print(f"{backend}: {time.time() - t0:.2f}s")
One circuit, three backends, directly comparable results.
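A single timed run can be dominated by one-off costs such as imports, compilation, or GPU initialisation. Here is a hedged sketch of a slightly more careful harness using only the standard library: one warm-up call, then the median of a few repeats. The `runners` dictionary holds stand-in workloads; in practice each entry would wrap a call into the backend you want to time.

```python
import time
from statistics import median


def time_backends(runners, repeats=3):
    """Return median wall-clock seconds per name for zero-argument callables."""
    results = {}
    for name, run in runners.items():
        run()  # warm-up call, excluded from the timings
        samples = []
        for _ in range(repeats):
            t0 = time.perf_counter()
            run()
            samples.append(time.perf_counter() - t0)
        results[name] = median(samples)
    return results


# Hypothetical stand-ins for real simulator calls
runners = {
    "small_job": lambda: sum(range(1_000)),
    "large_job": lambda: sum(range(200_000)),
}

timings = time_backends(runners)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.6f}s")
```

`time.perf_counter` is used instead of `time.time` because it is monotonic and has higher resolution, which matters for short runs.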