DefiledAI Tools

INFERENCE PROFILER

Throughput, TTFT, bandwidth utilisation, and CPU offload analysis for any GPU + model combination.

Configuration
Profile Results
Throughput: 3 tok/s
TTFT: 20.16s
Model size: 42.0GB
KV cache (4K ctx): 0.0GB
Total VRAM needed: 42.5GB
CPU offload: 20.4GB
BW utilisation: 0%
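
The memory figures above can be cross-checked with simple arithmetic. The sketch below is not the profiler's own formula. It assumes that total VRAM is the sum of model size, KV cache, and a fixed overhead, and that anything beyond the usable VRAM spills to system RAM. The 0.5GB overhead and the 22.1GB usable VRAM are not shown on the page; they are inferred here so that the reported totals (42.5GB needed, 20.4GB offloaded) add up. The function name vram_budget is hypothetical.

```python
def vram_budget(model_gb, kv_cache_gb, overhead_gb, usable_vram_gb):
    """Return (total VRAM needed, amount spilled to system RAM), both in GB."""
    total = model_gb + kv_cache_gb + overhead_gb
    # Whatever does not fit in usable VRAM is assumed to be offloaded to RAM.
    offload = max(0.0, total - usable_vram_gb)
    return total, offload


# Reported values: model 42.0GB, KV cache 0.0GB.
# Assumed values (inferred, not shown on the page): 0.5GB overhead, 22.1GB usable VRAM.
total, offload = vram_budget(42.0, 0.0, 0.5, 22.1)
print(f"Total VRAM needed: {total:.1f}GB, CPU offload: {offload:.1f}GB")
```

Under these assumptions, close to half of the model would sit in system RAM.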
Analysis
⚠ 20.4GB overflows to RAM, limiting throughput to ~3 tok/s (CPU-limited)
✗ Too slow for interactive use
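
To see why a spill of this size drags throughput down to a few tokens per second, a rough bandwidth-bound estimate helps. The sketch below is an illustration only and not the profiler's method. It assumes a dense model in which every weight is read once per generated token, ignores compute and PCIe transfer costs, and uses hypothetical bandwidth figures (1000 GB/s for GPU memory, 60 GB/s for system RAM) that do not come from the page. The 22.1GB GPU-resident share is likewise inferred as 42.5GB minus the 20.4GB offload.

```python
def estimate_tok_per_s(gpu_gb, cpu_gb, gpu_bw_gbs, cpu_bw_gbs):
    """Bandwidth-bound decode estimate: seconds per token is the time
    needed to read the GPU-resident and the RAM-resident weights once."""
    seconds_per_token = gpu_gb / gpu_bw_gbs + cpu_gb / cpu_bw_gbs
    return 1.0 / seconds_per_token


# 20.4GB offloaded (reported); 22.1GB on the GPU (inferred).
# Both bandwidth values are assumptions chosen for illustration.
with_offload = estimate_tok_per_s(22.1, 20.4, gpu_bw_gbs=1000.0, cpu_bw_gbs=60.0)

# Same model if it fitted entirely in VRAM (hypothetical).
no_offload = estimate_tok_per_s(42.5, 0.0, gpu_bw_gbs=1000.0, cpu_bw_gbs=60.0)

print(f"with offload: {with_offload:.1f} tok/s, fully on GPU: {no_offload:.1f} tok/s")
```

With these assumed bandwidths, the RAM-resident share dominates the per-token time, which is consistent with the CPU-limited result reported above.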