DefiledAI Tools

INFERENCE PROFILER

Throughput, TTFT, bandwidth utilisation, and CPU offload analysis for any GPU + model combination.

Configuration

Parameters (B)

Quantization

GPU Count

GPU

Backend

Context (tokens)

Batch Size

Profile Results

Throughput

3 tok/s

TTFT

20.16s

Model size: 42.0 GB

KV cache (4K ctx): 0.0 GB

Total VRAM needed: 42.5 GB

CPU offload: 20.4 GB

BW utilisation: 0%
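
The memory figures above are related by simple arithmetic, which can be sketched as follows. This is a minimal illustration, not the profiler's actual implementation. The names `MemoryBudget` and `memory_budget` are hypothetical, and two inputs are inferred from the reported numbers rather than stated by the tool: a 0.5 GB overhead (42.5 - 42.0 - 0.0) and 22.1 GB of usable VRAM (42.5 - 20.4).

```python
from dataclasses import dataclass


@dataclass
class MemoryBudget:
    total_gb: float      # weights + KV cache + overhead
    offload_gb: float    # portion that does not fit in VRAM
    fits_in_vram: bool   # True when nothing spills to system RAM


def memory_budget(model_gb, kv_cache_gb, overhead_gb, usable_vram_gb):
    """Split the total memory requirement between VRAM and system RAM.

    All arguments are in GB. overhead_gb and usable_vram_gb are
    assumptions: the profiler output does not report them directly.
    """
    total = model_gb + kv_cache_gb + overhead_gb
    # Anything beyond the usable VRAM has to be offloaded to the CPU side.
    offload = max(0.0, total - usable_vram_gb)
    return MemoryBudget(round(total, 1), round(offload, 1), offload == 0.0)


# Values taken or inferred from the results shown above.
budget = memory_budget(model_gb=42.0, kv_cache_gb=0.0,
                       overhead_gb=0.5, usable_vram_gb=22.1)
print(budget)
```

With these inputs the sketch reproduces the reported 42.5 GB total and 20.4 GB offload. A configuration whose total fits within usable VRAM yields an offload of 0.0 GB.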

Analysis

⚠ 20.4 GB overflows to system RAM, limiting throughput to ~3 tok/s (CPU-limited)

✗ Too slow for interactive use
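
One common back-of-envelope way to reason about a CPU-limited figure like the one above is to treat token generation as memory-bandwidth-bound: each generated token reads every weight once, so the time per token is the bytes held in each memory tier divided by that tier's bandwidth. The sketch below applies that model. It is an assumption about how such an estimate could be made, not a description of this profiler's method. The function name is hypothetical, and the bandwidth values (1000 GB/s for VRAM, 60 GB/s for system RAM) are illustrative placeholders, not measurements.

```python
def decode_tokens_per_second(gpu_gb, cpu_gb, gpu_bw_gbps, cpu_bw_gbps):
    """Estimate decode throughput under a bandwidth-bound model.

    gpu_gb / cpu_gb: data resident in VRAM / system RAM (GB).
    gpu_bw_gbps / cpu_bw_gbps: memory bandwidth of each tier (GB/s).
    Assumes every byte is read once per generated token.
    """
    if gpu_gb + cpu_gb <= 0:
        raise ValueError("at least one tier must hold data")
    if gpu_bw_gbps <= 0 or cpu_bw_gbps <= 0:
        raise ValueError("bandwidths must be positive")
    seconds_per_token = gpu_gb / gpu_bw_gbps + cpu_gb / cpu_bw_gbps
    return 1.0 / seconds_per_token


# 22.1 GB in VRAM and 20.4 GB offloaded, as implied by the results above.
# Both bandwidth figures are hypothetical.
offloaded = decode_tokens_per_second(22.1, 20.4, gpu_bw_gbps=1000.0, cpu_bw_gbps=60.0)
# Same total data, but hypothetically all of it resident in VRAM.
all_in_vram = decode_tokens_per_second(42.5, 0.0, gpu_bw_gbps=1000.0, cpu_bw_gbps=60.0)
print(f"offloaded: {offloaded:.1f} tok/s, all in VRAM: {all_in_vram:.1f} tok/s")
```

Under these placeholder bandwidths the slow tier dominates: the offloaded portion accounts for most of the per-token time, which is why even a partial spill to system RAM can pull throughput down to a few tokens per second.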