Introducing InferenceX: Open Source vLLM Benchmarking for Multi-GPU Inference
I built InferenceX, an open-source tool for benchmarking LLM inference across GPU configurations. It measures throughput, latency, time to first token (TTFT), and power efficiency sampled live from nvidia-smi, and it works with any HuggingFace model that vLLM supports.
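To give a feel for what the tool measures, here is a minimal sketch of that kind of benchmark, assuming vLLM is installed and nvidia-smi is on the PATH. The model name, prompts, and the `sample_power` helper are placeholders for illustration, not InferenceX's actual internals:

```python
# Minimal sketch: time a vLLM batch, approximate TTFT, and sample GPU power
# via nvidia-smi in a background thread. Assumes vLLM and an NVIDIA GPU.
import subprocess
import threading
import time

from vllm import LLM, SamplingParams


def sample_power(readings, stop_event, interval=0.5):
    """Poll nvidia-smi for instantaneous power draw (watts) until stopped."""
    while not stop_event.is_set():
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True,
        ).stdout
        # nvidia-smi prints one line per GPU; sum across GPUs for multi-GPU runs.
        readings.append(sum(float(w) for w in out.split()))
        stop_event.wait(interval)


llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any vLLM-supported HF model
prompts = ["Explain speculative decoding in one paragraph."] * 32
params = SamplingParams(max_tokens=256, temperature=0.0)

# Approximate TTFT by timing a single-token generation for one prompt
# (includes prefill plus one decode step and some framework overhead).
t0 = time.perf_counter()
llm.generate(prompts[:1], SamplingParams(max_tokens=1))
ttft = time.perf_counter() - t0

# Measure batch throughput while sampling power in the background.
readings, stop = [], threading.Event()
sampler = threading.Thread(target=sample_power, args=(readings, stop))
sampler.start()
t0 = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - t0
stop.set()
sampler.join()

gen_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
avg_watts = sum(readings) / len(readings) if readings else float("nan")
print(f"TTFT (approx): {ttft:.3f} s")
print(f"Throughput: {gen_tokens / elapsed:.1f} tok/s")
# Energy = average watts * seconds; tokens per joule is one efficiency metric.
print(f"Avg power: {avg_watts:.0f} W -> {gen_tokens / (avg_watts * elapsed):.2f} tok/J")
```

Sampling power during the timed run, rather than reading a TDP spec, is what makes the efficiency number reflect the actual workload.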