LLM fine-tuning & pre-training
Full-parameter fine-tuning of 70B-class models, continued pre-training, RLHF, and large-scale distributed runs. FP8 on H100 cuts wall-clock time roughly in half versus BF16 on A100, and NVLink keeps gradient sync from becoming the bottleneck.
8×H100 SXM · A100 80GB for cost-sensitive runs



