NVIDIA GTC Recap
There's a lot of new information to assimilate. This first of many posts focuses on private-agency-related thoughts, including putting the puzzle together for Private Agency for My House.
Soumith Chintala (co-creator of PyTorch) provided insights into the evolution of local inference and distributed training, which are foundational for home-based ML.
What are the ingredients to bake Private Agency in the home?
Ignorance is dangerous, so let's take a look at the known-knowns, known-unknowns, and unknown-unknowns.
Inference
Generalized local language models alongside Specialized Language Models (SLMs)
- Likely a mixture of experts (MoE) working together in my house—at its most extreme form, it's one-per tool (though this may be an over-optimization).
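A one-model-per-tool arrangement could be sketched as a simple dispatcher. This is a hypothetical illustration, not a real framework: the tool names and lambda "models" are stand-ins for actual local SLMs.

```python
# Hypothetical sketch: a per-tool router that dispatches prompts to
# specialized local models, falling back to a generalist when no
# specialist is registered. All names here are illustrative.
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class ToolRouter:
    # tool name -> inference callable (each imagined as its own SLM)
    experts: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def register(self, tool: str, model: Callable[[str], str]) -> None:
        self.experts[tool] = model

    def route(self, tool: str, prompt: str) -> str:
        # Unknown tools fall back to the generalist model
        expert = self.experts.get(tool, self.experts["generalist"])
        return expert(prompt)

router = ToolRouter()
router.register("generalist", lambda p: f"[general] {p}")
router.register("thermostat", lambda p: f"[thermostat] {p}")
print(router.route("thermostat", "set 68F"))  # handled by the specialist
print(router.route("dishwasher", "run eco"))  # falls back to the generalist
```

Whether the routing itself should be a learned gate (as in a true MoE) or a static table like this is exactly the over-optimization question.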
Training
Federated learning participation for local retraining on household data
- Where is that data stored and in what format(s)? Parquet, Vector DB, or raw JSON?
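Whatever the long-term store (Parquet or a vector DB), capture could start as raw JSON lines and be compacted later. A minimal sketch, with assumed field names:

```python
# Sketch of a raw-JSON capture format (field names "ts"/"source"/"text"
# are assumptions). Records could later be compacted into Parquet or
# embedded into a vector DB once a local embedding model is chosen.
import json, io, time

def append_event(buf: io.StringIO, source: str, text: str) -> None:
    record = {"ts": time.time(), "source": source, "text": text}
    buf.write(json.dumps(record) + "\n")

buf = io.StringIO()  # stands in for an append-only local file
append_event(buf, "thermostat", "heat set to 68F")
append_event(buf, "calendar", "dentist at 3pm")
events = [json.loads(line) for line in buf.getvalue().splitlines()]
print(len(events), events[0]["source"])
```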
Verification
How do we validate that data is not being leaked via imported models? Exploring Zero-Knowledge Proofs over model weights.
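A full ZKP over weights is an open problem; a far simpler related building block is a hash commitment, which at least lets you verify that imported weights match what was published:

```python
# Not a zero-knowledge proof -- just a hash commitment over weights,
# shown as the simplest verification primitive. Illustrative only.
import hashlib, struct

def commit_weights(weights: list) -> str:
    # Serialize floats deterministically, then hash
    packed = b"".join(struct.pack("<d", w) for w in weights)
    return hashlib.sha256(packed).hexdigest()

published = commit_weights([0.12, -0.5, 3.7])
downloaded = commit_weights([0.12, -0.5, 3.7])
tampered = commit_weights([0.12, -0.5, 3.70001])
print(published == downloaded)  # True: weights match the commitment
print(published == tampered)    # False: any perturbation is detected
```

This catches tampering but says nothing about training-data leakage, which is what the ZKP question is really after.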
Data Tagging by Sensitivity:
Implementing metadata layers that classify data (e.g., Public, Internal, Confidential) to dictate which models can process it.
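The classification idea above reduces to an ordering check: a model may only process data at or below its cleared level. The labels and their ordering below are assumptions from the example in the text:

```python
# Hypothetical sensitivity gate. Labels follow the example in the text;
# the numeric ordering is an assumption.
LEVELS = {"Public": 0, "Internal": 1, "Confidential": 2}

def can_process(model_clearance: str, data_label: str) -> bool:
    # A model cleared at level N may see data labeled N or below
    return LEVELS[data_label] <= LEVELS[model_clearance]

print(can_process("Confidential", "Internal"))  # True: local model, cleared
print(can_process("Public", "Confidential"))    # False: e.g. a cloud model
```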
Interoperability
Embeddings are model-specific; what does a transformation space between embeddings look like? Can an open standard (or a set of base embedding features) be created to allow transformation between embedding spaces?
Major open questions
- In addition to RL/SFT/LoRA/FP quantization for inference, does anyone see a future without regular private retraining of models?
- On Hardware:
- What are the chances of data mixing in an RDMA GPU mesh? What does a GPU enclave look like?
- Private Enclaves exist for Motherboard HBM, Disk and CPU, and NVIDIA H100 Confidential Computing.
- Can the performance impact of segregation at scale become cheap enough to offset the need for local high-end hardware?
- Adding factual knowledge to a LLM (and suppressing old knowledge) at scale?
- How can we move embeddings from one model to another model at scale?
- If the AI Factory (née Data Center) is moving to 600 kVA racks and 6-megawatt hubs, what does the edge look like?
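Of the techniques named in the first question above, quantization is the easiest to make concrete. A minimal sketch of symmetric int8 quantization (int8 rather than FP8/FP4, purely for illustration):

```python
# Minimal per-tensor symmetric int8 quantization sketch (illustrative).
def quantize(weights: list):
    # One scale for the whole tensor, mapping the max magnitude to 127
    scale = max(abs(w) for w in weights) / 127 or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list, scale: float) -> list:
    return [x * scale for x in q]

w = [0.5, -1.27, 0.003]
q, s = quantize(w)
restored = dequantize(q, s)
# Round-trip error is bounded by one quantization step (the scale)
print(max(abs(a - b) for a, b in zip(w, restored)) < s)  # True
```

Retraining, by contrast, has no such cheap approximation, which is why the "future without private retraining" question stays open.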
Testing Private Agency
Testing Private Agency involves benchmarking local inference speed against cloud-based alternatives while verifying zero-leakage through network monitoring and traffic analysis.
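Both checks can be combined in one harness: time the inference call while asserting it opens no network sockets. Patching `socket.socket` is a crude stand-in for real traffic analysis, and `local_model` is a hypothetical placeholder:

```python
# Sketch: measure local inference latency while failing loudly on any
# attempted network access. Socket patching only catches in-process
# Python sockets -- real verification needs external traffic capture.
import socket, time
from unittest import mock

def local_model(prompt: str) -> str:
    return prompt.upper()  # stand-in for on-device inference

def run_offline(fn, *args):
    # Any call to socket.socket() inside fn raises immediately
    with mock.patch.object(socket, "socket",
                           side_effect=AssertionError("network access!")):
        start = time.perf_counter()
        out = fn(*args)
        return out, time.perf_counter() - start

out, latency = run_offline(local_model, "hello")
print(out, f"{latency:.6f}s")
```

The same latency numbers, collected for a cloud endpoint without the socket guard, give the benchmark half of the comparison.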
Detailed Links
Training Hints:
- https://developer.nvidia.com/gpudirect
- https://github.com/facebookincubator/gloo
- https://github.com/horovod/horovod
- https://security.apple.com/blog/private-cloud-compute/
- https://www.microsoft.com/en-us/research/blog/secure-training-of-machine-learning-models-on-azure/
- NVIDIA H100 Confidential Computing
