Scaling AI Infrastructure: Navigating Risks in Distributed Systems

Wednesday, August 20, 2025
2:45 PM - 3:15 PM
AI Risk Summit Track 2 (Salon II)

About This Session

As organizations increasingly integrate AI into their operations, the scalability of AI infrastructure becomes paramount. However, scaling introduces a spectrum of risks, from data inconsistencies and model drift to system failures and security vulnerabilities. Drawing from my experience leading AI infrastructure projects at Fortune 50 companies and major cloud providers, this session will delve into the challenges and solutions associated with scaling AI systems.

Key discussion points will include:

Designing resilient distributed systems that mitigate common failure points.
Implementing robust monitoring and observability to detect and address anomalies proactively.
Ensuring data integrity and consistency across diverse pipelines.
Balancing scalability with compliance, especially in regulated industries like healthcare.
Fostering cross-functional collaboration to align technical solutions with organizational risk management strategies.

Attendees will gain actionable insights into building scalable AI infrastructure that is not only efficient but also resilient against these risks.

Speaker

Ashok Prakash

Senior Principal Engineer - Oracle

Ashok Prakash is a senior principal engineer at Oracle Cloud, where he leads the architecture and scaling of high-performance GPU cloud systems powering AI workloads. With deep expertise in distributed systems and health AI, Ashok has built mission-critical infrastructure supporting the latest generations of AI accelerators, from A100s to GB200s. He has led large-scale engineering organizations and developed automated repair frameworks that significantly reduce cloud deployment time.

An AI researcher turned systems leader, Ashok has published papers on commonsense reasoning, NLP, and clinical AI, and holds multiple patents in cloud orchestration and automated triaging.