Will your IT infrastructure cope with your AI demands?

BrandPost By John Cox
Oct 17, 2025 | 4 mins
Artificial Intelligence | Machine Learning

Resource-constrained systems and networks can undermine enterprise large language models.

[Image: LLM graphic. Credit: SuPatMaN]

Large language models (LLMs) are evolving quickly. They bring powerful advances in language, vision, reasoning, and real-time interaction to artificial intelligence (AI) initiatives. However, they also bring massive, unexpected infrastructure demands that many organizations aren’t prepared to handle.

New pressures on IT infrastructure

Many enterprise data centers were not designed for the technical demands that characterize AI, generative AI, and their underlying LLMs, including:

  • High-density graphics processing unit (GPU) workloads
  • High-bandwidth networking
  • Massive parallel data flows

LLMs require 10x to 100x more compute capability than traditional machine learning (ML) models. Furthermore, LLM training and inferencing pose unique demands. The result is a clash between enterprise AI ambition and AI readiness.

“Training an LLM requires massive, bursty GPU capacity, high-speed interconnects, and distributed storage throughput in the terabytes per second range,” says Patrick Ward, senior director for services, Penguin Solutions. “By contrast, LLM inferencing is highly latency-sensitive, and it needs to scale elastically for unpredictable peaks.”

For those enterprises that are unprepared, these demands can lead to hidden costs, including network bottlenecks, elevated latency, and underutilized GPUs.

IT leaders who want to ensure their organizations can handle LLM workloads now and in the future should consider conducting a multi-level AI readiness assessment with at least four actions.

1. Assess existing IT infrastructure.

“Plan your infrastructure for growth because static architecture will age fast,” says Ward.

Optimizing for AI means more than accounting for compute, network, storage, and cooling capacity. It should include a detailed examination of how these elements work with each other, within discrete systems, between clusters, and across networks. That means understanding, for example, GPU availability, interconnect speeds, and storage throughput.
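As a rough illustration of this kind of examination, aggregate requirements can be estimated from per-GPU figures. The function and numbers below are a hypothetical sketch for sizing intuition only, not vendor guidance:

```python
def aggregate_storage_throughput_gbs(num_gpus: int, per_gpu_ingest_gbs: float) -> float:
    """Estimate the aggregate storage read throughput (GB/s) needed to keep
    every GPU fed with training data; assumes uniform, concurrent ingest."""
    return num_gpus * per_gpu_ingest_gbs

# Hypothetical example: a 512-GPU training cluster where each GPU streams
# ~2 GB/s of training data needs ~1 TB/s of aggregate storage throughput.
print(aggregate_storage_throughput_gbs(512, 2.0))  # 1024.0
```

The same back-of-the-envelope approach applies to interconnect bandwidth and checkpoint write capacity: multiply the per-GPU demand by the cluster size, then compare against what the fabric and storage tier can actually deliver.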

2. Assess your workforce skillsets.

AI-related technologies are evolving quickly, so organizations need a few specific functional roles to keep pace, including:

  • Machine learning operations (MLOps) engineers
  • Data engineers
  • AI architects with distributed training experience

Your skills assessment should guide decisions about hiring, retraining, incentivizing, and creating new career paths.

3. Establish an AI governance and compliance strategy.

Poor AI governance can expose the enterprise to operational, legal, ethical, and financial risks. To mitigate these risks, IT leaders should:

  • Systematically track fast-changing AI regulations and laws
  • Embed compliance and accountability from the outset to avoid costly rework
  • Form a dedicated team to manage requirements such as provenance, audit trails, and explainability  

4. Benchmark against industry best practices.

As AI adoption grows, proven best practices are taking shape. Benchmarking allows your organization to measure its AI operations and processes against industry leaders.

IT leaders should consider leveraging benchmarks to identify bottlenecks in compute memory, networking, or storage, establish performance baselines, and compare results against vendor specifications or other clusters.
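One simple way to act on such benchmarks is to compare measured results against vendor specifications and flag any resource running well below spec. The helper below is a hypothetical sketch; the metric names and the 80% threshold are illustrative assumptions:

```python
def flag_bottlenecks(measured: dict, spec: dict, threshold: float = 0.8) -> dict:
    """Return resources whose measured performance falls below a fraction
    (default 80%) of the vendor-specified figure, with the achieved ratio."""
    return {
        name: round(measured[name] / spec[name], 2)
        for name in spec
        if measured[name] < threshold * spec[name]
    }

# Hypothetical cluster baseline vs. vendor spec (arbitrary units):
measured = {"gpu_tflops": 280, "network_gbps": 160, "storage_gbs": 18}
spec     = {"gpu_tflops": 312, "network_gbps": 400, "storage_gbs": 20}
print(flag_bottlenecks(measured, spec))  # {'network_gbps': 0.4}
```

Here the GPUs and storage are within tolerance of spec, while the network is delivering only 40% of its rated bandwidth, pointing the team toward the fabric first.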

They should also consider running pilot workloads—for example, processing smaller datasets on distributed GPUs to validate scaling efficiency and test workflow integrations. Doing so enables teams to address practical challenges such as software compatibility, container setup, and job orchestration.
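A pilot run like this can also yield a single scaling-efficiency number: measured multi-GPU throughput divided by ideal linear scaling from a single-GPU baseline. The metric is standard; the figures below are hypothetical:

```python
def scaling_efficiency(single_gpu_tput: float, n_gpus: int, measured_tput: float) -> float:
    """Fraction of ideal linear scaling achieved. 1.0 means perfect scaling;
    values well below ~0.8 often point to interconnect or I/O bottlenecks."""
    return measured_tput / (n_gpus * single_gpu_tput)

# Hypothetical pilot: one GPU processes 1,000 samples/s; 16 GPUs together
# reach 13,600 samples/s, i.e., 85% of ideal linear scaling.
print(scaling_efficiency(1000, 16, 13600))  # 0.85
```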

Together, these steps help to ensure the chosen LLM can meet performance demands before committing to large rollouts.

The bottom line

Fast-changing LLMs bring not only powerful AI benefits but also equally powerful resource demands on enterprise IT infrastructure. IT leaders can prepare their organizations with a multi-level AI readiness assessment. The Penguin Solutions architecture team can assess your IT infrastructure’s AI readiness and get you on the path to success. Learn more here.