Explainer: AI's future is being split between device and cloud

Wed, 10th Jun 2026

AI companies and hardware makers are converging on a more distributed future for artificial intelligence, with more inference moving from remote data centres to local devices, on-premises systems and private cloud services.

The shift was clear at Computex in Taipei. It also appeared in a different form at Apple's WWDC. The message from both events was not that cloud AI is disappearing. It was that more routine, sensitive and time-critical AI work may soon happen closer to the user.

That marks a change from the first phase of generative AI adoption. Most users have so far interacted with large models through remote chatbot services. The model sits in a data centre. The user sends a prompt. The provider pays for the infrastructure and recovers the cost through subscriptions, usage plans or enterprise contracts.

The next phase looks more distributed. PCs, Macs, workstations, NAS devices and private appliances are being positioned as places where AI agents can run continuously. Cloud models will still matter. The question is which tasks should stay local, and which ones need the cloud.

Local shift

At Computex, Intel and Perplexity gave one of the clearest examples of the hybrid inference argument. Their message was that some agentic AI work can run locally, while more demanding tasks can still move to cloud models.

That split matters because agents are not like occasional chatbot queries. They are designed to sit inside workflows. They may search files, summarise documents, schedule tasks, read email or act across software. If every action requires a paid cloud call, costs can rise quickly. If every action sends context to an external service, privacy and compliance concerns also grow.

Perplexity's position reflects a wider industry question. The AI assistant may not remain a standalone app. It may become a layer across a user's machine. In that model, the device needs to understand local context. It also needs to decide when external intelligence is required.

Intel has a clear commercial reason to support this shift. A world of distributed inference gives the PC a stronger role in AI. It also gives CPUs, NPUs and local memory a more visible place in the stack. That does not make the argument wrong. It means the technical claims should be judged alongside the commercial incentives.

Apple turn

Apple's WWDC message changed the framing. Computex vendors largely described local AI through hardware. Apple described it through the operating system.

Apple has built its AI strategy around on-device processing, private cloud capacity and tighter integration into apps. It is not asking users to manage models. It is trying to make AI appear inside ordinary tasks, while the system decides where the processing happens.

That makes Apple important to the hybrid inference debate. It is not only selling more powerful devices. It is making local and private cloud inference part of the platform experience. Developers can build features that use Apple's on-device models. More complex tasks can move through Apple's cloud architecture.

This approach reduces the need for a separate chatbot window in some cases. A user writing, searching, editing or organising information may not think of the interaction as "using AI". It may simply be part of the app or operating system.

That points to something larger than hardware. Hybrid inference is not only a chip story. It is also a user-experience story. The company that controls the interface may control the routing decision.

Computex proof

Computex showed why hardware companies are pressing this argument.

NVIDIA used the event to extend its case for local AI through RTX Spark systems and deskside AI machines such as DGX Spark. These systems are aimed at developers, creators and technical teams that want to run models locally. The case rests on privacy, latency, memory and reduced dependence on cloud tokens.

HP added its own line-up of AI PCs and compact systems tied to local processing and hybrid workflows. Its Computex announcements positioned PCs as machines for local agents, not just endpoints for cloud services.

AMD is also part of the same movement. Its Ryzen AI and Ryzen AI Max platforms are aimed at machines that can handle larger local workloads. The broader message is that AI will not remain confined to remote infrastructure.

Storage and NAS vendors added another layer. Synology used Computex to position its systems around private AI, governance and on-premises collaboration. ChatPlus and related tools fit into a story where enterprise data remains inside a customer-controlled environment. QNAP made a similar case through AI NAS systems designed for local intelligence and edge workloads.

Innodisk, ASUS and GIGABYTE presented variations of the same theme. Some focused on industrial edge AI. Others focused on creator PCs, mini PCs or local development machines. The common claim was that inference will happen in more places than the public cloud.

Cost pressure

The economic case is simple. Cloud inference is useful, but it is not free.

As AI becomes embedded in workflows, usage patterns change. A chatbot that answers a few questions a day creates one cost profile. An agent that monitors tasks, reads documents, runs searches and generates updates throughout the day creates another.

For businesses, this raises hard questions. Who pays for each token? Which tasks justify a frontier model? Which tasks can be handled by a smaller local model? How should sensitive data be filtered before it leaves the device or office?

Local inference does not remove the cost. It shifts the cost into hardware, memory, energy and management. A company may reduce some cloud bills, but it still has to buy and maintain machines. It also needs software that can decide which model should handle each task.

That is why hybrid inference is more plausible than a purely local future. Local models are useful for repeatable, private or low-latency work. Cloud models remain useful for complex reasoning, broader retrieval and tasks that require the strongest available model.
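The trade-off between per-token cloud bills and owned hardware can be put into rough arithmetic. The sketch below is illustrative only: the class names, the price per million tokens, the machine price, its lifetime and its running cost are all hypothetical assumptions, not figures from any vendor. It also assumes the local model can handle the work at all, which is not always true.

```python
from dataclasses import dataclass


@dataclass
class CloudPlan:
    # Hypothetical blended price in dollars per million tokens.
    price_per_million_tokens: float


@dataclass
class LocalMachine:
    purchase_price: float        # hypothetical hardware cost, in dollars
    lifetime_months: int         # assumed useful life, used for amortisation
    monthly_running_cost: float  # assumed energy plus management, in dollars


def monthly_cloud_cost(tokens_per_day: int, plan: CloudPlan, days: int = 30) -> float:
    """Cloud bill for one month at a steady daily token volume."""
    return tokens_per_day * days / 1_000_000 * plan.price_per_million_tokens


def monthly_local_cost(machine: LocalMachine) -> float:
    """Amortised hardware cost plus running cost; independent of token volume."""
    return machine.purchase_price / machine.lifetime_months + machine.monthly_running_cost


def break_even_tokens_per_day(plan: CloudPlan, machine: LocalMachine, days: int = 30) -> float:
    """Daily token volume at which the cloud bill equals the local monthly cost."""
    return monthly_local_cost(machine) * 1_000_000 / (plan.price_per_million_tokens * days)


# Illustrative numbers only.
plan = CloudPlan(price_per_million_tokens=5.0)
machine = LocalMachine(purchase_price=3600.0, lifetime_months=36, monthly_running_cost=20.0)

occasional_chatbot = monthly_cloud_cost(20_000, plan)    # a few questions a day
always_on_agent = monthly_cloud_cost(2_000_000, plan)    # continuous background work
print(occasional_chatbot, always_on_agent, monthly_local_cost(machine))
print(break_even_tokens_per_day(plan, machine))
```

Under these assumed numbers, the occasional chatbot user is far cheaper in the cloud, while the always-on agent costs more than the amortised machine. Real deployments would sit somewhere in between, which is consistent with the hybrid argument.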

Memory limits

The biggest constraint is memory.

Running useful models locally is not only about processor speed. It also depends on how much memory a machine has, and how quickly that memory can be accessed. This explains the renewed attention on unified memory, high-memory desktops, AI workstations and systems that use storage to extend effective memory.
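A back-of-envelope calculation shows why memory size and memory speed both matter. The sketch below rests on two simplifying assumptions that are not from the article: the weights of a dense model take roughly parameter count times bytes per weight, and generating each token reads every weight once, so memory bandwidth sets a rough ceiling on speed. The function names and the 30 per cent headroom reserved for the operating system and working memory are hypothetical.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_memory(
    params_billions: float,
    bits_per_weight: int,
    machine_memory_gb: float,
    headroom_fraction: float = 0.3,  # assumed share kept for OS, cache and activations
) -> bool:
    """True if the weights fit in the memory left after reserving headroom."""
    usable_gb = machine_memory_gb * (1 - headroom_fraction)
    return weight_memory_gb(params_billions, bits_per_weight) <= usable_gb


def rough_max_tokens_per_second(bandwidth_gb_per_s: float, weights_gb: float) -> float:
    """Rough upper bound for single-user generation on a dense model:
    each token requires reading all weights once from memory."""
    return bandwidth_gb_per_s / weights_gb


# An 8-billion-parameter model on a 16 GB machine, at two precisions.
print(weight_memory_gb(8, 16), fits_in_memory(8, 16, 16))
print(weight_memory_gb(8, 4), fits_in_memory(8, 4, 16))
# Speed ceiling for the 4-bit weights on a hypothetical 100 GB/s memory system.
print(rough_max_tokens_per_second(100, weight_memory_gb(8, 4)))
```

The same model that does not fit at 16-bit precision fits comfortably at 4-bit, and faster memory raises the speed ceiling in direct proportion. Real performance will be lower than this bound and depends on the runtime and workload.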

The Mac mini has become part of this conversation because compact, efficient machines can act as always-on local AI hosts. Developers and small teams can use them for local models, automation and private services. Similar logic applies to mini PCs, NAS boxes and deskside AI systems.

But the limits are real. Not every device can run meaningful AI workloads. Older phones, low-memory laptops and entry-level desktops will not handle the same tasks as newer machines with larger memory pools. The local AI future will probably arrive unevenly. Developers, creators, technical teams and higher-end consumers will see it first.

Control layer

The most important part of hybrid inference may not be the chip. It may be the control layer.

A hybrid AI system needs to decide where each request goes. It must identify sensitive data. It must know whether a local model is good enough. It must decide when cloud processing is worth the cost. It must also enforce enterprise rules.
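Those decisions can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not a description of how Intel, Perplexity, Apple or any other vendor implements routing. The names Target, Request, Policy and route are hypothetical, and the keyword and pattern checks stand in for what would in practice be a much more careful sensitivity classifier.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    LOCAL = "local"
    CLOUD = "cloud"
    BLOCKED = "blocked"


@dataclass
class Request:
    text: str
    task: str                    # e.g. "summarise", "schedule", "deep_research"
    estimated_cloud_cost: float  # assumed cost estimate, in dollars


@dataclass
class Policy:
    local_capable_tasks: frozenset  # tasks the local model is judged good enough for
    max_cloud_cost: float           # per-request spending cap, in dollars
    allow_sensitive_to_cloud: bool = False  # enterprise rule


# Deliberately crude, illustrative patterns only.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),       # number shaped like a US SSN
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),    # email address
    re.compile(r"\bconfidential\b", re.IGNORECASE),
]


def is_sensitive(text: str) -> bool:
    return any(pattern.search(text) for pattern in SENSITIVE_PATTERNS)


def route(request: Request, policy: Policy) -> Target:
    # 1. Prefer the local model whenever it is good enough for the task.
    if request.task in policy.local_capable_tasks:
        return Target.LOCAL
    # 2. Enforce the enterprise rule on sensitive data leaving the device.
    if is_sensitive(request.text) and not policy.allow_sensitive_to_cloud:
        return Target.BLOCKED
    # 3. Use the cloud only when the request is within the cost cap.
    if request.estimated_cloud_cost > policy.max_cloud_cost:
        return Target.BLOCKED
    return Target.CLOUD


policy = Policy(local_capable_tasks=frozenset({"summarise", "schedule"}), max_cloud_cost=1.0)
print(route(Request("Summarise the confidential memo", "summarise", 0.02), policy))
print(route(Request("Compare vendors for jane@example.com", "deep_research", 0.50), policy))
print(route(Request("Survey public research on NPUs", "deep_research", 0.50), policy))
```

The ordering of the checks is itself a design choice. Here local capability is tested first, so sensitive text never needs to be inspected for cloud eligibility when the device can do the job.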

That routing layer is where much of the competition will sit. Intel and Perplexity are talking about orchestration. NVIDIA is building software around local agents and policy. Apple is embedding routing into its platform. Synology and QNAP are framing it through data governance and private infrastructure.

The winner may not be the company with the fastest chip alone. It may be the company that makes routing reliable, secure and invisible to the user.

Vendor motives

Hardware companies are not neutral observers.

PC makers need a reason for businesses and consumers to upgrade. Chipmakers need markets beyond training large models in data centres. NAS and storage vendors want to show that their products remain relevant as enterprise data becomes fuel for AI systems.

Hybrid inference helps all of them. It gives the PC a new role. It gives memory and storage a more strategic place. It gives on-premises infrastructure a reason to exist in an era dominated by cloud AI.

Those incentives make some Computex language easy to discount. Many claims remain light on pricing, benchmarks, availability and real-world deployment evidence. Some announcements are closer to roadmaps than products. Some systems will be expensive. Some local models will be good enough for narrow tasks, but not for more demanding work.

The sceptical reading is that hardware vendors are trying to prove they still matter. That reading is partly correct.

The fuller answer is that they are trying to prove relevance because the architecture really is changing. Cloud-only AI has problems that become more obvious as usage rises. It can be costly. It can be slow for some tasks. It can be difficult to govern. It can also be uncomfortable when sensitive business context is involved.

Local-only AI has its own problems. It is constrained by memory, model quality and device capability. It cannot replace frontier models in every task.

Hybrid inference is the compromise.

Next phase

The AI market is moving from a model race to a systems race.

The question is no longer only which company has the strongest model. It is also where that model runs, what data it can see, who controls the routing and how the cost is managed.

Computex showed the hardware industry preparing for that shift. Apple's WWDC showed the platform version of the same idea. Together, they suggest that AI inference will become less centralised over time.

That does not mean every user will run large models at home. It does not mean cloud AI services are under immediate threat. It does mean the cloud will have to share more work with devices, private servers and office infrastructure.

For consumers, the change may be subtle. AI may become less like a separate destination and more like a feature inside the tools they already use.

For businesses, the change could be more direct. The next AI decision may not be which chatbot to buy. It may be how to design an inference architecture that balances cost, privacy, performance and control.

That is why hybrid inference deserves attention beyond the show floor. It is both a marketing theme and a practical response to the limits of cloud-only AI. Hardware companies are trying to prove their relevance. This time, they have a credible reason.
