Supermicro is a long-time partner of Rakuten Cloud, collaborating on complete edge server solutions that integrate Rakuten Cloud-Native Software with Supermicro hardware. With AI emerging as a hot topic in edge deployments, we invited Yaming Wang, Head of 5G and Edge solutions at Supermicro, to our booth at Google Cloud Next ’26 to discuss the trends and use cases he is seeing.
AI has found distinct use cases across a variety of vertical markets, including healthcare, financial services and e-commerce. Edge AI also takes different forms, including generative AI, agentic AI, high-performance computing and computer vision.
In industrial applications like manufacturing plants, physical AI enables online, automated inspections, security checks and other functions. Because these applications run at the edge, they require their own servers, optimized for their deployment environment. Requirements range from compact servers with entry-level GPUs to systems where server-class CPUs alone deliver enough compute performance.
In addition to inference, a key function of industrial AI applications is collecting data for model training. Processing this data requires more compute and storage than an inference-optimized edge AI system provides, so it is handled in a data foundry, which can be built in an enterprise data center or in the cloud. The data foundry must have sufficient compute power for model training, making it a good fit for a Supermicro server with NVIDIA H100 or H200 GPUs as a starting point. At the high end of these applications, the server could require an NVIDIA Blackwell RTX Pro GPU.
And this processing load can grow quickly. “Today, we see a lot of our customers start with H100 and H200 GPUs, then move on to Blackwell,” Wang said.
The data foundry is one of many use cases in which data is shared between the edge and an enterprise AI data center. What these applications have in common is that the enterprise AI data center handles the heavier compute workloads and then sends updated models back to the edge for inference.
This requires higher-density CPUs and GPUs connected by a high-speed network. The network must be engineered for low latency, because if the AI models support real-time applications, the edge server’s response time must be very short.
“[This application] will require a lot of real-time processing, so low latency is important in edge settings,” Wang said.
Whereas edge servers can be standalone, enterprise AI servers are deployed in clusters and support multi-tenant environments. The right networking enables these servers to collaborate on processing the training data.
Wang also presented a real customer use case for edge AI: a digital concierge that provides face-to-face customer service. The concierge is a large screen displaying a virtual avatar that acts as a concierge or front-desk staff member to welcome guests.
The guest begins the interaction by speaking a request for information. The prompt is sent to the system, where a server with an entry-level GPU processes the incoming audio and performs video analysis.
Another server with storage then retrieves the information needed to answer the guest’s request. The system searches a vector database for relevant content, then prompts an LLM to compose a response that incorporates the retrieved information.
The response then returns to the avatar controller and is presented to the guest. Overall, the concierge uses two servers for front-end avatar processing and video compilation and a third server running the back-end retrieval system, all connected via switches. At the front, a fanless appliance supports the camera and the screen.
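The back-end retrieval step described above follows the general retrieval-augmented generation pattern. The sketch below illustrates that pattern only; it is not the deployed system. The article does not name the vector database, embedding model or LLM, so this assumes pre-computed embedding vectors, ranks stored snippets by cosine similarity in plain Python, and builds the prompt that would be handed to an LLM. All names (`retrieve`, `build_prompt`, `EXAMPLE_KB`) and the toy three-dimensional embeddings are hypothetical.

```python
import math

# Hypothetical stand-in for the vector database: (snippet, embedding) pairs.
# Real embeddings would come from an embedding model and have far more dimensions.
EXAMPLE_KB: list[tuple[str, list[float]]] = [
    ("Checkout is at 11 a.m.", [0.9, 0.1, 0.0]),
    ("The pool opens at 7 a.m.", [0.0, 1.0, 0.0]),
    ("Late checkout costs an extra fee.", [0.8, 0.0, 0.2]),
]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float],
             kb: list[tuple[str, list[float]]],
             top_k: int = 2) -> list[str]:
    """Return the top_k snippets most similar to the query embedding."""
    ranked = sorted(kb, key=lambda item: cosine_similarity(query_vec, item[1]),
                    reverse=True)
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question: str, snippets: list[str]) -> str:
    """Assemble an LLM prompt that grounds the answer in retrieved snippets.

    The LLM call itself is out of scope for this sketch.
    """
    context = "\n".join(f"- {s}" for s in snippets)
    return (f"Answer the guest using only this information:\n{context}\n\n"
            f"Guest: {question}\nConcierge:")


if __name__ == "__main__":
    # Pretend the guest's transcribed question embeds to [1.0, 0.0, 0.0].
    hits = retrieve([1.0, 0.0, 0.0], EXAMPLE_KB)
    print(build_prompt("When is checkout?", hits))
```

With the toy vectors above, the two checkout-related snippets rank highest and the pool snippet is dropped. A production vector database would replace the linear scan with an approximate nearest-neighbor index so that retrieval stays fast enough for real-time responses.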
“The benefit of this setup is real-time text generation with low latency, along with a relatively small storage footprint for the rack. It can also be deployed on premises. We have shown this with a partner that already has live deployments in retail stores. Some retail stores deploy a human-sized avatar in front of the store, and once you go in, you can ask the avatar questions, and it responds in real time,” Wang said.
To hear Wang’s full presentation, watch his video here: https://www.youtube.com/watch?v=gvHpnimxgFY