CFOtech Asia - Technology news for CFOs & financial decision-makers

NVIDIA launches new NIM microservices for Asia-Pacific LLMs

Wed, 28th Aug 2024

NVIDIA has launched four new NIM microservices tailored for large language models (LLMs) based on local language data from Japan and Taiwan. The new services are designed to help developers create and deploy efficient generative AI applications that resonate with regional languages and cultures.

The microservices support a variety of popular community models customised to fit regional needs, enhancing user interactions through accurate understanding and better responses reflecting local language and cultural subtleties.

Generative AI software revenue in the Asia-Pacific region is expected to grow significantly, hitting USD $48 billion by 2030, up from USD $5 billion this year, according to ABI Research.

Llama-3-Swallow-70B and Llama-3-Taiwan-70B are regional language models trained on Japanese and Mandarin data respectively. These models provide a deeper understanding of local laws, regulations, and customs.

The RakutenAI 7B family of models, based on Mistral-7B, was trained on English and Japanese datasets and is available as two different NIM microservices for Chat and Instruct. Rakuten's foundation and instruct models have achieved top scores amongst open Japanese large language models, ranking highest on the LM Evaluation Harness benchmark conducted from January to March 2024.

Training LLMs on regional languages enhances output efficacy, ensuring more accurate and nuanced communication by better reflecting cultural and linguistic subtleties. These models deliver leading performance in Japanese and Mandarin language understanding, regional legal tasks, question-answering, language translation, and summarisation compared to base LLMs like Llama 3.

Many nations, including Singapore, the United Arab Emirates, South Korea, Sweden, France, Italy, and India, are investing in sovereign AI infrastructure. The new NIM microservices enable businesses, government agencies, and universities to host native LLMs within their infrastructure, facilitating the development of advanced copilots, chatbots, and AI assistants.

Developers can quickly deploy these sovereign AI models, packaged as NIM microservices, achieving enhanced performance. The microservices, accessible via NVIDIA AI Enterprise, are optimised for inference using the NVIDIA TensorRT-LLM open-source library, offering up to five times higher throughput. This reduces operational costs and enhances user experiences by mitigating latency issues. The NIM microservices are now available as hosted APIs.
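As a minimal sketch of what calling one of these hosted NIM APIs looks like, the snippet below assembles an OpenAI-compatible chat-completion request. The base URL follows NVIDIA's hosted-API convention; the exact model identifier and the API key are placeholders and would need to be taken from NVIDIA's catalogue for a real deployment.

```python
# Hedged sketch: querying a hosted NIM microservice through its
# OpenAI-compatible chat endpoint. Model id and API key are placeholders.
import json
import urllib.request

NIM_BASE_URL = "https://integrate.api.nvidia.com/v1"  # hosted NIM API endpoint

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Assemble an OpenAI-style chat-completion request for a NIM model."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }
    return urllib.request.Request(
        f"{NIM_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Example (sending it requires a valid NVIDIA API key):
req = build_chat_request(
    "tokyotech-llm/llama-3-swallow-70b-instruct",  # assumed model id
    "Introduce yourself in Japanese.",
    "NVAPI-KEY-PLACEHOLDER",
)
```

Because the hosted APIs follow the OpenAI request schema, existing client code can typically be pointed at a regional model by changing only the base URL and model name.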

Professor Rio Yokota from the Global Scientific Information and Computing Center at the Tokyo Institute of Technology commented on the importance of developing sovereign AI models that adhere to cultural norms. "LLMs are not mechanical tools that provide the same benefit for everyone. They are intellectual tools that interact with human culture and creativity. The influence is mutual where the models are affected by the data we train on, and our culture and data will be influenced by LLMs." The availability of Llama-3-Swallow as an NVIDIA NIM microservice allows developers to deploy the model for a variety of Japanese applications.

Preferred Networks, a Japanese AI company, uses the Llama-3-Swallow-70B model to develop a healthcare-specific model trained on a unique corpus of Japanese medical data. Chang Gung Memorial Hospital (CGMH) in Taiwan is applying the Llama-3-Taiwan-70B model to improve the efficiency of its medical staff by delivering more nuanced medical language. Dr. Changfu Kuo, Director of the Center for Artificial Intelligence in Medicine at CGMH, remarked, "By providing instant, context-appropriate guidance, AI applications built with local-language LLMs streamline workflows and serve as a continuous learning tool to support staff development and improve the quality of patient care."

Companies like Pegatron, a Taiwanese electronics manufacturer, are adopting the Llama-3-Taiwan-70B NIM microservice for various applications. This microservice is also utilised by Chang Chun Group, Unimicron, TechOrange, LegalSign.ai, and APMIC.

Enterprises looking to fine-tune regional AI models for their specific business needs can use NVIDIA AI Foundry. This platform offers popular foundation models and tools for fine-tuning, including NVIDIA NeMo and NVIDIA DGX Cloud. Developers using NVIDIA AI Foundry also have access to the NVIDIA AI Enterprise software platform for secure, stable, and supported production deployments.
