Nvidia updates ChatRTX with voice and video support; adds Gemma, ChatGLM3, CLIP models | Technical news

Nvidia has updated its ChatRTX chatbot with a slew of AI models and new capabilities for working with speech and images. The chatbot runs large language models locally on the device, letting users query their personal documents and images.

ChatRTX works locally, on the device, using your files. (Image credit: Bing Image Creator).

Essentials

  • Users need an Nvidia RTX graphics card to run the application.
  • At least 100 GB of free disk space is required to use the app.
  • ChatRTX runs locally and is served through a browser window.

New Delhi: In February 2024, Nvidia introduced its ChatRTX chatbot, which allows users with Nvidia RTX graphics cards to run large language models locally. Essentially, the app provides a ChatGPT-like experience on the user's own machine. ChatRTX is primarily a chatbot app for interacting via text, but it lets users include their own files in its searches.

ChatRTX uses retrieval-augmented generation (RAG), Nvidia's TensorRT-LLM software, and Nvidia RTX graphics hardware to bring chatbot capabilities locally to Windows PCs and workstations. By default, ChatRTX uses Mistral, but the latest version adds support for Google's Gemma, which was developed from the same research and technology used to build the company's Gemini models.
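The idea behind retrieval-augmented generation is simple: before the model answers, the most relevant local document is retrieved and prepended to the prompt. The sketch below illustrates the principle with plain word-overlap scoring; ChatRTX's actual pipeline uses learned embeddings and TensorRT-LLM, which this toy example does not reproduce.

```python
# Minimal RAG sketch: score local documents against the question by word
# overlap, then build an augmented prompt from the best match. A real system
# would use vector embeddings instead of word overlap.

def score(question: str, document: str) -> int:
    """Count how many words from the question also appear in the document."""
    q_words = set(question.lower().split())
    d_words = set(document.lower().split())
    return len(q_words & d_words)

def build_prompt(question: str, documents: list[str]) -> str:
    """Retrieve the highest-scoring document and prepend it as context."""
    best = max(documents, key=lambda d: score(question, d))
    return f"Context: {best}\n\nQuestion: {question}"

docs = [
    "The quarterly report shows revenue grew 12 percent.",
    "Meeting notes: the launch is scheduled for March.",
]
print(build_prompt("When is the launch scheduled?", docs))
```

The augmented prompt grounds the model's answer in the user's own files, which is what lets ChatRTX respond about local data the base model never saw.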

ChatRTX now also supports ChatGLM3, an open, bilingual model based on the General Language Model (GLM) framework that can converse in Chinese and English. There is also support for Meta's Llama 2. Users can select the AI model they want to interact with from a drop-down menu and then provide a path to local files for reference. The supported document file types are .doc, .pdf and .txt. Users can then query their local files and data through the chatbot.

Support for images and voice conversations

Users can also interact with images thanks to support for OpenAI's Contrastive Language-Image Pre-training, or CLIP. CLIP is a neural network that learns visual concepts from natural language supervision. In simple terms, CLIP can identify the subjects of images. Users can add a folder of images to ChatRTX and then use CLIP to find specific images.
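CLIP-style search works by mapping both images and text queries into a shared embedding space and ranking images by similarity to the query. The sketch below shows that ranking step with made-up vectors; real CLIP embeddings come from its trained image and text encoders, which are not reproduced here.

```python
# Toy illustration of CLIP-style image retrieval: rank images by cosine
# similarity between their embeddings and the text query's embedding.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings for illustration; in practice these would be
# produced by CLIP's image and text encoders.
image_embeddings = {
    "beach.jpg":    [0.9, 0.1, 0.0],
    "mountain.jpg": [0.1, 0.9, 0.1],
}
query_embedding = [0.8, 0.2, 0.1]   # stand-in for the text "sunny beach"

ranked = sorted(image_embeddings,
                key=lambda name: cosine(query_embedding, image_embeddings[name]),
                reverse=True)
print(ranked[0])  # the image most similar to the query
```

Because text and images live in the same space, no metadata tags are needed: the query phrase itself is enough to find matching pictures.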

Users can search the photos and images on their local devices using words, terms or phrases, without the need for complex metadata tags. The new update also lets users talk to the bot and query their local data using voice commands, thanks to Whisper, an automatic speech recognition system that uses AI to process spoken questions.
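The voice feature is a two-stage pipeline: speech is first transcribed to text by an ASR system (Whisper, in ChatRTX's case), and the resulting text is then handled like any typed question. The sketch below shows that flow; the `fake_transcribe` and `fake_answer` stubs are placeholders standing in for a real Whisper model and the chatbot backend.

```python
# Sketch of a voice-query pipeline: transcribe audio to text, then route the
# text through the same question-answering path a typed query would use.
from typing import Callable

def voice_query(audio: bytes,
                transcribe: Callable[[bytes], str],
                answer: Callable[[str], str]) -> str:
    """Run ASR on the audio, then pass the transcript to the chatbot."""
    question = transcribe(audio)
    return answer(question)

# Stubs for illustration only; a real system wraps an ASR model and an LLM.
fake_transcribe = lambda audio: "what is in my vacation folder"
fake_answer = lambda q: f"Answer to: {q}"

print(voice_query(b"\x00\x01", fake_transcribe, fake_answer))
```

Keeping transcription separate from answering means any text-capable model in the drop-down menu can serve voice queries unchanged.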