OpenAI unveiled new developer tools at its DevDay event held yesterday in San Francisco. The company announced updates to GPT-4o and the Chat Completions API, introducing a Realtime API, image fine-tuning, model distillation, and prompt caching.

Innovations in GPT-4o

Among the newly introduced features, users of GPT-4o who build website layouts will be able to supply the model with a collection of example designs. Similarly, those using GPT-4o to extract data from scanned documents will be able to train the model on previously processed files, reducing accuracy issues. OpenAI noted that a fine-tuning dataset of about 100 images is enough to improve GPT-4o's performance.
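To make the document-extraction use case concrete, here is a minimal sketch of how one training example might be assembled for a chat-style fine-tuning file. The JSONL shape follows OpenAI's chat fine-tuning format as publicly documented; the helper function, prompt, URL, and label are all hypothetical illustrations, not part of any SDK.

```python
import json

def make_vision_example(prompt: str, image_url: str, label: str) -> str:
    """Serialize one (image, prompt, expected answer) triple as a JSONL
    line in the chat-style fine-tuning format (schema assumed here)."""
    example = {
        "messages": [
            {"role": "user", "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ]},
            {"role": "assistant", "content": label},
        ]
    }
    return json.dumps(example)

# Hypothetical example: one scanned invoice with its known correct answer.
line = make_vision_example(
    "Extract the invoice total from this scan.",
    "https://example.com/scans/invoice-001.png",  # placeholder URL
    "Total: $1,240.00",
)
```

Repeating this for roughly 100 previously processed documents would yield a dataset of the size OpenAI suggests is sufficient.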

Details of the Realtime API

During the event, the company introduced a cloud service called the Realtime API. With it, software teams can add multimodal processing capabilities to their applications, including AI applications that understand voice commands and read responses aloud. Until now, sending a voice command to an OpenAI model required multiple steps: transcribing the audio, passing the transcript to the model, and then converting the model’s text output into synthetic speech. The Realtime API streams audio directly to GPT-4o, removing these intermediate steps.
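The direct-streaming design can be sketched in terms of the JSON events a client sends over the Realtime API's WebSocket connection: audio chunks are base64-encoded and appended to an input buffer, then a response is requested. The event names below match those documented at launch, but treat the details as illustrative, not authoritative.

```python
import base64
import json

def audio_append_event(pcm_bytes: bytes) -> str:
    # Raw audio is base64-encoded and sent as an
    # "input_audio_buffer.append" event over the WebSocket.
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    })

# After streaming the audio chunks, a "response.create" event asks
# the model to produce a (speech and/or text) response directly,
# with no separate transcription or text-to-speech stage.
response_request = json.dumps({"type": "response.create"})
```

In a real client these strings would be sent over a WebSocket to the Realtime API endpoint; here they only show the shape of the protocol.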

According to the company, this service not only simplifies development but can also reduce model latency, meaning AI applications built on the Realtime API can respond to user instructions more quickly. The Realtime API also lets the applications it powers automatically perform tasks in external systems.

In the future, the Realtime API will also support image and video processing. OpenAI plans to ease integration with workloads built with Python and Node.js by updating its software development kits.

New Features Added to the Chat Completions API

At the event, the existing Chat Completions API also gained a multimodal capability for processing voice input, similar to the Realtime API. According to OpenAI, this feature targets voice use cases that do not require the low latency the Realtime API offers.

Image Fine-Tuning

Additionally, OpenAI announced the launch of a feature called image fine-tuning, which involves providing additional training data to a neural network to improve the quality of its output. Developers will be able to supply specific image datasets to GPT-4o through this feature, enabling the model to perform better on computer vision tasks. This can be especially beneficial for developers building applications that process images.
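Before uploading such a dataset, it is worth sanity-checking the JSONL file locally. The helper below is hypothetical (not part of the OpenAI SDK); it only verifies that each line parses and carries a `messages` list, which is the common shape of chat-style fine-tuning data.

```python
import json

def validate_dataset(path: str, min_examples: int = 10) -> int:
    """Lightweight pre-upload check for a JSONL fine-tuning file:
    every line must parse as JSON and contain a 'messages' list.
    Returns the number of examples found. (Hypothetical helper.)"""
    count = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            assert isinstance(record.get("messages"), list), "missing 'messages'"
            count += 1
    assert count >= min_examples, f"need at least {min_examples} examples"
    return count
```

A check like this catches malformed lines cheaply, before a fine-tuning job is created and billed.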

Model Distillation and Prompt Caching

OpenAI also introduced features designed to reduce inference costs, including Model Distillation and Prompt Caching.


Let’s start with model distillation. It uses an AI technique known as knowledge distillation to help developers save resources: a large model is replaced with a smaller one that needs less hardware.
Given the same prompt, a larger neural network will typically produce a better response than a smaller one. Through knowledge distillation, developers can capture the larger model’s higher-quality responses and transfer that behavior to the smaller model. The model distillation feature is accessible via an application programming interface: developers sending prompts to one of the company’s leading models can turn the responses into an AI training dataset, which can then be used to improve the quality of a smaller neural network.
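The dataset-building step described above can be sketched in a few lines: each (prompt, large-model response) pair becomes one supervised training example for the smaller model. This is a generic illustration of the technique, not OpenAI's internal pipeline; the example pairs are invented.

```python
import json

def to_training_example(prompt: str, teacher_response: str) -> str:
    # One (prompt, large-model response) pair becomes one supervised
    # chat-style training example for fine-tuning the smaller model.
    return json.dumps({
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": teacher_response},
        ]
    })

# Illustrative pairs; in practice these would be real responses
# collected from the larger model.
dataset = [
    to_training_example(p, r)
    for p, r in [
        ("Summarize this ticket: printer offline again.",
         "The user's printer has repeatedly gone offline."),
        ("Classify sentiment: 'great service!'", "positive"),
    ]
]
```

Fine-tuning the smaller model on such a dataset is what lets it approximate the larger model's response quality at lower inference cost.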


Prompt Caching enables the company’s models to reuse user inputs in certain situations, so the models do not have to repeat computations they have already completed. With these updates, OpenAI expects inference costs to drop by up to 50%, and the company says users will also see improved response times. Prompt caching is available in the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini.
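The idea behind prompt caching can be illustrated with a toy model: work done on a long, repeated prompt prefix is stored and reused, so only the new suffix needs fresh computation. This is a simplified stand-in for the server-side behavior, not the actual implementation.

```python
import hashlib

class PrefixCache:
    """Toy illustration of prompt caching: results of processing a
    repeated prompt prefix are stored and reused on later requests."""

    def __init__(self):
        self._store = {}
        self.hits = 0  # counts how often prefix work was reused

    def process(self, prefix: str, suffix: str) -> str:
        key = hashlib.sha256(prefix.encode()).hexdigest()
        if key in self._store:
            self.hits += 1                       # cached prefix reused
            prefix_state = self._store[key]
        else:
            # Stand-in for the expensive work of processing the prefix.
            prefix_state = f"encoded({len(prefix)} chars)"
            self._store[key] = prefix_state
        return f"{prefix_state} + answer({suffix})"

cache = PrefixCache()
shared_prefix = "You are a helpful assistant. " * 50
cache.process(shared_prefix, "Question A")  # prefix processed from scratch
cache.process(shared_prefix, "Question B")  # prefix work reused
```

The second call reuses the stored prefix state, which is the mechanism behind both the cost reduction and the faster responses the article describes.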
