Intento

Blog/GenAI

ChatGPT for Translation: Surpassing GPT-3

Konstantin Savenkov

CEO and Сo-founder of Intento

We recently examined GPT-3’s (davinci-003 model) translation abilities and found that traditional MT models still hold their place in AI. However, with the commercial release of ChatGPT (gpt-3.5-turbo) via API, now at only 10% of GPT-3’s cost, it’s time to reevaluate our conclusions.

What’s new?

People often mix up GPT-3 (commercial, API-accessible) with pre-API ChatGPT (free, no data confidentiality), similar to the confusion between Google Translate API and the free web-based Google Web Translator in 2016. However, GPT-3 and ChatGPT use distinct models.

The top GPT-3 model, davinci-003, launched in November 2022, while the ChatGPT model, gpt-3.5-turbo, debuted on March 1st, 2023.

Limited information exists on their differences, but gpt-3.5-turbo is about 10 times less expensive than GPT-3 (davinci-003) and more lenient with prompt engineering.

With the launch of commercial, API-enabled ChatGPT, significant improvements in data protection and retention have emerged, which are essential for the enterprise solutions we develop at Intento:

Data submitted through the API is no longer used for service improvements (including model training) unless the organization opts in

Implementing a default 30-day data retention policy for API users, with options for stricter retention depending on user needs.

Removing our pre-launch review (unlocked by improving our automated monitoring)

Improving developer documentation

Simplifying our Terms of Service and Usage Policies, including terms around data ownership: users own the input and output of the models.

General comparisons between the two models are inconclusive, as each has its strengths and weaknesses. However, gpt-3.5-turbo reportedly excels in zero-shot training (as a stock model), which is precisely what we need for basic Machine Translation!

Wait, 10 times cheaper?

Indeed, ChatGPT (gpt-3.5-turbo) costs $0.002 per 1K tokens, while GPT-3 is priced at $0.02 per 1K tokens.

But what about the cost of a million characters? Let’s see how GPT-4 answers that (oops, a spoiler!).

Keep in mind, you pay for both the prompt and output, and prompts add characters to the input text. A realistic cost is roughly $1.5 per million characters, making it 13 times cheaper than standard Google Translate.

Can we expect high-quality translations at this price? Let’s find out!

Evaluation approach

We used the same evaluation methodology as in our original GPT-3 Translation Capabilities post.

There are various ChatGPT API prompt engineering approaches, such as semi-structured prompts.

We assessed multiple approaches and prompts for in-domain translation. While we didn’t find significant score differences among prompts, we did see varying numbers of risky and outlier translations. Thus, prompt engineering is crucial.

General domain translation

In the General domain, ChatGPT reached the top tier for English to Spanish, matching leaders like Google, Amazon, DeepL, Yandex, and Microsoft. This is a significant achievement, as GPT-3 only ranked in the second tier.

COMET scores for ChatGPT English to Spanish translation, general domain

 

For English to German, DeepL remains unmatched, but ChatGPT has advanced to the second tier, alongside the best alternative MT engines.

COMET scores for ChatGPT English to German translation, general domain

In-domain translation

In Legal and Healthcare, English to German, ChatGPT’s results have improved significantly but still trail top-ranking engines. This suggests that MT-specific model providers focus on maintaining more balanced and representative training data across various domains and content types.

COMET scores for ChatGPT English to German translation, healthcare domain

 

COMET scores for ChatGPT English to German translation, legal domain

Business-critical errors and weak spots

Overall, ChatGPT’s output shares similar issues with other (good) MT engines. However, we notice that the number of business-critical translation errors varies significantly depending on the prompt (see below).

English to Spanish, general domain

ChatGPT weak spots for English to Spanish translation (General domain)

 

Common issues in the General domain:

English to German, general domain

ChatGPT weak spots for English to German translation (General domain)

 

Common issues in the General domain:

English to German, Healthcare

ChatGPT weak spots for English to German translation (Healthcare domain)

 

Common issues in the Healthcare domain:

English to German, Legal

ChatGPT weak spots for English to German translation (Legal domain)

 

Common issues in the Legal domain:

So, how to use it for translation?

Like any translation technology on the Intento platform, ChatGPT translation will be accessible through all Intento connectors, including 15 TMS systems and various Customer Service platforms like Salesforce, ServiceNow, and Zendesk.

Our ChatGPT connector is currently undergoing thorough quality assurance. To stay informed, register on the Intento website and subscribe to our newsletter.

Conclusions

The technological progress in just under 3 months is astounding. ChatGPT performs on par with the best stock MT engines, at less than 10% of their cost.

COMET performance of GPT-3 and ChatGPT translation compared to commercial MT engines.

 

Additionally, it offers morphological glossary application and few-shot learning capabilities, thanks to the versatile interface of this general model.

We’ll delve into the customizability of GPT-powered translation later, along with an evaluation of GPT-4, which launched today, March 14th.

Stay tuned for our upcoming State of the Machine Translation 2023 report later this year, where we’ll evaluate all GPT-based translation options. It’s going to be exciting!

Read more

SHARE THIS ARTICLE
Continue reading the article after registration
Already a member? Sign In

We know how to make your business multilingual and productive. Let's talk.