Localization and language quality

People have been complaining recently about the decline of language quality (actually, they’ve been complaining for decades – or make that centuries!). I have to admit that I sympathize: I’m from a generation that was taught to value good writing, and I still react with horror when I see obvious errors, like using ‘it’s’ instead of ‘its’, or confusing ‘their’, ‘there’ and ‘they’re’. (I’m even more horrified when I make mistakes myself, which happens more than I like to admit.)

But for my son’s generation? Not so much. Grammar, spelling and punctuation aren’t that important to them; what matters is whether the other person understands them, and vice versa. My son is already 25 (wow, time flies!), so there’s another generation coming up behind him that’s even less concerned about ‘good’ writing; in fact, this new generation is so accustomed to seeing bad writing that for the most part they don’t even realize there are errors. This makes for a vicious circle: people grow up surrounded by bad writing, so they in turn write badly, which in turn exacerbates the problem. I’ve heard this referred to as the ‘crapification of language’.

Why is this happening?

Ease of publishing: in the old days, the cost of publishing content – typesetting it, grinding up trees and making paper, printing the content onto the paper, binding it, shipping it to a store and selling it – was immense. For this reason most published content was thoroughly edited and proofread, as there was no second chance. So if you read printed content like books, magazines and newspapers, you were generally exposed to correct grammar, spelling and punctuation. Since most of what people read was correctly written (even if not always well-written), people who read a lot generally learned to write well. But now anyone can create and publish content, with no editing or proofreading. The result is just what you’d expect.

Informal communications: email, texting, Twitter – they all favor speed, and when people are in a hurry, quality usually suffers.

Machine-generated content: this includes content that’s created by computers – for example, machine-generated support content created by piecing together user traffic about problems – as well as machine-translated (MT) content. Machine-generated content, and especially MT content, is, as we localization people know, often of very poor quality.

What does this mean for Localization?

Being in the localization business myself, I want to look at what this means for localization. In some ways this ‘crapification’ works against us: garbage in, garbage out, after all, and if the source content is badly written then it’s harder for the translators to do a good job, be they humans or machines. But at the same time, this can work for us – especially when it comes to Machine Translation, where there are a couple of things that are making even raw MT more acceptable:

MT engine improvements: MT quality has steadily improved over the past 50 years (yes, it’s been around at least that long!). Major improvements, like statistical MT and now neural MT, seem to occur every 10 years or so. Perfect human-quality MT is still ‘only 5 years out’ and will undoubtedly continue to be so for a long time, but quality is steadily improving.

User expectations: The good news for MT is that, due to the crapification of language, the expectations bar has been coming down, and people are much more willing to accept raw MT, warts and all. Despite the quality problems, more and more people are using web-based MT services like Google Translate, Bing Translator, etc., to read and write content in other languages. As with the texting mentioned above, they’re more concerned with content than with form: they’re OK with errors as long as they can understand the content or at least get the gist of it. This seems to be true even for countries that have traditionally had a high bar for language quality, like Japan and France. As shown in the chart below, we’ve already passed the point at which raw MT is acceptable for some types of content. (Note that this chart is purely illustrative and is not based on hard data.)

Of course, the bar remains high for things like legal documents, marketing content and your own personal homepage, but it’s getting lower for many other types of content, especially support content (which many companies have been MTing for years), as well as blogs and other informal content. In fact, the graph could be redrawn something like this:

Is there any hope for language quality?

As the quality of machine-generated and machine-translated content improves and as editing and proofing tools become better and more ubiquitous, the quality of all content will improve, until we approach the days of professionally edited and proofread books and magazines. As bad writing disappears and people grow accustomed to seeing well-written content, I think even unedited human language quality will start to curve back up again. (I’ve tried to capture this in the graphs above.)

So yes, I believe the crapification of language will slow and eventually reverse itself (hmm, unpleasant plumbing image there)! This doesn’t mean languages won’t continue to evolve, fortunately. That’s one of the things that make them so fascinating – and so challenging to translate.

