Find a Language Quickly from Text

You can find a video of this lesson on iOS Development Tips Weekly from LinkedIn Learning. 

There’s a lot you can do with language easily in iOS and watchOS. The Apple ecosystem has a Natural Language Processing (NLP) system built in, and it’s easy to use. Let’s look at language recognition.
Download the starter file. It’s a playground with a dictionary of sample sentences in a few languages, which I translated with Google Translate. I’m trusting Google Translate here. At least one of the non-Latin-script samples got garbled in copying, so I deleted it. I can’t vouch for the others, but they will give us what we need.
The class we’ll use is NSLinguisticTagger. It has a class method to find the dominant language in text. Add this to your playground:

var language = NSLinguisticTagger.dominantLanguage(for: str[lang]!)

Run and you get en for English, the standard code for the language.
You can try a few more languages too, even in other scripts. Try Spanish (es) or Hindi (hi). Chinese (zh) is interesting: there are a few different scripts and dialects, so the result includes a qualifier telling you more about the language.
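If you want to try several strings at once, here is a minimal sketch. The describeLanguage helper and the sample sentences are my own illustration, not part of the starter file. On newer SDKs the compiler may flag NSLinguisticTagger as deprecated in favor of the Natural Language framework; the calls here follow the class this lesson uses.

```swift
import Foundation

// Hypothetical helper: returns the detected code plus a readable English name.
func describeLanguage(of text: String) -> String {
    // dominantLanguage(for:) returns an optional, so fall back to "und" (undetermined).
    let code = NSLinguisticTagger.dominantLanguage(for: text) ?? "und"
    // Ask an English locale for a display name; fall back to the raw code.
    let name = Locale(identifier: "en").localizedString(forIdentifier: code) ?? code
    return "\(code) (\(name))"
}

// Sample sentences are assumptions for illustration only.
let samples = [
    "Where is the nearest pizza restaurant?",
    "¿Dónde está la pizzería más cercana?",
    "最近的披萨餐厅在哪里?"
]

for text in samples {
    print(describeLanguage(of: text), "-", text)
}
```

Short strings give the recognizer less to work with, so a full sentence or more is a safer input than a single word.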
That’s fun, and a great way to start automatic localization. But there’s more you can do with tagging. I’ll give you one more example: lexical tagging. This finds parts of speech, as well as paragraph and sentence structure, for you. You’ll need an instance of the linguistic tagger to do this:

let tagger = NSLinguisticTagger(tagSchemes:[.nameTypeOrLexicalClass], options: 0)

There are several tag schemes you can use. Two common ones are name type and lexical class. There’s also a scheme that combines the two, which is the one I’m using here.
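To see how the three schemes differ, you could run the same sentence through each one. This is only a sketch: tagPairs is a hypothetical helper I made up for the comparison, and I am not promising which tag any particular word gets.

```swift
import Foundation

// Hypothetical helper: collects (word, tag) pairs for a single scheme.
func tagPairs(in text: String, scheme: NSLinguisticTagScheme) -> [(word: String, tag: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [scheme], options: 0)
    tagger.string = text
    // NSRange lengths are counted in UTF-16 code units.
    let range = NSRange(location: 0, length: text.utf16.count)
    var pairs: [(word: String, tag: String)] = []
    tagger.enumerateTags(in: range, unit: .word, scheme: scheme,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, tokenRange, _ in
        let word = (text as NSString).substring(with: tokenRange)
        // A token may come back without a tag, so record a placeholder.
        pairs.append((word: word, tag: tag?.rawValue ?? "none"))
    }
    return pairs
}

let sentence = "Steve loves pizza Margherita."
let schemes: [NSLinguisticTagScheme] = [.nameType, .lexicalClass, .nameTypeOrLexicalClass]
for scheme in schemes {
    print(scheme.rawValue, tagPairs(in: sentence, scheme: scheme))
}
```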

You assign your string to the tagger:

tagger.string = str[lang]!

The tagger has a method, enumerateTags, which finds all the tags within a range in the string. To check the entire string, I’ll need a value for the full range. NSRange counts UTF-16 code units, which is why the length comes from utf16.count.

let fullRange = NSRange(location: 0, length: (str[lang]?.utf16.count)!)
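Here is a small sketch of why the length uses utf16.count rather than count. The sample string is my own, chosen because the emoji takes two UTF-16 code units.

```swift
import Foundation

let text = "Pizza 🍕"   // hypothetical sample with a character outside the Basic Multilingual Plane

// Character count and UTF-16 length differ once emoji are involved.
print(text.count)        // 7
print(text.utf16.count)  // 8

// Two ways to build a range covering the whole string in UTF-16 units.
let manualRange = NSRange(location: 0, length: text.utf16.count)
let convertedRange = NSRange(text.startIndex..<text.endIndex, in: text)
print(manualRange == convertedRange)  // true

// Going back from NSRange to a Swift range avoids the NSString cast.
if let swiftRange = Range(manualRange, in: text) {
    print(text[swiftRange])  // Pizza 🍕
}
```

Using count here would cut the range one unit short and the tagger would miss the end of the string.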

I’ll use the enumerateTags method, which has several parameters. The first is the range, for which I’ll use fullRange.

tagger.enumerateTags(in: fullRange,

Next is the unit to break the text into: the full document, paragraphs, sentences, or words. I’ll use words.

unit: .word,

Next is the scheme, which must be one of the schemes the tagger was created with.

scheme: .nameTypeOrLexicalClass,

You can set options, and I’ll remove whitespace and punctuation from my scan.

options: [.omitPunctuation,.omitWhitespace])

This method works a lot like a for loop, looping through all the units and running a closure for each one. I’ll set up the closure with the tag, the range of the word in the string, and a pointer to a Boolean value that lets you stop the enumeration early.

{ (tag, range, stop) in

The tag for my setup will check parts of speech. I’ll check nouns.

    if tag == .noun {
        let word = (tagger.string! as NSString).substring(with: range)
        print(word)
    }
}
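The pieces above fit together into one call. As a sketch, here they are wrapped in a function that also uses the stop pointer to quit early. The firstWords name and its limit parameter are hypothetical, added only to show how stop works.

```swift
import Foundation

// Hypothetical helper: returns at most `limit` words carrying the wanted tag.
func firstWords(taggedAs wanted: NSLinguisticTag, in text: String, limit: Int) -> [String] {
    var words: [String] = []
    guard limit > 0 else { return words }

    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: 0)
    tagger.string = text
    let fullRange = NSRange(location: 0, length: text.utf16.count)

    tagger.enumerateTags(in: fullRange, unit: .word, scheme: .nameTypeOrLexicalClass,
                         options: [.omitPunctuation, .omitWhitespace]) { tag, range, stop in
        // Skip every word whose tag is not the one requested.
        guard tag == wanted else { return }
        words.append((text as NSString).substring(with: range))
        // Setting the pointee to true ends the enumeration after this word.
        if words.count >= limit {
            stop.pointee = true
        }
    }
    return words
}

let sample = "Where is the nearest Pizza Restaurant? Can I get a Pizza Margherita there?"
print(firstWords(taggedAs: .noun, in: sample, limit: 2))
```

Stopping early is mostly useful on long text, where you only need the first few matches.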

Run this with English as the language, and you’ll get a list of nouns. Change the tag to .verb, and you’ll get verbs. Change the language to French, and you’ll get French verbs. Try Italian. Change the tag to .personalName. Change to Hindi, and you’ll see nothing but the language code. Not all languages are supported, and I’ve not seen a list of which languages work with tags and which don’t. For the languages that do work, there’s a lot more you can tag and learn about a sentence. Take a look at the documentation and the WWDC 2017 video for more.
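One way to probe language support yourself is the availableTagSchemes(for:language:) class method on NSLinguisticTagger. This is a sketch only: the list of codes is my own pick, and results may vary by OS version.

```swift
import Foundation

// Language codes to probe; this list is an assumption, extend it as needed.
let codes = ["en", "fr", "it", "es", "hi", "ja"]

for code in codes {
    // Ask which schemes the tagger offers for word-level tagging in this language.
    let schemes = NSLinguisticTagger.availableTagSchemes(for: .word, language: code)
    let hasLexicalClass = schemes.contains(.lexicalClass)
    let hasNameType = schemes.contains(.nameType)
    print(code, "lexicalClass:", hasLexicalClass, "nameType:", hasNameType)
}
```

If lexicalClass is missing for a language, that would explain an enumeration that prints no nouns or verbs.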
<h1>The Whole Code</h1>
Here's the completed playground code for this lesson. You can <a href="http://bit.ly/NLPTaggingEnd">download it from GitHub</a>.

//: Playground - noun: a place where people can play
import UIKit

let str:[String:String] = [
    "English":"Where is the nearest Pizza Restaurant? Can I get a Pizza Margherita there? Steve loves pizza Margherita.",
    "Chinese":"最近的披萨餐厅在哪里?我可以在那里得到一份玛格丽塔披萨吗?史蒂夫喜欢披萨玛格丽塔。",
    "Spanish":"¿Dónde está el Pizza Restaurant más cercano? ¿Puedo conseguir una Pizza Margherita allí? Steve adora la pizza Margherita.",
    "French":"Où est le restaurant Pizza le plus proche? Puis-je avoir une Pizza Margherita là-bas? Steve adore la pizza Margherita.",
    "Italian":"Dov'è il ristorante pizzeria più vicino? Posso avere una pizza Margherita lì? Steve ama la pizza Margherita.",
    "Hawaiian":"ʻAuhea kahi Pizza Mea kokoke loa? Hiki iaʻu ke loaʻa i kahi Pizza Margherita ma laila? Paiʻo Steve i ka pizzaʻo Margherita.",
    "Hindi":"निकटतम पिज्जा रेस्तरां कहां है? क्या मुझे पिज्जा मार्गरिता मिल सकती है? स्टीव पिज्जा Margherita प्यार करता है।",
    "Japanese":"一番近いピザレストランはどこですか?そこにピザマルゲリータを手に入れることはできますか?スティーブはマルゲリータのピザが好きです。"
]
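// Pick which sample sentence to analyze
let lang = "Chinese"
// Detect the dominant language code of the sample
var language = NSLinguisticTagger.dominantLanguage(for: str[lang]!)
// Create a tagger that reports names and parts of speech
let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass], options: 0)
tagger.string = str[lang]!

// NSRange is measured in UTF-16 code units
let fullRange = NSRange(location: 0, length: (str[lang]?.utf16.count)!)
// Visit every word, skipping punctuation and whitespace, and print the nouns
tagger.enumerateTags(in: fullRange, unit: .word, scheme: .nameTypeOrLexicalClass, options: [.omitPunctuation, .omitWhitespace]) { (tag, range, stop) in
    if tag == .noun {
        let word = (tagger.string! as NSString).substring(with: range)
        print(word)
    }
}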
