The 'Real' jQuery Autocomplete for Korean Language.

By Artem Dotsenko
August 26, 2015

Why 'real', and what’s wrong with jQuery autocomplete for the Korean language? The problem is that jQuery autocomplete handles Korean in the same way as any other language, but Korean has its own specifics when it is typed into an input field. Consider how jQuery autocomplete normally works: you type a character, and autocomplete finds the items that start with that character, so a single key press is enough to get results (Fig.1). The approach described here is demonstrated using Drupal CMS.

Why isn’t this a good approach for the Korean language? To input one Korean syllable block (hereafter hangul), you need to press two or three keys, and only after that will autocomplete show you results. Moreover, as soon as you start typing the next hangul, the previous results disappear until you complete the set of letters (hereafter jamo) that compose it.
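To make the problem concrete, here is a minimal sketch of plain prefix matching against the intermediate strings that appear in the input field while 건반 is being typed. The sequence of intermediate states is an assumption based on a typical two-set Korean keyboard layout, and naivePrefixMatch is a hypothetical helper, not part of jQuery UI.

```javascript
// Hypothetical helper: plain prefix test with no knowledge of hangul composition.
function naivePrefixMatch(item, term) {
  return item.indexOf(term) === 0;
}

// Assumed contents of the input field after each key press: ㄱ ㅓ ㄴ ㅂ ㅏ ㄴ
var inputStates = ['ㄱ', '거', '건', '건ㅂ', '건바', '건반'];

// For every intermediate state, check whether the item '건반' would be suggested.
var results = inputStates.map(function (term) {
  return naivePrefixMatch('건반', term);
});

console.log(results); // [ false, false, true, false, false, true ]
```

Only the states that consist entirely of completed hangul from the item (건 and 건반) match, so with this kind of matching the suggestion list is empty after every other key press.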

For a better understanding, visit https://www.branah.com/korean. As you can see, to type 건반 you need to press ㄱ, ㅓ, ㄴ, ㅂ, ㅏ and ㄴ. More information about how the Korean writing system works can be found at http://gernot-katzers-spice-pages.com/var/korean_hangul_unicode.html.

What is the solution? The solution is to help jQuery autocomplete by decomposing hangul into its components so that the autocomplete script can do its work. Let’s look at some code as an example.

First, we need to include the system ui.autocomplete library. For example, it can be included using #attached in the renderable array of the form:

<?php
// Attach Drupal system ui.autocomplete library to form.
$form['#attached']['library'][] = array('system', 'ui.autocomplete');

Next, create a Drupal.behaviors object with the following methods in addition to the default attach:

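  • config – stores the lead, vowel and tail arrays of hangul components (jamo), plus sBase, lBase and vBase. lBase and sBase are the bounds of the Unicode range of precomposed hangul, from AC00 (decimal 44032) to D7A3 (decimal 55203). vBase equals 588 (21 vowels × 28 tail positions, i.e. the number of hangul per lead consonant).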
     
  • hangul2jamo – helper function that decomposes a hangul into jamo. Returns an object with lead, vowel and tail (tail can be empty), or an empty string if the character is not a precomposed hangul.
     
  • attachAutocomplete – the main autocomplete function, which takes as arguments the selector of the input field and the list of values to search for autocomplete results.
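The role of the base values can be illustrated with plain integer arithmetic. This is a sketch under assumptions rather than the implementation used in this post: syllableIndices and the constant names are hypothetical, and the tail index here counts "no tail" as 0, whereas the tail array described above has no empty entry, so its indices are one lower.

```javascript
var FIRST_HANGUL = 0xAC00; // 44032, called lBase above
var LAST_HANGUL = 0xD7A3;  // 55203, called sBase above
var TAIL_COUNT = 28;       // 27 tail jamo plus "no tail"
var PER_LEAD = 588;        // 21 vowels * 28 tail positions, called vBase above

// Hypothetical helper: returns table indices for one precomposed hangul,
// or null for anything outside the hangul syllable range.
function syllableIndices(ch) {
  var code = ch.charCodeAt(0);
  if (!(code >= FIRST_HANGUL && code <= LAST_HANGUL)) {
    return null;
  }
  var offset = code - FIRST_HANGUL;
  return {
    lead: Math.floor(offset / PER_LEAD),
    vowel: Math.floor((offset % PER_LEAD) / TAIL_COUNT),
    tail: offset % TAIL_COUNT // 0 means the hangul has no tail
  };
}

console.log(syllableIndices('건')); // { lead: 0, vowel: 4, tail: 4 }
console.log(syllableIndices('반')); // { lead: 7, vowel: 0, tail: 4 }
```

For 건 (U+AC74) the offset is 116, which gives lead 0 (ㄱ), vowel 4 (ㅓ) and tail 4 (ㄴ when "no tail" occupies index 0).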

The algorithm checks whether the character that was entered is not English, and then whether it can be decomposed by the hangul2jamo function. If it can, we pass the decomposed string to the default jQuery autocomplete functionality, which can now work with the string properly after the first key press (Fig.3) and even after the first hangul has been entered.
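The core idea, comparing decomposed strings instead of composed ones, can be sketched in isolation from jQuery and Drupal. The helper names toJamo and matchesDecomposedPrefix are hypothetical, the jamo tables follow the Unicode syllable order, and the intermediate input states are assumed from a typical two-set keyboard layout. This is a simplified illustration, not the full behavior shown later in this post.

```javascript
// Compatibility jamo in Unicode syllable order. TAILS[0] is "no tail".
var LEADS = ['ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'];
var VOWELS = ['ㅏ', 'ㅐ', 'ㅑ', 'ㅒ', 'ㅓ', 'ㅔ', 'ㅕ', 'ㅖ', 'ㅗ', 'ㅘ', 'ㅙ', 'ㅚ', 'ㅛ', 'ㅜ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅠ', 'ㅡ', 'ㅢ', 'ㅣ'];
var TAILS = ['', 'ㄱ', 'ㄲ', 'ㄳ', 'ㄴ', 'ㄵ', 'ㄶ', 'ㄷ', 'ㄹ', 'ㄺ', 'ㄻ', 'ㄼ', 'ㄽ', 'ㄾ', 'ㄿ', 'ㅀ', 'ㅁ', 'ㅂ', 'ㅄ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'];

// Hypothetical helper: replaces every precomposed hangul with its jamo and
// leaves all other characters (bare jamo, Latin letters, spaces) unchanged.
function toJamo(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code < 0xAC00 || code > 0xD7A3) {
      out += text.charAt(i);
      continue;
    }
    var offset = code - 0xAC00;
    out += LEADS[Math.floor(offset / 588)] +
           VOWELS[Math.floor((offset % 588) / 28)] +
           TAILS[offset % 28];
  }
  return out;
}

// Hypothetical helper: prefix test on the decomposed forms.
function matchesDecomposedPrefix(item, term) {
  return toJamo(item).indexOf(toJamo(term)) === 0;
}

// Assumed contents of the input field after each key press while typing 건반.
var inputStates = ['ㄱ', '거', '건', '건ㅂ', '건바', '건반'];
var matched = inputStates.map(function (term) {
  return matchesDecomposedPrefix('건반', term);
});

console.log(toJamo('건반')); // ㄱㅓㄴㅂㅏㄴ
console.log(matched);       // [ true, true, true, true, true, true ]
```

Because characters outside the hangul range pass through unchanged, the same comparison also works for partially typed input such as 건ㅂ, where the last character is still a bare jamo.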

Below is the code implementing the described algorithm. Assume we have an input field with the id 'edit-product'.

/**
* @file
*
* JavaScript actions for Korean autocomplete.
*/

(function($){
/**
  * Korean-aware autocomplete behavior.
  */
Drupal.behaviors.koreanAutocomplete = {
   config: {
     lead: ['ㄱ', 'ㄲ', 'ㄴ', 'ㄷ', 'ㄸ', 'ㄹ', 'ㅁ', 'ㅂ', 'ㅃ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅉ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'],
     vowel: ['ㅏ', 'ㅐ', 'ㅑ', 'ㅒ', 'ㅓ', 'ㅔ', 'ㅕ', 'ㅖ', 'ㅗ', 'ㅘ', 'ㅙ', 'ㅚ', 'ㅛ', 'ㅜ', 'ㅝ', 'ㅞ', 'ㅟ', 'ㅠ', 'ㅡ', 'ㅢ', 'ㅣ'],
     tail: ['ㄱ', 'ㄲ', 'ㄳ', 'ㄴ', 'ㄵ', 'ㄶ', 'ㄷ', 'ㄹ', 'ㄺ', 'ㄻ', 'ㄼ', 'ㄽ', 'ㄾ', 'ㄿ', 'ㅀ', 'ㅁ', 'ㅂ', 'ㅄ', 'ㅅ', 'ㅆ', 'ㅇ', 'ㅈ', 'ㅊ', 'ㅋ', 'ㅌ', 'ㅍ', 'ㅎ'],
     sBase: 55203,
     lBase: 44032,
     vBase: 588
   },
   hangul2jamo: function(hangul) {
     var codepoint = hangul.charCodeAt(0),
         config = Drupal.behaviors.koreanAutocomplete.config;
     if (hangul == '' || codepoint < config.lBase || codepoint > config.sBase) {
       return '';
     }
     var start = codepoint - config.lBase,
         // Index into config.tail, which has no empty entry: -1 means "no tail".
         tail = start % 28 - 1,
         vowel = Math.floor((start % config.vBase) / 28),
         lead = Math.floor(start / config.vBase);
     return {
       lead: config.lead[lead],
       vowel: config.vowel[vowel],
       tail: typeof(config.tail[tail]) == 'undefined' ? '' : config.tail[tail]
     };
   },
   attachAutocomplete: function(selector, items) {
     var inputElement = $(selector);
     if (inputElement.length && items) {
       inputElement.autocomplete({
         source: function(request, response) {
           var matches = $.map(
             items,
             function(tag) {
               // Use default logic for English items.
               if (request.term[0].match(/[a-zA-Z]/)) {
                 var tagUpper = tag.toUpperCase(),
                     termUpper = request.term.toUpperCase();
                 // If searched word is at the start of the item name.
                 if (tagUpper.indexOf(termUpper) === 0) {
                   return tag;
                 }
                 return;
               }
               // Korean specific autocomplete logic.
               var utag = tag.toUpperCase(),
                   urequest = request.term.toUpperCase(),
                   ktag = '',
                   position = urequest.length - 1;
               if (typeof(utag[position]) != 'undefined') {
                 ktag = Drupal.behaviors.koreanAutocomplete.hangul2jamo(utag[position]);
                 if (typeof(ktag.lead) != 'undefined') {
                   ktag = ktag.lead;
                 }
               }
               var cond = false;
               if (position > 0) {
                 cond = (urequest.indexOf(ktag) === position);
                 for (var i = 0; i < urequest.length; i++) {
                   if (i < position) {
                     cond = cond && utag.indexOf(urequest.charAt(i)) === i;
                   }
                 }
               }
               // Decompose tag into jamo; non-hangul characters are kept as-is.
               var tag_decomposed = '';
               for (var i = 0; i < utag.length; i++) {
                 var h = Drupal.behaviors.koreanAutocomplete.hangul2jamo(utag.charAt(i));
                 tag_decomposed += h ? h.lead + h.vowel + h.tail : utag.charAt(i);
               }
               // Decompose user request the same way.
               var urequest_decomposed = '';
               for (var i = 0; i < urequest.length; i++) {
                 var h = Drupal.behaviors.koreanAutocomplete.hangul2jamo(urequest.charAt(i));
                 urequest_decomposed += h ? h.lead + h.vowel + h.tail : urequest.charAt(i);
               }
               // Match when the decomposed request is a prefix of the decomposed tag.
               var decomposed = tag_decomposed.indexOf(urequest_decomposed) === 0;
               if (utag.indexOf(urequest) === 0 || ktag.indexOf(urequest) === 0 || cond || decomposed) {
                 return tag;
               }
             }
           );
           response(matches);
         }
       });
     }
   },
   attach: function(context, settings) {
     var itemList = [
       'absolute',
       'absolite',
       'bobos',
       'babos',
       'citro',
       'carrot',
       'wassap',
       'wysiwyg',
       '에락시스',
       '에스트라머스틴 인산나트륨 수화물',
       '암로디핀 베실산염 - 노바스크',
       '암로디핀 베실산염/아토르바스타틴 칼슘삼수화물',
       '노바스크',
       '뉴론틴',
       '달테파린나트륨',
       '건반',
       '독시사이클린수화물'
     ];
     this.attachAutocomplete('#edit-product', itemList);
   }
};
})(jQuery);

This code doesn't break the default jQuery autocomplete: autocomplete with English characters still works.

In conclusion, we made Korean autocomplete work in the same way as it works for any other language. No more notes about special behavior on the Korean version of your site, and no more disabling autocomplete (or whatever workaround you used for this issue). All of your site visitors will be treated in the same way. After all, isn't that the whole point?