[Keymap] Redo the accent implementation in melody96:zunger. (#11000)

The previous implementation generated accents in decomposed form (NFD/NFKD): e.g., i followed by fn+e would generate í, which is actually an ordinary i followed by U+0301 COMBINING ACUTE ACCENT. Unfortunately, a number of websites and apps (especially European ones, written in languages that use these characters a lot) are poorly written, and will misparse and/or crash if presented with decomposed Unicode. They require and expect the composed form (NFC/NFKC), with characters like í (U+00ED LATIN SMALL LETTER I WITH ACUTE) that look visually identical, and are in fact normalization-equivalent, but are encoded differently.

The new accent implementation handles this in a very flexible way.

Many new comments added as well, as it's also clear that this is going to need a bit more expansion before it becomes a true polyglot keymap.

Co-authored-by: Yonatan Zunger <zunger@desiderata.lan>
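To make the encoding difference concrete, here is a minimal host-side sketch, assuming UTF-8 as the wire encoding. It is not part of the keymap, and the helper names (`utf8_encode_bmp`, `utf8_encode_seq`, `same_bytes`) are hypothetical. It shows why a program that compares raw bytes without normalizing first will treat the two spellings of í as different strings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Encode one code point in the Basic Multilingual Plane (U+0000..U+FFFF,
 * surrogates excluded) as UTF-8. Writes up to 3 bytes into out and returns
 * how many bytes were written. */
static size_t utf8_encode_bmp(uint16_t cp, unsigned char *out) {
    assert(out != NULL);
    assert(cp < 0xD800 || cp > 0xDFFF); /* surrogates are not characters */
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Encode a sequence of code points; returns the total number of bytes written.
 * The caller must provide at least 3 * n bytes of space in out. */
static size_t utf8_encode_seq(const uint16_t *cps, size_t n, unsigned char *out) {
    size_t len = 0;
    for (size_t i = 0; i < n; i++) {
        len += utf8_encode_bmp(cps[i], out + len);
    }
    return len;
}

/* Returns 1 if the two code point sequences encode to identical bytes, which is
 * the only kind of equality a non-normalizing program sees. */
static int same_bytes(const uint16_t *a, size_t na, const uint16_t *b, size_t nb) {
    unsigned char bytes_a[3 * 8];
    unsigned char bytes_b[3 * 8];
    assert(na <= 8 && nb <= 8);
    size_t len_a = utf8_encode_seq(a, na, bytes_a);
    size_t len_b = utf8_encode_seq(b, nb, bytes_b);
    return len_a == len_b && memcmp(bytes_a, bytes_b, len_a) == 0;
}
```

Passing `{0x00ED}` (the composed í) and `{0x0069, 0x0301}` (i plus the combining acute) to `same_bytes` reports a mismatch: the first encodes to the two bytes C3 AD, the second to the three bytes 69 CC 81.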
author: yonatanzunger <30514250+yonatanzunger@users.noreply.github.com> 2021-01-11 01:21:44 -0800
committer: GitHub <noreply@github.com> 2021-01-11 01:21:44 -0800
commit: 554b937d21f8c50515c22498f4f46df0b3ae6569 (patch)
tree: ed202d5c70dc0c0d538e0bb4e9c2f0fd3ec3804c
parent: b113888ec55e456ffcff2d6b04ad29309d01b325 (diff)
1 files changed, 245 insertions, 23 deletions
diff --git a/keyboards/melody96/keymaps/zunger/keymap.c b/keyboards/melody96/keymaps/zunger/keymap.c
index d396de683..d0d2698b7 100644
--- a/keyboards/melody96/keymaps/zunger/keymap.c
+++ b/keyboards/melody96/keymaps/zunger/keymap.c
@@ -14,6 +14,83 @@
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */
 #include QMK_KEYBOARD_H
+#include <assert.h>
+// This keymap is designed to make it easy to type in a wide variety of languages, as well as
+// generate mathematical symbols (à la Space Cadet).
+//
+// LAYER MAGIC (aka, typing in many alphabets)
+//   This keyboard has three "base" layers: QWERTY, GREEK, and CADET. The GREEK and CADET layers
+// are actually full of Unicode points, and so which point they generate depends on things like
+// whether the shift key is down. To handle this, each of those layers is actually *two* layers, one
+// with and one without shift. In our main loop, we manage modifier state detection, as well as
+// layer switch detection, and pick the right layer on the fly.
+//   Layers are selected with a combination of three keys. The "Greek" and "Cadet" keys act like
+// modifiers: When held down, they transiently select the indicated base layer. The "Layer Lock" key
+// locks the value of the base layer at whatever is currently held; so e.g., if you hold Greek +
+// Layer Lock, you'll stay in Greek mode until you hit Layer Lock again without any of the mods
+// held.
+//   TODO: This system of layer selection is nice for math, but it's not very nice for actually
+// typing in multiple languages. It seems like a better plan will be to reserve one key for each
+// base layer -- maybe fn + F(n) -- which can either be held as a modifier or tapped to switch
+// layers. That will open up adding some more languages, like Yiddish, but to do this effectively
+// we'll need to find a good UI with which to show the currently selected layer. Need to check what
+// the melody96 has in the way of outputs (LEDs, sound, etc).
+//
+// ACCENT MAGIC (aka, typing conveniently in Romance languages)
+//   We want to support easy typing of diacritical marks. We can't rely on the host OS for this,
+// because (e.g.) on MacOS, to make any of the other stuff work, we need to be using the Unicode
+// input method at the OS level, which breaks all the normal accent stuff on that end. So we do it
+// ourselves. Accents can actually be invoked in two different ways: one fast and very compatible,
+// one very versatile but with occasional compatibility problems.
+//
+//   THE MAIN WAY: You can hit one of the "accent request" key patterns immediately *before* typing
+//   a letter to be accented. It will emit the corresponding accented Unicode. For example, you can
+//   hit fn-e to request an acute accent, followed by i, and it will output í, U+00ED LATIN SMALL
+//   LETTER I WITH ACUTE. These "combined characters" are in Unicode normal form C (NFC), which is
+//   important because many European websites and apps, in particular, tend to behave very badly
+//   (misunderstanding and/or crashing) when presented with characters in other forms! The catch is
+//   that this only works for the various combinations of letters and accents found in the Latin-1
+//   supplement block of Unicode -- basically, things you need for Western European languages.
+//
+//   (NB: If you make an accent request followed by a letter which can't take the corresponding
+//   accent, it will output the uncombined form of the accent followed by whatever you typed; so
+//   e.g., if you hit fn-e followed by f, it will output ´f, U+00B4 ACUTE ACCENT followed by an
+//   ordinary f. This is very similar to the default behavior of MacOS.)
+//
+//   THE FLEXIBLE WAY: If you hit the accent request with a shift -- e.g., fn-shift-e -- it will
+//   instead immediately output the corresponding *combining* Unicode accent mark, which will modify
+//   the *previous* character you typed. For example, if you type i followed by fn-shift-e, it will
+//   generate í. But don't be fooled by visual similarity: unlike the previous example, this one is
+//   an ordinary i followed by U+0301 COMBINING ACUTE ACCENT. It's actually *two symbols*, and this
+//   is Unicode normal form D (NFD). Far more combinations of letters and accents can be written
+//   this way than have precomposed code points, and it's easy to add more of these if you need.
+//   (For combinations with no precomposed code point, the NFC and NFD representations are
+//   identical.)
+//
+//   Programs that try to compare Unicode strings *should* first normalize them by converting them
+//   all into one normal form or another, and there are functions in every programming language to
+//   do this -- e.g., JavaScript's string.normalize() -- but lots of programmers fail to understand
+//   this, and so write code that massively freaks out when it encounters the wrong form.
+//
+// The current accent request codes are modeled on the ones in MacOS.
+//
+//    fn+`    Grave accent (`)
+//    fn+e    Acute accent (´)
+//    fn+i    Circumflex (^)
+//    fn+u    Diaeresis / umlaut / trema (¨)
+//    fn+c    Cedilla (¸)
+//    fn+n    Tilde (˜)
+//
+// Together, these functions make for a nice "polyglot" keyboard: one that can easily type in a wide
+// variety of languages, which is very useful for people who, well, need to type in a bunch of
+// languages.
+//
+// The major TODOs are:
+//   - Update the layer selection logic (and add visible layer cues);
+//   - Factor the code below so that the data layers are more clearly separated from the code logic,
+//     so that other users of this keymap can easily add whichever alphabets they need without
+//     having to deeply understand the implementation.
 enum custom_keycodes {
  // We provide special layer management keys:
@@ -32,6 +109,16 @@ enum custom_keycodes {
  KC_GREEK = SAFE_RANGE,
  KC_CADET,
  KC_LAYER_LOCK,
+  // These are the keycodes generated by the various "accent request" keystrokes.
+  KC_ACCENT_START,
+  KC_CGRV = KC_ACCENT_START,  // Grave accent
+  KC_CAGU,  // Acute accent
+  KC_CDIA,  // Diaeresis / umlaut / trema
+  KC_CCIR,  // Circumflex
+  KC_CCED,  // Cedilla
+  KC_CTIL,  // Tilde
+  KC_ACCENT_END,
 };
 enum layers_keymap {
@@ -49,21 +136,6 @@ enum layers_keymap {
 #define MO_FN MO(_FUNCTION)
 #define KC_LLCK KC_LAYER_LOCK
-// TODO: To generalize this, we want some #defines that let us specify how each key on the base
-// layer should map to the four special layers, and then use that plus the base layer definition to
-// autogenerate the keymaps for the other layers.
-// TODO: It would also be nice to be able to put the actual code points in here, rather than
-// numbers.
-// Accent marks
-#define CMB_GRV H(0300)
-#define CMB_AGU H(0301)
-#define CMB_DIA H(0308)
-#define CMB_CIR H(0302)
-#define CMB_MAC H(0304)
-#define CMB_CED H(0327)
-#define CMB_TIL H(0303)
 const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {
  // NB: Using GESC for escape in the QWERTY layer as a temporary hack because I messed up the
@@ -164,14 +236,119 @@ const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {
  // Function layer is mostly for keyboard meta-control operations, but also contains the combining
  // accent marks. These are deliberately placed to match where the analogous controls go on Mac OS.
        [_FUNCTION] = LAYOUT_hotswap(
-    CMB_GRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, KC_MUTE, KC_VOLD, KC_VOLU, _______, _______, RESET,
+    KC_CGRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, KC_MUTE, KC_VOLD, KC_VOLU, _______, _______, RESET,
-    CMB_GRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______,          _______, _______, _______, _______, _______,
+    KC_CGRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______,          _______, _______, _______, _______, _______,
-    _______, _______, _______, CMB_AGU, _______, _______, _______, CMB_DIA, CMB_CIR, CMB_MAC, _______, _______, _______,          _______, _______, _______, _______,
+    _______, _______, _______, KC_CAGU, _______, _______, _______, KC_CDIA, KC_CCIR, _______, _______, _______, _______,          _______, _______, _______, _______,
    _______, _______, _______, UC_M_OS, UC_M_LN, UC_M_WI, UC_M_BS, UC_M_WC, _______, _______, _______, _______,                   _______, _______, _______, _______, _______,
-    _______,          _______, _______, CMB_CED, _______, _______, CMB_TIL, _______, _______, _______, _______, _______,          _______, _______, _______, _______,
+    _______,          _______, _______, KC_CCED, _______, _______, KC_CTIL, _______, _______, _______, _______, _______,          _______, _______, _______, _______,
    _______, _______, _______,                            _______,                            _______, _______, _______, _______, _______, _______, _______, _______, _______),
 };
+////////////////////////////////////////////////////////////////////////////////////////////////////
+// Accent implementation
+//
+// In the body of process_record_user, we store an "accent_request", which is the accent keycode if
+// one was just selected, or zero otherwise. When the *next* key is hit, we look up whether the
+// accent request plus that next keycode (plus the state of the shift key) together amount to an
+// interesting combined (NFKC) character, and if so, emit it; otherwise, we emit the accent as a
+// separate character and then process the next key normally. The resulting UI behavior is similar
+// to that of the combining accent keys in MacOS.
+//
+// We store two arrays, depending on whether shift is or isn't held. Each is two-dimensional, with
+// its outer key by the next keycode struck, and the inner key by the accent requested. The outer
+// array has KC_Z + 1 as its upper bound, so that we can save memory by only coding alphabetic keys.
+// The contents are either Unicode code points, or zero to indicate that we don't have a point for
+// this combination.
+#define KC_NUM_ACCENTS (KC_ACCENT_END - KC_ACCENT_START)
+#define KC_NUM_SLOTS (KC_Z + 1)
+const uint16_t PROGMEM unshifted_accents[KC_NUM_SLOTS][KC_NUM_ACCENTS] = {
+  //         KC_CGRV, KC_CAGU, KC_CDIA, KC_CCIR, KC_CCED, KC_CTIL
+  [KC_A] = { 0x00e0,  0x00e1,  0x00e4,  0x00e2,  0,       0x00e3 },
+  [KC_E] = { 0x00e8,  0x00e9,  0x00eb,  0x00ea,  0,       0      },
+  [KC_I] = { 0x00ec,  0x00ed,  0x00ef,  0x00ee,  0,       0      },
+  [KC_O] = { 0x00f2,  0x00f3,  0x00f6,  0x00f4,  0,       0x00f5 },
+  [KC_U] = { 0x00f9,  0x00fa,  0x00fc,  0x00fb,  0,       0      },
+  [KC_Y] = { 0,       0,       0x00ff,  0,       0,       0      },
+  [KC_N] = { 0,       0,       0,       0,       0,       0x00f1 },
+  [KC_C] = { 0,       0,       0,       0,       0x00e7,  0      },
+};
+const uint16_t PROGMEM shifted_accents[KC_NUM_SLOTS][KC_NUM_ACCENTS] = {
+  //         KC_CGRV, KC_CAGU, KC_CDIA, KC_CCIR, KC_CCED, KC_CTIL
+  [KC_A] = { 0x00c0,  0x00c1,  0x00c4,  0x00c2,  0,       0x00c3 },
+  [KC_E] = { 0x00c8,  0x00c9,  0x00cb,  0x00ca,  0,       0      },
+  [KC_I] = { 0x00cc,  0x00cd,  0x00cf,  0x00ce,  0,       0      },
+  [KC_O] = { 0x00d2,  0x00d3,  0x00d6,  0x00d4,  0,       0x00d5 },
+  [KC_U] = { 0x00d9,  0x00da,  0x00dc,  0x00db,  0,       0      },
+  [KC_Y] = { 0,       0,       0x0178,  0,       0,       0      },  // Ÿ is outside Latin-1
+  [KC_N] = { 0,       0,       0,       0,       0,       0x00d1 },
+  [KC_C] = { 0,       0,       0,       0,       0x00c7,  0      },
+};
+// The uncombined and combined forms of the accents, for when we want to emit them as single
+// characters.
+const uint16_t PROGMEM uncombined_accents[KC_NUM_ACCENTS] = {
+  [KC_CGRV - KC_ACCENT_START] = 0x0060,
+  [KC_CAGU - KC_ACCENT_START] = 0x00b4,
+  [KC_CDIA - KC_ACCENT_START] = 0x00a8,
+  [KC_CCIR - KC_ACCENT_START] = 0x005e,
+  [KC_CCED - KC_ACCENT_START] = 0x00b8,
+  [KC_CTIL - KC_ACCENT_START] = 0x02dc,
+};
+const uint16_t PROGMEM combined_accents[KC_NUM_ACCENTS] = {
+  [KC_CGRV - KC_ACCENT_START] = 0x0300,
+  [KC_CAGU - KC_ACCENT_START] = 0x0301,
+  [KC_CDIA - KC_ACCENT_START] = 0x0308,
+  [KC_CCIR - KC_ACCENT_START] = 0x0302,
+  [KC_CCED - KC_ACCENT_START] = 0x0327,
+  [KC_CTIL - KC_ACCENT_START] = 0x0303,
+};
+// This function manages keypresses that happen after an accent has been selected by an earlier
+// keypress.
+// Args:
+//   accent_key: The accent key which was earlier selected. This must be in the range
+//     [KC_ACCENT_START, KC_ACCENT_END).
+//   keycode: The keycode which was just pressed.
+//   is_shifted: The current shift state (as set by a combination of shift and caps lock)
+//   force_no_accent: If true, we're in a situation where we want to force there to be no
+//     accent combination -- if e.g. we're in a non-QWERTY layer, or if other modifier keys
+//     are held.
+//
+// Returns true if the keycode has been completely handled by this function (and so should not be
+// processed further by process_record_user) or false otherwise.
+bool process_key_after_accent(
+    uint16_t accent_key,
+    uint16_t keycode,
+    bool is_shifted,
+    bool force_no_accent
+) {
+  assert(accent_key >= KC_ACCENT_START);
+  assert(accent_key < KC_ACCENT_END);
+  const int accent_index = accent_key - KC_ACCENT_START;
+  // If the keycode is outside A..Z, or force_no_accent is set, we know we shouldn't even bother
+  // with a table lookup.
+  if (keycode <= KC_Z && !force_no_accent) {
+    // Pick the correct array. Because this is progmem, we're going to need to do the
+    // two-dimensional array indexing by hand, and so we just cast it to a single-dimensional array.
+    const uint16_t *points = (const uint16_t*)(is_shifted ? shifted_accents : unshifted_accents);
+    const uint16_t code_point = pgm_read_word_near(points + KC_NUM_ACCENTS * keycode + accent_index);
+    if (code_point) {
+      register_unicode(code_point);
+      return true;
+    }
+  }
+  // If we get here, there was no accent match. Emit the accent as its own character, and then let
+  // the caller figure out what to do next.
+  register_unicode(pgm_read_word_near(uncombined_accents + accent_index));
+  return false;
+}
 // Layer bitfields.
 #define GREEK_LAYER (1UL << _GREEK)
 #define SHIFTGREEK_LAYER (1UL << _SHIFTGREEK)
@@ -185,6 +362,8 @@ bool process_record_user(uint16_t keycode, keyrecord_t *record) {
  // get_mods or the like, because this function is called *before* that's updated!
  static bool shift_held = false;
  static bool alt_held = false;
+  static bool ctrl_held = false;
+  static bool super_held = false;
  static bool greek_held = false;
  static bool cadet_held = false;
@@ -192,18 +371,36 @@ bool process_record_user(uint16_t keycode, keyrecord_t *record) {
  static bool shift_lock = false;
  static int layer_lock = _QWERTY;
-  // Process any modifier key presses.
+  // The accent request, or zero if there isn't one.
+  static uint16_t accent_request = 0;
+  // If this is set to true, don't trigger any handling of pending accent requests. That's what we
+  // want to do if e.g. the user just hit the shift key or something.
+  bool ignore_accent_change = !record->event.pressed;
+  // Step 1: Process any modifier key state changes, so we can maintain that state.
  if (keycode == KC_LSHIFT || keycode == KC_RSHIFT) {
    shift_held = record->event.pressed;
+    ignore_accent_change = true;
  } else if (keycode == KC_LALT || keycode == KC_RALT) {
    alt_held = record->event.pressed;
+    ignore_accent_change = true;
+  } else if (keycode == KC_LCTRL || keycode == KC_RCTRL) {
+    ctrl_held = record->event.pressed;
+    ignore_accent_change = true;
+  } else if (keycode == KC_LGUI || keycode == KC_RGUI) {
+    super_held = record->event.pressed;
+    ignore_accent_change = true;
  } else if (keycode == KC_GREEK) {
    greek_held = record->event.pressed;
+    ignore_accent_change = true;
  } else if (keycode == KC_CADET) {
    cadet_held = record->event.pressed;
+    ignore_accent_change = true;
  }
-  // Now let's transform these into the "cadet request" and "greek request."
+  // Step 2: Figure out which layer we're supposed to be in, by transforming all the prior stuff
+  // into layer requests.
  const bool greek_request = (greek_held && !alt_held);
  const bool cadet_request = (cadet_held || (greek_held && alt_held));
@@ -260,8 +457,33 @@ bool process_record_user(uint16_t keycode, keyrecord_t *record) {
    layer_state_set(new_layer_state);
  }
-  // TODO: We can update LED states based on shift_lock (caps), layer_lock (layer lock), and
-  // base_layer (base layer).
+  // Step 3: Handle accents. If there's a pending accent request, process it. If what the user just
+  // hit creates a new accent request, update the pending state for the next keypress.
+  if (!ignore_accent_change && accent_request && record->event.pressed) {
+    // Only do the accent stuff if we're in the QWERTY layer and we aren't modifying something.
+    const bool force_no_accent = (
+        actual_layer != _QWERTY ||
+        ctrl_held ||
+        super_held ||
+        alt_held
+    );
+    const uint16_t old_accent = accent_request;
+    accent_request = 0;
+    if (process_key_after_accent(old_accent, keycode, shifted, force_no_accent)) {
+      return false;
+    }
+  }
+  // And if a new accent request just arrived, update accent_request.
+  if (keycode >= KC_ACCENT_START && keycode < KC_ACCENT_END && record->event.pressed) {
+    if (shifted) {
+      // Shift + accent request generates the combining accent key, and leaves accent_request alone.
+      register_unicode(pgm_read_word_near(combined_accents + keycode - KC_ACCENT_START));
+      return false;
+    } else {
+      accent_request = keycode;
+    }
+  }
  return true;
 }
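The hand-rolled two-dimensional indexing in process_key_after_accent can be tried out on a host machine with a sketch like the one below. It rests on some assumptions: the table is a cut-down, hypothetical one (`demo_table`, with made-up slot numbers rather than real keycodes), and `read_word` is a stand-in for `pgm_read_word_near` that simply copies a 16-bit word from an ordinary memory address. The point is only to show that element `[slot][accent]` of a row-major array sits `NUM_ACCENTS * slot + accent` words past the start.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes, much smaller than the keymap's real tables. */
enum { NUM_ACCENTS = 6, NUM_SLOTS = 4 };

/* Sparse table: rows that are not listed are zero-filled, meaning "no
 * combined character for this letter". */
static const uint16_t demo_table[NUM_SLOTS][NUM_ACCENTS] = {
    /*      grave    acute    diaer.   circ.    cedilla  tilde  */
    [1] = { 0x00e0,  0x00e1,  0x00e4,  0x00e2,  0,       0x00e3 }, /* a */
    [3] = { 0,       0,       0,       0,       0x00e7,  0      }, /* c */
};

/* Host-side stand-in for pgm_read_word_near: read one 16-bit word from a
 * byte address. */
static uint16_t read_word(const void *addr) {
    uint16_t word;
    memcpy(&word, addr, sizeof word);
    return word;
}

/* Row-major lookup done by hand, treating the table as a flat run of words. */
static uint16_t lookup(size_t slot, size_t accent) {
    assert(slot < NUM_SLOTS);
    assert(accent < NUM_ACCENTS);
    const unsigned char *base = (const unsigned char *)demo_table;
    return read_word(base + sizeof(uint16_t) * (NUM_ACCENTS * slot + accent));
}
```

With this table, `lookup(1, 1)` yields 0x00e1 (á) and `lookup(2, 0)` yields 0, the "no combination" marker. Bounds are asserted up front because a flat index that runs past the last row would read unrelated memory.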
author	yonatanzunger <30514250+yonatanzunger@users.noreply.github.com>	2021-01-11 01:21:44 -0800
committer	GitHub <noreply@github.com>	2021-01-11 01:21:44 -0800
commit	554b937d21f8c50515c22498f4f46df0b3ae6569 (patch)
tree	ed202d5c70dc0c0d538e0bb4e9c2f0fd3ec3804c
parent	b113888ec55e456ffcff2d6b04ad29309d01b325 (diff)
download	qmk_firmware-554b937d21f8c50515c22498f4f46df0b3ae6569.tar.gz qmk_firmware-554b937d21f8c50515c22498f4f46df0b3ae6569.zip

diff --git a/keyboards/melody96/keymaps/zunger/keymap.c b/keyboards/melody96/keymaps/zunger/keymap.c index d396de683..d0d2698b7 100644 --- a/keyboards/melody96/keymaps/zunger/keymap.c +++ b/keyboards/melody96/keymaps/zunger/keymap.c
@@ -14,6 +14,83 @@
14	* along with this program. If not, see <http://www.gnu.org/licenses/>.	14	* along with this program. If not, see <http://www.gnu.org/licenses/>.
15	*/	15	*/
16	#include QMK_KEYBOARD_H	16	#include QMK_KEYBOARD_H
		17	#include <assert.h>
		18
		19	// This keymap is designed to make it easy to type in a wide variety of languages, as well as
		20	// generate mathematical symbols (à la Space Cadet).
		21	//
		22	// LAYER MAGIC (aka, typing in many alphabets)
		23	// This keyboard has three "base" layers: QWERTY, GREEK, and CADET. The GREEK and CADET layers
		24	// are actually full of Unicode points, and so which point they generate depends on things like
		25	// whether the shift key is down. To handle this, each of those layers is actually two layers, one
		26	// with and one without shift. In our main loop, we manage modifier state detection, as well as
		27	// layer switch detection, and pick the right layer on the fly.
		28	// Layers are selected with a combination of three keys. The "Greek" and "Cadet" keys act like
		29	// modifiers: When held down, they transiently select the indicated base layer. The "Layer Lock" key
		30	// locks the value of the base layer at whatever is currently held; so e.g., if you hold Greek +
		31	// Layer Lock, you'll stay in Greek mode until you hit Layer Lock again without any of the mods
		32	// held.
		33	// TODO: This system of layer selection is nice for math, but it's not very nice for actually
		34	// typing in multiple languages. It seems like a better plan will be to reserve one key for each
		35	// base layer -- maybe fn + F(n) -- which can either be held as a modifier or tapped to switch
		36	// layers. That will open up adding some more languages, like Yiddish, but to do this effectively
		37	// we'll need to find a good UI with which to show the currently selected layer. Need to check what
		38	// the melody96 has in the way of outputs (LEDs, sound, etc).
		39	//
		40	// ACCENT MAGIC (aka, typing conveniently in Romance languages)
		41	// We want to support easy typing of diacritical marks. We can't rely on the host OS for this,
		42	// because (e.g.) on MacOS, to make any of the other stuff work, we need to be using the Unicode
		43	// input method at the OS level, which breaks all the normal accent stuff on that end. So we do it
		44	// ourselves. Accents can actually be invoked in two different ways: one fast and very compatible,
		45	// one very versatile but with occasional compatibility problems.
		46	//
		47	// THE MAIN WAY: You can hit one of the "accent request" key patterns immediately before typing
		48	// a letter to be accented. It will emit the corresponding accented Unicode. For example, you can
		49	// hit fn-e to request an acute accent, followed by i, and it will output í, U+00ED LATIN SMALL
		50	// LETTER I WITH ACUTE. These "combined characters" are in Unicode normal form C (NFKC), which is
		51	// important because many European websites and apps, in particular, tend to behave very badly
		52	// (misunderstanding and/or crashing) when presented with characters in other forms! The catch is
		53	// that this only works for the various combinations of letters and accents found in the Latin-1
		54	// supplement block of Unicode -- basically, things you need for Western European languages.
		55	//
		56	// (NB: If you make an accent request followed by a letter which can't take the corresponding
		57	// accent, it will output the uncombined form of the accent followed by whatever you typed; so
		58	// e.g., if you hit fn-e followed by f, it will output ´f, U+00B4 ACUTE ACCENT followed by an
		59	// ordinary f. This is very similar to the default behavior of MacOS.)
		60	//
		61	// THE FLEXIBLE WAY: If you hit the accent request with a shift -- e.g., fn-shift-e -- it will
		62	// instead immediately output the corresponding combining Unicode accent mark, which will modify
		63	// the previous character you typed. For example, if you type i followed by fn-shift-e, it will
		64	// generate í. But don't be fooled by visual similarity: unlike the previous example, this one is
		65	// an ordinary i followed by U+0301 COMBINING ACUTE ACCENT. It's actually two symbols, and this
		66	// is Unicode normal form D (NFKD). Unlike NFKC, there are NFKD representations of far more
		67	// combinations of letters and accents, and it's easy to add more of these if you need. (The NFKC
		68	// representation of such combinations is identical to their NFKD representation)
		69	//
		70	// Programs that try to compare Unicode strings should first normalize them by converting them
		71	// all into one normal form or another, and there are functions in every programming language to
		72	// do this -- e.g., JavaScript's string.normalize() -- but lots of programmers fail to understand
		73	// this, and so write code that massively freaks out when it encounters the wrong form.
		74	//
		75	// The current accent request codes are modeled on the ones in MacOS.
		76	//
		77	// fn+` Grave accent (`)
		78	// fn+e Acute accent (´)
		79	// fn+i Circumflex (^)
		80	// fn+u Diaresis / umlaut / trema (¨)
		81	// fn+c Cedilla (¸)
		82	// fn+n Tilde (˜)
		83	//
		84	// Together, these functions make for a nice "polyglot" keyboard: one that can easily type in a wide
		85	// variety of languages, which is very useful for people who, well, need to type in a bunch of
		86	// languages.
		87	//
		88	// The major TODOs are:
		89	// - Update the layer selection logic (and add visible layer cues);
		90	// - Factor the code below so that the data layers are more clearly separated from the code logic,
		91	// so that other users of this keymap can easily add whichever alphabets they need without
		92	// having to deeply understand the implementation.
		93
17		94
18	enum custom_keycodes {	95	enum custom_keycodes {
19	// We provide special layer management keys:	96	// We provide special layer management keys:
@@ -32,6 +109,16 @@ enum custom_keycodes {
32	KC_GREEK = SAFE_RANGE,	109	KC_GREEK = SAFE_RANGE,
33	KC_CADET,	110	KC_CADET,
34	KC_LAYER_LOCK,	111	KC_LAYER_LOCK,
		112
		113	// These are the keycodes generated by the various "accent request" keystrokes.
		114	KC_ACCENT_START,
		115	KC_CGRV = KC_ACCENT_START, // Grave accent
		116	KC_CAGU, // Acute accent
		117	KC_CDIA, // Diaresis / umlaut / trema
		118	KC_CCIR, // Circumflex
		119	KC_CCED, // Cedilla
		120	KC_CTIL, // Tilde
		121	KC_ACCENT_END,
35	};	122	};
36		123
37	enum layers_keymap {	124	enum layers_keymap {
@@ -49,21 +136,6 @@ enum layers_keymap {
49	#define MO_FN MO(_FUNCTION)	136	#define MO_FN MO(_FUNCTION)
50	#define KC_LLCK KC_LAYER_LOCK	137	#define KC_LLCK KC_LAYER_LOCK
51		138
52	// TODO: To generalize this, we want some #defines that let us specify how each key on the base
53	// layer should map to the four special layers, and then use that plus the base layer definition to
54	// autogenerate the keymaps for the other layers.
55	// TODO: It would also be nice to be able to put the actual code points in here, rather than
56	// numbers.
57
58	// Accent marks
59	#define CMB_GRV H(0300)
60	#define CMB_AGU H(0301)
61	#define CMB_DIA H(0308)
62	#define CMB_CIR H(0302)
63	#define CMB_MAC H(0304)
64	#define CMB_CED H(0327)
65	#define CMB_TIL H(0303)
66
67		139
68	const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {	140	const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {
69	// NB: Using GESC for escape in the QWERTY layer as a temporary hack because I messed up the	141	// NB: Using GESC for escape in the QWERTY layer as a temporary hack because I messed up the
@@ -164,14 +236,119 @@ const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {
164	// Function layer is mostly for keyboard meta-control operations, but also contains the combining	236	// Function layer is mostly for keyboard meta-control operations, but also contains the combining
165	// accent marks. These are deliberately placed to match where the analogous controls go on Mac OS.	237	// accent marks. These are deliberately placed to match where the analogous controls go on Mac OS.
166	[_FUNCTION] = LAYOUT_hotswap(	238	[_FUNCTION] = LAYOUT_hotswap(
167	CMB_GRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, KC_MUTE, KC_VOLD, KC_VOLU, _______, _______, RESET,	239	KC_CGRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, KC_MUTE, KC_VOLD, KC_VOLU, _______, _______, RESET,
168	CMB_GRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______,	240	KC_CGRV, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______,
169	_______, _______, _______, CMB_AGU, _______, _______, _______, CMB_DIA, CMB_CIR, CMB_MAC, _______, _______, _______, _______, _______, _______, _______,	241	_______, _______, _______, KC_CAGU, _______, _______, _______, KC_CDIA, KC_CCIR, _______, _______, _______, _______, _______, _______, _______, _______,
170	_______, _______, _______, UC_M_OS, UC_M_LN, UC_M_WI, UC_M_BS, UC_M_WC, _______, _______, _______, _______, _______, _______, _______, _______, _______,	242	_______, _______, _______, UC_M_OS, UC_M_LN, UC_M_WI, UC_M_BS, UC_M_WC, _______, _______, _______, _______, _______, _______, _______, _______, _______,
171	_______, _______, _______, CMB_CED, _______, _______, CMB_TIL, _______, _______, _______, _______, _______, _______, _______, _______, _______,	243	_______, _______, _______, KC_CCED, _______, _______, KC_CTIL, _______, _______, _______, _______, _______, _______, _______, _______, _______,
172	_______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______),	244	_______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______, _______),
173	};	245	};
174		246
		247	////////////////////////////////////////////////////////////////////////////////////////////////////
		248	// Accent implementation
		249	//
		250	// In the body of process_record_user, we store an "accent_request", which is the accent keycode if
		251	// one was just selected, or zero otherwise. When the next key is hit, we look up whether the
		252	// accent request plus that next keycode (plus the state of the shift key) together amount to an
		253	// interesting combined (NFKC) character, and if so, emit it; otherwise, we emit the accent as a
		254	// separate character and then process the next key normally. The resulting UI behavior is similar
		255	// to that of the combining accent keys in MacOS.
		256	//
		257	// We store two arrays, depending on whether shift is or isn't held. Each is two-dimensional, with
		258	// its outer key by the next keycode struck, and the inner key by the accent requested. The outer
		259	// array has KC_Z + 1 as its upper bound, so that we can save memory by only coding alphabetic keys.
		260	// The contents are either Unicode code points, or zero to indicate that we don't have a point for
		261	// this combination.
		262
		263	#define KC_NUM_ACCENTS (KC_ACCENT_END - KC_ACCENT_START)
		264	#define KC_NUM_SLOTS (KC_Z + 1)
		265
		266	const uint16_t PROGMEM unshifted_accents[KC_NUM_SLOTS][KC_NUM_ACCENTS] = {
		267	// KC_CGRV, KC_CAGU, KC_CDIA, KC_CCIR, KC_CCED, KC_CTIL
		268	[KC_A] = { 0x00e0, 0x00e1, 0x00e4, 0x00e2, 0, 0x00e3 },
		269	[KC_E] = { 0x00e8, 0x00e9, 0x00eb, 0x00ea, 0, 0 },
		270	[KC_I] = { 0x00ec, 0x00ed, 0x00ef, 0x00ee, 0, 0 },
		271	[KC_O] = { 0x00f2, 0x00f3, 0x00f6, 0x00f4, 0, 0x00f5 },
		272	[KC_U] = { 0x00f9, 0x00fa, 0x00fc, 0x00fb, 0, 0 },
		273	[KC_Y] = { 0, 0, 0x00ff, 0, 0, 0 },
		274	[KC_N] = { 0, 0, 0, 0, 0, 0x00f1 },
		275	[KC_C] = { 0, 0, 0, 0, 0x00e7, 0 },
		276	};
		277
		278	const uint16_t PROGMEM shifted_accents[KC_NUM_SLOTS][KC_NUM_ACCENTS] = {
		279	// KC_CGRV, KC_CAGU, KC_CDIA, KC_CCIR, KC_CCED, KC_CTIL
		280	[KC_A] = { 0x00c0, 0x00c1, 0x00c4, 0x00c2, 0, 0x00c3 },
		281	[KC_E] = { 0x00c8, 0x00c9, 0x00cb, 0x00ca, 0, 0 },
		282	[KC_I] = { 0x00cc, 0x00cd, 0x00cf, 0x00ce, 0, 0 },
		283	[KC_O] = { 0x00d2, 0x00d3, 0x00d6, 0x00d4, 0, 0x00d5 },
		284	[KC_U] = { 0x00d9, 0x00da, 0x00dc, 0x00db, 0, 0 },
		285	[KC_Y] = { 0, 0, 0x0178, 0, 0, 0 },
		286	[KC_N] = { 0, 0, 0, 0, 0, 0x00d1 },
		287	[KC_C] = { 0, 0, 0, 0, 0x00c7, 0 },
		288	};
		289
		290	// The spacing (standalone) and combining forms of the accents, for when we want to emit them as
		291	// single characters.
		292	const uint16_t PROGMEM uncombined_accents[KC_NUM_ACCENTS] = {
		293	[KC_CGRV - KC_ACCENT_START] = 0x0060,
		294	[KC_CAGU - KC_ACCENT_START] = 0x00b4,
		295	[KC_CDIA - KC_ACCENT_START] = 0x00a8,
		296	[KC_CCIR - KC_ACCENT_START] = 0x005e,
		297	[KC_CCED - KC_ACCENT_START] = 0x00b8,
		298	[KC_CTIL - KC_ACCENT_START] = 0x02dc,
		299	};
		300
		301	const uint16_t PROGMEM combined_accents[KC_NUM_ACCENTS] = {
		302	[KC_CGRV - KC_ACCENT_START] = 0x0300,
		303	[KC_CAGU - KC_ACCENT_START] = 0x0301,
		304	[KC_CDIA - KC_ACCENT_START] = 0x0308,
		305	[KC_CCIR - KC_ACCENT_START] = 0x0302,
		306	[KC_CCED - KC_ACCENT_START] = 0x0327,
		307	[KC_CTIL - KC_ACCENT_START] = 0x0303,
		308	};
		309
		310	// This function manages keypresses that happen after an accent has been selected by an earlier
		311	// keypress.
		312	// Args:
		313	// accent_key: The accent key which was earlier selected. This must be in the range
		314	// [KC_ACCENT_START, KC_ACCENT_END).
		315	// keycode: The keycode which was just pressed.
		316	// is_shifted: The current shift state (as set by a combination of shift and caps lock)
		317	// force_no_accent: If true, we're in a situation where we want to force there to be no
		318	// accent combination -- if e.g. we're in a non-QWERTY layer, or if other modifier keys
		319	// are held.
		320	//
		321	// Returns true if the keycode has been completely handled by this function (and so should not be
		322	// processed further by process_record_user) or false otherwise.
		323	bool process_key_after_accent(
		324	uint16_t accent_key,
		325	uint16_t keycode,
		326	bool is_shifted,
		327	bool force_no_accent
		328	) {
		329	assert(accent_key >= KC_ACCENT_START);
		330	assert(accent_key < KC_ACCENT_END);
		331	const int accent_index = accent_key - KC_ACCENT_START;
		332
		333	// If the keycode is outside A..Z, or force_no_accent is set, we know we shouldn't even bother
		334	// with a table lookup.
		335	if (keycode <= KC_Z && !force_no_accent) {
		336	// Pick the correct array. Because this is progmem, we're going to need to do the
		337	// two-dimensional array indexing by hand, and so we just cast it to a single-dimensional array.
		338	const uint16_t *points = (const uint16_t *)(is_shifted ? shifted_accents : unshifted_accents);
		339	const uint16_t code_point = pgm_read_word_near(points + KC_NUM_ACCENTS * keycode + accent_index);
		340	if (code_point) {
		341	register_unicode(code_point);
		342	return true;
		343	}
		344	}
		345
		346	// If we get here, there was no accent match. Emit the accent as its own character, and then let
		347	// the caller figure out what to do next.
		348	register_unicode(pgm_read_word_near(uncombined_accents + accent_index));
		349	return false;
		350	}
		351
175	// Layer bitfields.	352	// Layer bitfields.
176	#define GREEK_LAYER (1UL << _GREEK)	353	#define GREEK_LAYER (1UL << _GREEK)
177	#define SHIFTGREEK_LAYER (1UL << _SHIFTGREEK)	354	#define SHIFTGREEK_LAYER (1UL << _SHIFTGREEK)
@@ -185,6 +362,8 @@ bool process_record_user(uint16_t keycode, keyrecord_t *record) {
185	// get_mods or the like, because this function is called before that's updated!	362	// get_mods or the like, because this function is called before that's updated!
186	static bool shift_held = false;	363	static bool shift_held = false;
187	static bool alt_held = false;	364	static bool alt_held = false;
		365	static bool ctrl_held = false;
		366	static bool super_held = false;
188	static bool greek_held = false;	367	static bool greek_held = false;
189	static bool cadet_held = false;	368	static bool cadet_held = false;
190		369
@@ -192,18 +371,36 @@ bool process_record_user(uint16_t keycode, keyrecord_t *record) {
192	static bool shift_lock = false;	371	static bool shift_lock = false;
193	static int layer_lock = _QWERTY;	372	static int layer_lock = _QWERTY;
194		373
195	// Process any modifier key presses.	374	// The accent request, or zero if there isn't one.
		375	static uint16_t accent_request = 0;
		376
		377	// If this is set to true, don't trigger any handling of pending accent requests. That's what we
		378	// want to do if e.g. the user just hit the shift key or something.
		379	bool ignore_accent_change = !record->event.pressed;
		380
		381	// Step 1: Process any modifier key state changes, so we can maintain that state.
196	if (keycode == KC_LSHIFT || keycode == KC_RSHIFT) {	382	if (keycode == KC_LSHIFT || keycode == KC_RSHIFT) {
197	shift_held = record->event.pressed;	383	shift_held = record->event.pressed;
		384	ignore_accent_change = true;
198	} else if (keycode == KC_LALT || keycode == KC_RALT) {	385	} else if (keycode == KC_LALT || keycode == KC_RALT) {
199	alt_held = record->event.pressed;	386	alt_held = record->event.pressed;
		387	ignore_accent_change = true;
		388	} else if (keycode == KC_LCTRL || keycode == KC_RCTRL) {
		389	ctrl_held = record->event.pressed;
		390	ignore_accent_change = true;
		391	} else if (keycode == KC_LGUI || keycode == KC_RGUI) {
		392	super_held = record->event.pressed;
		393	ignore_accent_change = true;
200	} else if (keycode == KC_GREEK) {	394	} else if (keycode == KC_GREEK) {
201	greek_held = record->event.pressed;	395	greek_held = record->event.pressed;
		396	ignore_accent_change = true;
202	} else if (keycode == KC_CADET) {	397	} else if (keycode == KC_CADET) {
203	cadet_held = record->event.pressed;	398	cadet_held = record->event.pressed;
		399	ignore_accent_change = true;
204	}	400	}
205		401
206	// Now let's transform these into the "cadet request" and "greek request."	402	// Step 2: Figure out which layer we're supposed to be in, by transforming all the prior stuff
		403	// into layer requests.
207	const bool greek_request = (greek_held && !alt_held);	404	const bool greek_request = (greek_held && !alt_held);
208	const bool cadet_request = (cadet_held || (greek_held && alt_held));	405	const bool cadet_request = (cadet_held || (greek_held && alt_held));
209		406
@@ -260,8 +457,33 @@ bool process_record_user(uint16_t keycode, keyrecord_t *record) {
260	layer_state_set(new_layer_state);	457	layer_state_set(new_layer_state);
261	}	458	}
262		459
263	// TODO: We can update LED states based on shift_lock (caps), layer_lock (layer lock), and	460	// Step 3: Handle accents. If there's a pending accent request, process it. If what the user just
264	// base_layer (base layer).	461	// hit creates a new accent request, update the pending state for the next keypress.
		462	if (!ignore_accent_change && accent_request && record->event.pressed) {
		463	// Only do the accent stuff if we're in the QWERTY layer and we aren't modifying something.
		464	const bool force_no_accent = (
		465	actual_layer != _QWERTY ||
		466	ctrl_held ||
		467	super_held ||
		468	alt_held
		469	);
		470	const uint16_t old_accent = accent_request;
		471	accent_request = 0;
		472	if (process_key_after_accent(old_accent, keycode, shifted, force_no_accent)) {
		473	return false;
		474	}
		475	}
		476
		477	// And if a new accent request just arrived, update accent_request.
		478	if (keycode >= KC_ACCENT_START && keycode < KC_ACCENT_END && record->event.pressed) {
		479	if (shifted) {
		480	// Shift + accent request emits the combining accent character, and leaves accent_request alone.
		481	register_unicode(pgm_read_word_near(combined_accents + keycode - KC_ACCENT_START));
		482	return false;
		483	} else {
		484	accent_request = keycode;
		485	}
		486	}
265		487
266	return true;	488	return true;
267	}	489	}
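
The lookup performed by process_key_after_accent can be sketched as a host-side C fragment that needs neither PROGMEM nor QMK. This is a minimal illustration, not the keymap's code: the SK_* names, their numeric values, the trimmed tables, and the resolve_accent helper are all assumptions made for the example. It uses ordinary two-dimensional indexing, whereas the firmware computes a flat offset for pgm_read_word_near.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical stand-ins for the QMK keycodes used by the keymap. The numeric
// values are assumptions chosen for this sketch only.
enum { SK_C = 6, SK_E = 8, SK_N = 17, SK_Z = 29 };
enum {
    SK_ACCENT_START = 0x100,
    SK_GRAVE = SK_ACCENT_START,
    SK_ACUTE,
    SK_DIAERESIS,
    SK_CIRCUMFLEX,
    SK_CEDILLA,
    SK_TILDE,
    SK_ACCENT_END
};

#define NUM_ACCENTS (SK_ACCENT_END - SK_ACCENT_START)
#define NUM_SLOTS (SK_Z + 1)

// Trimmed tables: rows are the letter struck, columns are the pending accent.
// Zero means "no precomposed character for this combination".
static const uint16_t unshifted[NUM_SLOTS][NUM_ACCENTS] = {
    [SK_E] = {0x00e8, 0x00e9, 0x00eb, 0x00ea, 0, 0},
    [SK_N] = {0, 0, 0, 0, 0, 0x00f1},
    [SK_C] = {0, 0, 0, 0, 0x00e7, 0},
};

static const uint16_t shifted[NUM_SLOTS][NUM_ACCENTS] = {
    [SK_E] = {0x00c8, 0x00c9, 0x00cb, 0x00ca, 0, 0},
    [SK_N] = {0, 0, 0, 0, 0, 0x00d1},
    [SK_C] = {0, 0, 0, 0, 0x00c7, 0},
};

// Standalone (spacing) form of each accent, in column order.
static const uint16_t standalone[NUM_ACCENTS] = {
    0x0060, 0x00b4, 0x00a8, 0x005e, 0x00b8, 0x02dc,
};

// Returns the code point to emit for a pending accent followed by `keycode`.
// *consumed is set to true when the key was absorbed into a precomposed
// character, and to false when only the standalone accent is emitted and the
// key still needs normal processing.
static uint16_t resolve_accent(uint16_t accent_key, uint16_t keycode,
                               bool is_shifted, bool force_no_accent,
                               bool *consumed) {
    assert(accent_key >= SK_ACCENT_START && accent_key < SK_ACCENT_END);
    const int accent_index = accent_key - SK_ACCENT_START;

    if (keycode <= SK_Z && !force_no_accent) {
        const uint16_t (*table)[NUM_ACCENTS] = is_shifted ? shifted : unshifted;
        const uint16_t code_point = table[keycode][accent_index];
        if (code_point) {
            *consumed = true;
            return code_point;
        }
    }

    *consumed = false;
    return standalone[accent_index];
}
```

For example, resolve_accent(SK_ACUTE, SK_E, false, false, &consumed) yields 0x00e9 (é) with consumed set. An acute followed by n has no table entry, so the call yields the standalone 0x00b4 and leaves the n for normal processing.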