Filenames and Unicode Normalization Forms

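Mac OS X stores filenames in NFD (Normalization Form D, in which accented characters are decomposed into a base letter plus combining marks). Most other platforms, such as Windows and GNU/Linux, do not normalize filenames at all and in practice end up with NFC (Normalization Form C, precomposed characters), because that is what their keyboards and input methods usually produce. This can lead to unpleasant surprises when moving files across platforms.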
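
To make the difference concrete, here is a small Python sketch using the standard unicodedata module. The filename "café.txt" is only an illustrative assumption; any name with an accented character behaves the same way.

```python
import unicodedata

# Hypothetical filename containing U+00E9 (precomposed "e with acute").
name_nfc = unicodedata.normalize("NFC", "caf\u00e9.txt")
name_nfd = unicodedata.normalize("NFD", "caf\u00e9.txt")

# The two names look identical on screen but are different strings.
print(name_nfc == name_nfd)          # False
print(len(name_nfc), len(name_nfd))  # 8 9

# A filesystem that treats names as bytes sees two unrelated names.
print(name_nfc.encode("utf-8"))      # b'caf\xc3\xa9.txt'
print(name_nfd.encode("utf-8"))      # b'cafe\xcc\x81.txt'
```

In NFD the "é" becomes a plain "e" followed by the combining acute accent U+0301, which is why the name is one code point longer.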

Example

You rsync over a file archive from your Mac to your GNU/Linux server. The filenames are not changed from NFD to NFC since rsync doesn’t care and Linux just treats filenames as byte sequences. Now you try to access the same files over Samba from your Mac. You discover that files with international characters aren’t accessible and curse Apple for Thinking Differently when it comes to filenames. Then you follow the steps below and everything works fine.
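
Before converting anything, it can help to see which names are affected. The following is a minimal sketch, not a polished tool: find_non_nfc is a hypothetical helper name, and it assumes the filenames on the server decode as UTF-8.

```python
import os
import unicodedata

def find_non_nfc(root):
    """Return paths under root whose last component is not in NFC."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # A name is in NFC exactly when normalizing leaves it unchanged.
            if unicodedata.normalize("NFC", name) != name:
                hits.append(os.path.join(dirpath, name))
    return hits
```

Calling find_non_nfc("directory") on the GNU/Linux side should list the files that were copied over with their NFD names intact.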

NFD to NFC Conversion

Here’s how you can convert NFD filenames to NFC:

  1. Install convmv. Read the documentation.
  2. Think of a character set and encoding other than Unicode/UTF-8 that includes all the characters you need. latin1 will probably be just fine unless your alphabet includes some weird symbols.
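  3. Convert the filenames from UTF-8 to latin1: convmv -t latin1 -f utf-8 -r --notest directory
  4. Convert them back to UTF-8, this time asking for NFC: convmv -f latin1 -t utf-8 -r --notest --nfc directory

Note that --notest makes convmv actually rename the files. Leave it out first to get a preview of what would be changed.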
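
If convmv is not available, the same idea can be sketched in Python. This is not what convmv does internally, just a rough equivalent under stated assumptions: rename_to_nfc is a hypothetical helper, it assumes UTF-8 filenames on a filesystem that treats names as byte sequences (as Linux does), and like convmv it only previews the changes unless told otherwise.

```python
import os
import unicodedata

def rename_to_nfc(root, dry_run=True):
    """Rename entries under root to NFC; return (old, new) path pairs."""
    renames = []
    # Walk bottom-up so a directory's contents are handled before
    # the directory itself is renamed.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            target = unicodedata.normalize("NFC", name)
            if target == name:
                continue  # already NFC
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, target)
            if os.path.lexists(new):
                continue  # never overwrite an existing entry
            renames.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return renames
```

Run rename_to_nfc("directory") first and inspect the returned pairs, then call it again with dry_run=False to perform the renames. Entries whose NFC name already exists are skipped and have to be resolved by hand.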

Resources