Windows is also a rare bird in UTF-16. "UTF-16 is used by the Windows API, and b...

0x1d7 · 2026-06-02T18:56:44 1780426604

NT shipped with USC-2 as UTF-8 (and -16) did not yet exist. USC-2 naturally translated to UTF-16, hence the choice. NT/Win32 is also designed for fixed-with code units, something UTF-8 doesn't support.

You can use UTF-8 on a per-application basis, within limits.

https://learn.microsoft.com/en-us/windows/apps/design/global...

Conversely, UEFI is UTF-16 only, thanks to Windows.

UTF-8 only would be an ABI breaking change, so that's not going to happen. We don't want the NT kernel to end up like Linux, after all :-)

yjftsjthsd-h · 2026-06-02T19:25:35 1780428335

> UTF-8 only would be an ABI breaking change, so that's not going to happen. We don't want the NT kernel to end up like Linux, after all :-)

I think you're making a joke, but it still doesn't make sense. Linux does avoid breaking changes to its userspace ABI

okanat · 2026-06-02T21:07:33 1780434453

Linux kernel's ABI/API surface is completely byte-based and is tiny compared to Win32 API. The stable ABI of Windows is strictly in the user space and it covers the entire useful operating system. Don't forget that glibc isn't that ABI stable nor anything that goes all the way up to systemd/X11/gettext/Wayland/cairo/GTK/Qt/glade/Pipewire/xdg. Nothing equivalent in Linux environment is stable, especially compared against Win32 system libraries.

You encounter encoding requirements not in kernel system calls but in more user-facing sides of the OS. So your comparison should include all the userspace components of a Linux system (e.g. gettext), if you're considering the places where you encounter actual Unicode string-based operations where UTF-16 comes into play in Windows.

ALL OF THE equivalent Windows libraries (user32.dll, kernel32.dll, ws2_32.dll, shellex.dll ...) to the Linux ones I counted above and more are ABI and API stable.

yjftsjthsd-h · 2026-06-02T21:26:46 1780435606

See, I was sympathetic to that view, except that they specifically posted

> We don't want the NT kernel to end up like Linux, after all

(Emphasis mine)

And that's... The one part of the OS where Linux is ABI stable.

okanat · 2026-06-02T23:01:12 1780441272

I think they didn't mean Linux as in the kernel sense but in the overall "Family of OSes and ecosystem with very unstable APIs" sense.

zahllos · 2026-06-02T18:20:09 1780424409

Additional Detail: it is specifically utf-16 little endian when a byte order mark is not used, which is the opposite of the recommended choice of big endian in the RFC.

Worse are the byte order marks required to support both endians that end up in files.

layer8 · 2026-06-02T20:28:36 1780432116

The development of Windows NT based on UCS-2 precedes the RFC by roughly a decade, and little-endian was the natural choice for the Intel PC platform. Obviously the endianness had to remain the same when UCS-2 was extended to support UTF-16.

zahllos · 2026-06-03T10:56:37 1780484197

Also true. The BOMs though are annoying.

Dwedit · 2026-06-02T18:16:23 1780424183

UTF-16 is also used by C#, Java, and JavaScript. Since JavaScript is so widely adopted, I wouldn't call it a rare bird. Not necessarily used when reading or writing files, but it's what's used internally for the strings. As a result, your strings use UTF-16 surrogate pairs to represent characters outside of the basic multilingual plane (such as Emoji).

electroly · 2026-06-03T00:41:43 1780447303

UTF-16 is the internal format of the ICU library (International Components for Unicode, the support library from the Unicode standards people) which is a common way to add "full fat" Unicode support to a programming language. This has knock-on effects everywhere. If you're using ICU, you either use UTF-16, too, or you constantly convert back and forth every time you interact with ICU. You're often best off using UTF-16 in memory and only converting to UTF-8 when you write files or transmit over the network.

wvenable · 2026-06-02T18:55:49 1780426549

> Windows is also a rare bird in UTF-16.

Web browsers use UTF-16 internally. So Windows isn't even largest "platform" that uses UTF-16.

bvanheu · 2026-06-02T20:45:03 1780433103

> Windows is also a rare bird in UTF-16.

an interesting tidbit, some Windows kernel developer realized that most registry keys are ascii anyways so they could save up to 50% space simply by storing the name as ascii. The flag is called "compressed name" and they will pad with 0x00 when reading the name to make a proper utf-16 string.