1,509 votes · on the backlog · 63 comments · Command Prompt / Console / Windows Subsystem for Linux » Console/Terminal
710 votes · 19 comments · Command Prompt / Console / Windows Subsystem for Linux » Console/Terminal
As Chip said, UTF-16 is rather baked into the stuff we do. The console host does use UTF-16 somewhere in there. :P We just have the matter of dealing with code pages throughout the history of computing existence that causes us heartburn every time we think about how to fix this. :( It’s definitely something that we would like to look into on our backlog.
—Michael
eryk sun commented
Codepage 65001 (UTF-8) is still broken in the console. On the plus side, as of Windows 10, WriteFile to a console handle finally works correctly. It no longer confuses buffered writers by returning the number of UTF-16 codes written instead of the number of bytes written. But there's still no support for writing a UTF-8 encoded character split across 2 writes, which can happen when using a buffered writer such as a C FILE stream.
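To illustrate the split-write case, here is a minimal Python sketch of how a consumer can tolerate a UTF-8 character whose bytes arrive in two separate writes. It uses the standard library's incremental decoder; the helper name `decode_chunks` and the sample data are hypothetical, and this is not a description of how conhost.exe works.

```python
import codecs

def decode_chunks(chunks):
    """Decode UTF-8 byte chunks whose boundaries may fall inside a character."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    for chunk in chunks:
        # The decoder holds back an incomplete trailing sequence until
        # the next chunk supplies the remaining bytes.
        pieces.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if a sequence is still incomplete.
    pieces.append(decoder.decode(b"", final=True))
    return pieces

# U+00E9 encodes as two bytes (0xC3 0xA9); split it across two "writes",
# as a buffered writer might when its buffer fills mid-character.
encoded = "\u00e9".encode("utf-8")
split_writes = [b"caf" + encoded[:1], encoded[1:] + b"!"]
decoded = decode_chunks(split_writes)
```

Decoding each chunk independently with `bytes.decode("utf-8")` would raise on the first chunk, which is roughly the situation a buffered writer runs into when the receiving side keeps no state between writes.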
What's worse, the ReadConsoleA implementation in conhost.exe makes an incorrect assumption when calling WideCharToMultiByte in a Western locale. It assumes the current codepage is ANSI, in which a UTF-16 code maps to a single byte. So it tries to encode N UTF-16 codes as N bytes. This fails if even one non-ASCII character is entered (since that's at least 2 bytes when encoded as UTF-8), and it reports to the client that it successfully read 0 bytes. With CPython, for example, this is interpreted as EOF, so the REPL quietly quits and input() raises EOFError. Maybe the console could instead assume the worst case that each UTF-16 code maps to 4 UTF-8 bytes.
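The size mismatch is easy to check from Python. The sketch below compares the number of UTF-16 codes with the number of UTF-8 bytes for a few sample characters. The helper names are hypothetical and the code only does the arithmetic; it does not call the console API.

```python
def utf16_units(text):
    """Number of 16-bit UTF-16 codes needed for text."""
    return len(text.encode("utf-16-le")) // 2

def utf8_bytes(text):
    """Number of bytes needed for text in UTF-8."""
    return len(text.encode("utf-8"))

def fits_one_byte_per_unit(text):
    """True if a buffer of N bytes is enough for N UTF-16 codes."""
    return utf8_bytes(text) <= utf16_units(text)

samples = {
    "a": "a",               # ASCII
    "e-acute": "\u00e9",    # Latin-1 range
    "euro": "\u20ac",       # elsewhere in the BMP
    "emoji": "\U0001f600",  # astral, needs a surrogate pair
}
sizes = {name: (utf16_units(s), utf8_bytes(s)) for name, s in samples.items()}
```

For these samples the pairs are (1, 1), (1, 2), (1, 3) and (2, 4), so only pure ASCII fits the one-byte-per-code assumption. The 4 bytes per UTF-16 code suggested above is a safe upper bound for sizing the buffer.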
On the subject of internationalization, the console shouldn't require a DBCS codepage to mix fullwidth (2 cells) and halfwidth (1 cell) glyphs. There should be a locale-neutral implementation based on Unicode character properties. It also should be able to render characters that require multiple UTF-16 codes, such as decomposed characters and surrogate pairs for astral characters such as emojis.
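As a rough idea of what a locale-neutral width calculation could look like, here is a Python sketch that uses the Unicode East Asian Width property and combining class from the standard `unicodedata` module. The function names are hypothetical, and the rules are deliberately simplified: ambiguous-width characters are treated as one cell, and control characters and emoji ZWJ sequences are not handled.

```python
import unicodedata

def cell_width(ch):
    """Estimate how many console cells a single code point occupies."""
    # Combining marks are drawn over the preceding character.
    if unicodedata.combining(ch):
        return 0
    # "W" (wide) and "F" (fullwidth) code points take two cells.
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1

def display_width(text):
    """Sum the estimated cell widths of every code point in text."""
    return sum(cell_width(ch) for ch in text)
```

With this, `display_width("abc")` is 3, two CJK ideographs such as `"\u65e5\u672c"` come out as 4 cells, and a decomposed `"e\u0301"` counts as a single cell, all without consulting the active codepage.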
1,310 votes · 54 comments · Command Prompt / Console / Windows Subsystem for Linux » WSL/Bash
I’m excited to share that background task support is available in Windows Insider builds 17046 and later.