
> Additionally, many Python developers using Unix forget that the default encoding is platform dependent. They omit to specify encoding="utf-8" when they read text files encoded in UTF-8

"Forget", or possibly were never made well enough aware? I genuinely thought that Python used UTF-8 for everything unless you explicitly asked it to do otherwise.




It actually depends!

`bytes.decode` (and `str.encode`) have used UTF-8 as a default since at least Python 3.

However, file names are decoded using `sys.getfilesystemencoding()`, which is also UTF-8 on Windows and macOS, but varies with the locale on Linux (specifically with CODESET).

Finally, `open` in text mode uses `locale.getencoding()` directly when no `encoding` argument is given.
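If you want to see what your own machine does, here is a rough sketch that probes the three defaults mentioned above. The helper names `show_defaults` and `roundtrip_utf8` are made up for illustration. Note that `locale.getencoding()` only exists on Python 3.11+, so the sketch inspects the `encoding` attribute of the file object returned by `open` instead, which works on older versions too.

```python
import os
import sys
import tempfile


def show_defaults():
    """Return the encodings Python picks when none is specified (hypothetical helper)."""
    # str.encode / bytes.decode default to UTF-8 regardless of platform or locale.
    encoded = "\u00e9".encode()

    # File names are encoded and decoded with the filesystem encoding.
    fs_encoding = sys.getfilesystemencoding()

    # open() in text mode without encoding=: ask the file object what it chose.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "probe.txt")
        with open(path, "w") as f:  # deliberately no encoding= here
            open_encoding = f.encoding

    return {"str.encode": encoded, "filesystem": fs_encoding, "open": open_encoding}


def roundtrip_utf8(text):
    """Write and read back text with an explicit encoding, independent of the locale."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, encoding="utf-8") as f:
            return f.read()


if __name__ == "__main__":
    for name, value in show_defaults().items():
        print(name, "->", value)
    print(roundtrip_utf8("na\u00efve caf\u00e9"))
```

The `"open"` entry is the one that can differ between machines; passing `encoding="utf-8"` explicitly, as in `roundtrip_utf8`, sidesteps that.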



