Hacker News new | past | comments | ask | show | jobs | submit login

You can verify on the documentation by switching the version.

So ... since 3.2: https://docs.python.org/3.2/library/stdtypes.html#bytes.deco... In 3.1 it was the default encoding of string (the type str I guess). https://docs.python.org/3.1/library/stdtypes.html#bytes.deco...




> In 3.1 it was the default encoding of string (the type str I guess).

No, what was used was what sys.getdefaultencoding(), which was already UTF-8 in 3.1 (I checked the source code).

At that time, the format used for representing `str` objects in memory depended on if you used a "narrow" (UTF-16) or "wide" (UTF-32) build of Python.

Fortunately, wide and narrow builds were abandonned in Python 3.2, with a new way of representing strings : current Python will use ASCII if there's no non-ASCII char, UCS-2 –UTF-16 without surrogate pairs — if there is no codepoint higher than U+FFFF, and UTF-32 else. But that did not exist in 3.1, where you could either use the "narrow" build of python (that used UTF-16) or the "wide" build (that used UTF-32).

See this article for a good overview of the history of strings in Python : https://tenthousandmeters.com/blog/python-behind-the-scenes-...


Thank you ! The documentation was misleading about "default encoding of string".


The simple thing to remember is that for all versions of Python going back 12 years, there's no such thing as "default encoding of string". A Python string is defined as a sequence of 32-bit Unicode codepoints, and that is how Python code perceives it in all respects. How it is stored internally is an implementation detail that does not affect you.


32 bit specifically?

The most expansive Unicode has ever been was 31 bits, and UTF-8 is also capable of at most 31 bits.


You're right, the docs just say "Unicode codepoints", and standard facilities like "\U..." or chr() will refuse anything above U+10FFFF. However I'm not sure that still holds true when third-party native modules are in the picture.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: