
Python has two types of strings: byte strings, where every element is a byte in the range 0-255, and Unicode strings, where every character is a Unicode code point. In Python 2.x, a plain "" literal creates a byte string and u"" creates a Unicode string. In Python 3.x, "" creates a Unicode string and b"" creates a byte string.
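A small sketch of the two types, assuming Python 3 rather than 2.7 (so "" is Unicode and b"" is bytes). The characters are written as \u escapes so the example doesn't depend on how the source file is encoded; the variable names are just for illustration.

```python
# Python 3 sketch: a plain "" literal is str (Unicode), a b"" literal is bytes.
text = "\u00e9\u0137\u016f"   # "éķů" written with escapes: 3 code points
data = text.encode("utf-8")   # the same text encoded as UTF-8 bytes
raw = b"\xc3\xa9"             # a b"" literal is bytes from the start

print(type(text).__name__, len(text))  # str 3
print(type(data).__name__, len(data))  # bytes 6
print(type(raw).__name__, len(raw))    # bytes 2
```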

If you typed "éķů" in Python 2.7 (assuming a UTF-8 source file or terminal), you would get a byte string of six bytes, 0xC3 0xA9 0xC4 0xB7 0xC5 0xAF, which is the UTF-8 encoding of those three characters. If you printed it to a terminal that decodes output as UTF-8, as most terminals do by default, it would appear as éķů. But "éķů"[1] would return the one-byte string "\xa9", which on its own isn't valid UTF-8 and would likely display as garbage.
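The same byte-level behaviour can be reproduced in Python 3 by working with a bytes object explicitly. This is a sketch under that assumption, not Python 2.7 itself. One difference to keep in mind is that indexing bytes in Python 3 returns an int, so a one-byte slice stands in for Python 2's "éķů"[1].

```python
# Python 3 sketch of the Python 2.7 byte-string behaviour.
data = "\u00e9\u0137\u016f".encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in data))  # 0xC3 0xA9 0xC4 0xB7 0xC5 0xAF

piece = data[1:2]  # a slice; data[1] would give the int 169 in Python 3
print(piece)       # b'\xa9'

# 0xA9 is a UTF-8 continuation byte, so it cannot be decoded by itself.
try:
    piece.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

# With errors="replace" the bad byte becomes U+FFFD, the usual "garbage" glyph.
print(repr(piece.decode("utf-8", errors="replace")))  # '\ufffd'
```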

If you had used u"éķů" instead, you would get a string of three Unicode code points, U+00E9 U+0137 U+016F, and u"éķů"[1] would return u"ķ", which is a valid Unicode character.
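For comparison, here is a sketch of the Unicode-string case, again assuming Python 3, where "" already behaves like Python 2's u"". Indexing works per code point, so the second element is the whole character rather than a fragment of its encoding.

```python
# Python 3 sketch: str indexing is per code point, not per byte.
text = "\u00e9\u0137\u016f"  # "éķů"
codepoints = [f"U+{ord(c):04X}" for c in text]
print(codepoints)  # ['U+00E9', 'U+0137', 'U+016F']

second = text[1]
print(second == "\u0137")           # True: the complete character "ķ"
print(len(second.encode("utf-8")))  # 2: it still takes two bytes in UTF-8
```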




