Handling unicode characters

Posted: Sat May 03, 2014 2:45 pm
by arawra
I've run into a bit of a pain with pickling and loading pickles. When I store information that includes a player's name, I believe I need to make sure the strings are properly formatted so I don't get errors when the pickle is loaded.

This happened when I was trying to load a pickle from ES:P that had names with unicode characters in them. I was able to go through and reformat the names so that the pickle loads in Python 3, but it seems strange that the original pickle would load in Python 2 and not in Python 3, where all strings are unicode by default.

I'm wondering what causes this behavior; I'm assuming it has to do with how open() processes the file.

I'd also like to know how to handle names with unicode in SP, as I believe it was updated to Python 3.

Posted: Sat May 03, 2014 5:53 pm
by L'In20Cible
Well, it is hard to say what the main problem is without any code to reproduce it. I did some testing on my side and it works just fine...

Posted: Sat May 03, 2014 7:42 pm
by arawra

Posted: Sat May 03, 2014 8:23 pm
by L'In20Cible

Syntax: Select all

>>> import pickle
>>> data = pickle.load(open('playerdict.txt','rb'), encoding='utf-8')
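
For context, the encoding argument tells Python 3's unpickler how to decode the 8-bit str objects that Python 2 stored; its default is 'ASCII', which would explain a load failure on non-ASCII names. Here is a minimal sketch, assuming the names were stored as UTF-8 byte strings. The hand-written pickle bytes and the name PY2_PICKLE are hypothetical stand-ins, not the real playerdict.txt.

```python
import pickle

# Hypothetical stand-in for a Python 2 pickle (protocol 2) of the byte string
# 'caf\xc3\xa9', i.e. "café" encoded as UTF-8. Not the real playerdict.txt.
PY2_PICKLE = b'\x80\x02U\x05caf\xc3\xa9q\x00.'

# With the default encoding ('ASCII') the non-ASCII bytes cannot be decoded.
try:
    pickle.loads(PY2_PICKLE)
    default_load_failed = False
except UnicodeDecodeError:
    default_load_failed = True

# Naming the encoding the strings were written in yields a normal Python 3 str.
name = pickle.loads(PY2_PICKLE, encoding='utf-8')
```

If the names were written in some other encoding (for example latin-1), that encoding would have to be passed instead; this sketch cannot tell which one applies to the real file.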

Posted: Sat May 03, 2014 9:03 pm
by arawra
Still not a good solution :\

For now, I just pickled a second dictionary in which the unicode characters are converted or replaced.

Syntax: Select all

>>> for x,y in data.items():
...     if 'name' in data[x]: print(data[x]['name'])
...
name1
name2
name3
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "D:\Python33\lib\encodings\cp437.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 3-7: character maps to <undefined>
>>>

Posted: Sat May 03, 2014 9:21 pm
by L'In20Cible
This will open your file in UTF-8 but it won't change the encoding of the strings it contains. Since print encodes its output with sys.stdout.encoding (cp437 according to your traceback), which cannot represent those characters, you will instead have to encode the string yourself and then decode it back in order to print it...
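
One possible reading of the encode-then-decode suggestion above, as a rough sketch: encode the name with the console's own encoding while replacing anything it cannot represent, then decode it back so print() receives a str it can always output. The helper name console_safe is hypothetical, and the sample name is made up.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Assumption: fall back to ASCII when stdout reports no encoding.
    encoding = encoding or getattr(sys.stdout, 'encoding', None) or 'ascii'
    # Round trip: encode with replacement, then decode back to str for print().
    return text.encode(encoding, errors='replace').decode(encoding)

# A made-up player name containing a character that cp437 does not have.
print(console_safe('name4 \u2605'))
```

This only changes what is displayed; the string stored in the dictionary keeps its original characters.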

Posted: Sun May 04, 2014 7:55 pm
by Doldol
If the goal is to print, wouldn't this be what he's after then?

Syntax: Select all

data = pickle.load(open('playerdict.txt','rb'), encoding="bytes")


Preserve data as bytes instead of encoding to UTF-8 first? But I'm not an expert at this.
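
To illustrate what encoding="bytes" gives back, here is a small sketch using a hand-written stand-in for a Python 2 pickle (the bytes, PY2_DICT_PICKLE and decode_dict are hypothetical, not the real playerdict.txt). Note that dictionary keys come back as bytes as well, so a check like 'name' in data[x] would no longer match until the keys are decoded.

```python
import pickle

# Hypothetical stand-in for a Python 2 pickle (protocol 2) of
# {'name': 'caf\xc3\xa9'}, with the name stored as UTF-8 bytes.
PY2_DICT_PICKLE = b'\x80\x02}q\x00U\x04nameq\x01U\x05caf\xc3\xa9q\x02s.'

# Python 2 str objects are left untouched as bytes, keys included.
raw = pickle.loads(PY2_DICT_PICKLE, encoding='bytes')

def decode_dict(d, encoding='utf-8'):
    """Decode bytes keys and values of a flat dict, replacing undecodable bytes."""
    def conv(obj):
        if isinstance(obj, bytes):
            return obj.decode(encoding, errors='replace')
        return obj
    return {conv(k): conv(v) for k, v in d.items()}

# Assumes UTF-8; a nested dict like the one in this thread would need the
# helper applied to each inner dict.
player = decode_dict(raw)
```

Loading as bytes postpones the decoding decision, which can help when the original encoding is uncertain, but printing still depends on what the console can display.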

Posted: Sun May 04, 2014 8:03 pm
by arawra
Doldol wrote: If the goal is to print, wouldn't this be what he's after then?

Syntax: Select all

data = pickle.load(open('playerdict.txt','rb'), encoding="bytes")


Preserve data as bytes instead of encoding to UTF-8 first? But I'm not an expert at this.


L'In20Cible wrote: This will open your file in UTF-8 but it won't change the encoding of the strings it contains.


Answer was in previous message.