Tuesday, October 11, 2011

Mapping Unicode into its best ASCII representation in Python

This annoyed me for quite some time, so I decided to help out anyone who is trying to figure out how to convert a string like 'café' into 'cafe'.

First, make sure you have a unicode representation:

s = u'café'

Then run the following:

import unicodedata
unicode(unicodedata.normalize('NFD', s).encode('ascii', 'ignore'), 'utf-8')

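This decomposes each accented character in the unicode string into a base character plus combining marks (NFD normalization), keeps only the ASCII characters by ignoring everything that cannot be encoded, then converts the result back to unicode.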
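The snippet above is written for Python 2. In Python 3, str is already unicode and the unicode() builtin no longer exists, so the same idea might look roughly like the sketch below. The helper name to_ascii is my own invention for illustration, not part of any library.

```python
import unicodedata


def to_ascii(text):
    """Return a best-effort ASCII version of text (hypothetical helper)."""
    # NFD splits characters like 'é' into 'e' plus a combining accent.
    decomposed = unicodedata.normalize('NFD', text)
    # Encoding with 'ignore' silently drops every non-ASCII code point,
    # including the combining accents produced by the step above.
    ascii_bytes = decomposed.encode('ascii', 'ignore')
    # Decode the remaining bytes back into a str.
    return ascii_bytes.decode('ascii')


print(to_ascii('café'))   # cafe
print(to_ascii('naïve'))  # naive
```

Keep in mind that characters with no decomposition, such as 'ø', are not mapped to a lookalike letter; they are simply dropped.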