
[Solved] Python 3: os.walk() file paths UnicodeEncodeError: ‘utf-8’ codec can’t encode: surrogates not allowed

Hello, everyone. Today I ran into the following error in Python 3 while handling file paths returned by os.walk(): UnicodeEncodeError: ‘utf-8’ codec can’t encode: surrogates not allowed. In this article I explain the possible solutions.

Let’s get started.

How Does the UnicodeEncodeError: ‘utf-8’ codec can’t encode: surrogates not allowed Error Occur?

The error is raised when a file path returned by os.walk() contains bytes that are not valid in the expected encoding, and that path is then printed or encoded as UTF-8.

How to Solve the UnicodeEncodeError: ‘utf-8’ codec can’t encode: surrogates not allowed Error?

To solve this error, pass a byte string to os.walk(). It then returns byte strings instead of Unicode strings containing surrogates, and you can decode the names however you like.

Solution 1


On Linux, filenames are ‘just a bunch of bytes’ and are not necessarily encoded in any particular encoding. Python 3 tries to turn everything into Unicode strings, so its developers came up with a scheme (the surrogateescape error handler) that translates byte strings to Unicode strings and back without loss, and without knowing the original encoding. Each ‘bad’ byte is mapped to a lone surrogate code point, but the normal UTF-8 encoder rejects lone surrogates, which is why printing such a name to the terminal fails.

For example, here is a byte string that is not valid UTF-8, decoded with surrogateescape:

>>> b'C\xc3N'.decode('utf8','surrogateescape')
'C\udcc3N'

It can be converted to and from Unicode without loss:

>>> b'C\xc3N'.decode('utf8','surrogateescape').encode('utf8','surrogateescape')
b'C\xc3N'
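To check whether a given name carries such escaped bytes, a small sketch like the following may help. The helper name has_escaped_bytes is made up for illustration, and it assumes the name was decoded with the surrogateescape handler, which maps each undecodable byte 0x80 to 0xFF to a lone surrogate in the range U+DC80 to U+DCFF.

```python
def has_escaped_bytes(name: str) -> bool:
    """Return True if name contains lone surrogates in U+DC80..U+DCFF."""
    return any('\udc80' <= ch <= '\udcff' for ch in name)


raw = b'C\xc3N'                                      # not valid UTF-8
name = raw.decode('utf-8', 'surrogateescape')        # str with one lone surrogate
restored = name.encode('utf-8', 'surrogateescape')   # back to the original bytes

print(has_escaped_bytes(name))    # True
print(has_escaped_bytes('CAN'))   # False
print(restored == raw)            # True
```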

But it can’t be printed:

>>> print(b'C\xc3N'.decode('utf8','surrogateescape'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 1: surrogates not allowed
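If the goal is only to get such a name onto the terminal without a traceback, one possible approach is to escape the surrogates before printing. This is a sketch, not the only option; the helper name printable is hypothetical, and it relies on the standard backslashreplace error handler.

```python
def printable(name: str) -> str:
    """Return name with lone surrogates shown as backslash escapes."""
    # backslashreplace turns each character the encoder rejects
    # into a visible escape sequence instead of raising.
    return name.encode('utf-8', 'backslashreplace').decode('utf-8')


name = b'C\xc3N'.decode('utf-8', 'surrogateescape')
print(printable(name))   # C\udcc3N
```

The escaped string is for display only. It no longer round-trips to the original bytes, so keep the original name for file access.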

You’ll have to decide what to do with file names that are not in the default encoding. One option is to encode them back to the original bytes and decode those with the ‘replace’ error handler, which substitutes the replacement character U+FFFD for each undecodable byte. Use the result for display only, and keep the original name to access the file.

>>> b'C\xc3N'.decode('utf8','replace')
'C�N'
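Starting from a name that was already decoded with surrogateescape (as os.walk() returns on Linux), the two steps can be combined. This is a minimal sketch: display_name is a hypothetical helper, and it assumes UTF-8 is the encoding you want for display.

```python
def display_name(name: str) -> str:
    """Return a printable copy of name; undecodable bytes become U+FFFD."""
    raw = name.encode('utf-8', 'surrogateescape')   # back to the original bytes
    return raw.decode('utf-8', 'replace')           # lossy, for display only


original = b'C\xc3N'.decode('utf-8', 'surrogateescape')
shown = display_name(original)

print(shown == 'C\ufffdN')   # True
print(shown)                 # safe to print; 'original' is still used for file access
```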

os.walk can also take a byte string and will return byte strings instead of Unicode strings:

for p, d, f in os.walk(b'.'):
    print(p, d, f)  # p is bytes; d and f are lists of bytes

Then you can decode as you like.
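One way to put this together is to keep the raw bytes path for file access and produce a separate string for display. The sketch below is only an illustration: walk_files is a made-up helper, the temporary directory and example.txt exist purely for the demo, and decoding as UTF-8 with ‘replace’ is an assumption about how you want names shown.

```python
import os
import tempfile


def walk_files(top: bytes):
    """Yield (raw_path, display_path) for every file under top."""
    for dirpath, dirnames, filenames in os.walk(top):
        for filename in filenames:
            raw_path = os.path.join(dirpath, filename)      # bytes, use this to open the file
            display = raw_path.decode('utf-8', 'replace')   # str, for printing only
            yield raw_path, display


# Demo on a throwaway directory so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'example.txt'), 'w').close()
    for raw_path, display in walk_files(os.fsencode(tmp)):
        print(display)
```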

Solution 2

I ended up passing a byte string to os.walk(), which then returns byte strings instead of incorrectly decoded Unicode strings:

import os

for root, dirs, files in os.walk(b'.'):
    print(root)  # root is a bytes object, e.g. b'.'

Summary

That’s all about this issue. I hope these solutions helped you. Comment below with your thoughts and questions, and let me know which solution worked for you. Thank you.
