
[Solved] python requests.get() returns improperly decoded text instead of UTF-8?

Hello everyone. Today I ran into a problem in Python: requests.get() returned improperly decoded text instead of UTF-8. In this article I explain the possible solutions.

Let's get straight to solving it.

How Does the python requests.get() returns improperly decoded text instead of UTF-8 Error Occur?

The page is served as UTF-8, but r.text comes back garbled because Requests decoded the response with a different encoding.

How To Solve python requests.get() returns improperly decoded text instead of UTF-8 Error?

  1. Check and override the encoding

    Requests guesses the encoding from the HTTP headers. Check r.encoding and, if it is wrong, set it to the encoding you need before reading r.text.

  2. Use the detected encoding

    The guess is probably just a check of the Content-Type header sent by the server. Setting r.encoding to r.apparent_encoding uses the encoding detected from the content instead.

Solution 1

From requests documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property.

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'

Check which encoding Requests used for your page and, if it is not the right one, force the one you need.

As for the difference between requests and urllib.urlopen: they probably guess the encoding in different ways.

Solution 2

The "educated guesses" mentioned above are probably just a check of the Content-Type header sent by the server (a rather misleading use of "educated", imho).

For the response header Content-Type: text/html, the result is ISO-8859-1 (the default for HTML4), with no analysis of the content at all, even though the default for HTML5 is UTF-8.

For response header Content-Type: text/html; charset=utf-8 the result is UTF-8.
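To see why the ISO-8859-1 fallback produces garbage, the effect can be reproduced without any network access. This is only a minimal sketch using the standard library; the sample string is made up for illustration, and the repair in the last step works only when the wrong decoding really was ISO-8859-1.

```python
# Bytes as a server would send them for a UTF-8 page (sample text is illustrative).
raw = "žluťoučký kůň".encode("utf-8")

# What you get when the bytes are decoded with the ISO-8859-1 fallback.
garbled = raw.decode("iso-8859-1")

# What you get when the bytes are decoded with the encoding the page really uses.
correct = raw.decode("utf-8")

# ISO-8859-1 maps every byte to exactly one character, so the mistake is reversible.
repaired = garbled.encode("iso-8859-1").decode("utf-8")

print(garbled != correct)   # True
print(repaired == correct)  # True
```

With requests, the raw bytes are available as r.content, so the same comparison can be made on a real response.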

Luckily for us, requests also uses a character-detection library (chardet, or charset_normalizer in newer releases) that usually works quite well. Its result is available as the attribute requests.Response.apparent_encoding, so you usually want to do:

import requests

r = requests.get("https://martin.slouf.name/")
# override the header-based encoding with the one detected from the content
r.encoding = r.apparent_encoding
# access the decoded text
r.text
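Overriding the encoding unconditionally also throws away a charset that the server did declare correctly, and detection can occasionally guess wrong. One possible refinement, sketched below, keeps a declared charset and falls back to the detected one only when the header has none. The helper name pick_encoding is hypothetical (it is not part of requests), and the header is parsed with the standard library's email.message.Message.

```python
from email.message import Message


def pick_encoding(content_type, detected):
    """Return the charset declared in a Content-Type value, else `detected`.

    `pick_encoding` is a hypothetical helper, not part of requests.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lowercased charset parameter, or None.
    declared = msg.get_content_charset()
    return declared or detected


print(pick_encoding("text/html; charset=UTF-8", "ISO-8859-2"))  # utf-8
print(pick_encoding("text/html", "utf-8"))                      # utf-8
```

With a real response this would be used as r.encoding = pick_encoding(r.headers.get("Content-Type", ""), r.apparent_encoding) before reading r.text.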

Summary

That is all about this issue. I hope these solutions helped you. Leave your thoughts and questions in the comments, and let me know which solution worked for you. Thank you.
