[toc]
Python 3 encoding
Background
The following exception is raised when sending a request with requests:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 0-1: Body ('你好') is not valid Latin-1. Use body.encode('utf-8') if you want to send it encoded in UTF-8.
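Cause
- A str body sent through requests is encoded as Latin-1 by default: before the request goes out, the body is passed through encode('latin-1'). The relevant code is in the standard library's http.client, which requests relies on underneath: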
- if isinstance(body, str):
-     # RFC 2616 Section 3.7.1 says that text default has a
-     # default charset of iso-8859-1.
-     body = _encode(body, 'body')
-
- def _encode(data, name='data'):
-     """Call data.encode("latin-1") but show a better error message."""
-     try:
-         return data.encode("latin-1")
-     except UnicodeEncodeError as err:
-         raise UnicodeEncodeError(
-             err.encoding,
-             err.object,
-             err.start,
-             err.end,
-             "%s (%.20r) is not valid Latin-1. Use %s.encode('utf-8') "
-             "if you want to send it encoded in UTF-8." %
-             (name.title(), data[err.start:err.end], name)) from None
-
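- The data is transferred as JSON
A str can be encoded to latin-1 directly only if every character lies in the range U+0000 to U+00FF; a string that contains Chinese characters cannot be encoded.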
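A minimal sketch of that limit, using only the standard library. The helper name `can_encode_latin1` is hypothetical and chosen for illustration; it simply reports whether `str.encode('latin-1')` would succeed.

```python
def can_encode_latin1(text: str) -> bool:
    """Return True if every character of text fits in Latin-1 (U+0000 to U+00FF)."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


print(can_encode_latin1("hello"))  # ASCII only
print(can_encode_latin1("café"))   # é is U+00E9, still inside Latin-1
print(can_encode_latin1("你好"))    # CJK code points are far above U+00FF
```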
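- ensure_ascii in json.dumps
This parameter controls how non-ASCII characters are written: with the default ensure_ascii=True they are escaped as \uXXXX sequences, and with ensure_ascii=False they are kept as-is.
- If ``ensure_ascii`` is false, then the return value can contain non-ASCII characters if they appear in strings contained in ``obj``. Otherwise, all such characters are escaped in JSON strings.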
-
- json.dumps('你好')
- Out[26]: '"\\u4f60\\u597d"'
- json.dumps('你好', ensure_ascii=False)
- Out[27]: '"你好"'
-
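The two outputs are equivalent JSON: the escaped and the unescaped form should parse back to the same Python string. A short check, with `text`, `escaped` and `raw` as illustrative variable names:

```python
import json

text = "你好"

escaped = json.dumps(text)                  # ensure_ascii=True is the default
raw = json.dumps(text, ensure_ascii=False)  # non-ASCII characters kept as-is

# The escaped form is pure ASCII; the raw form is not.
print(escaped.isascii(), raw.isascii())

# Either form parses back to the original string.
print(json.loads(escaped) == json.loads(raw) == text)
```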
So for a = '你好', a string containing Chinese characters, the encoding results are as follows
- # ensure_ascii=False
- json.dumps(a, ensure_ascii=False).encode('latin-1')
- UnicodeEncodeError: 'latin-1' codec can't encode characters in position 1-2: ordinal not in range(256)
-
- # ensure_ascii=True (the default)
- json.dumps(a).encode('latin-1')
- b'"\\u4f60\\u597d"'
-
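Conclusion
- Do not set ensure_ascii=False casually when serializing to JSON: the output then contains raw non-ASCII characters and has to be encoded to bytes explicitly (for example with .encode('utf-8')) before it is sent
- When sending a request with requests, a str body is encoded as latin-1 by default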
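One way to act on both points, sketched with the standard library only: serialize the payload, encode it to bytes explicitly, and hand the bytes to the HTTP client so that no implicit latin-1 step runs on a str. The helper `build_json_body` and the sample payload are hypothetical; the commented-out `requests.post` call and its URL are illustrative assumptions, not something taken from these notes.

```python
import json


def build_json_body(payload) -> bytes:
    """Serialize payload to JSON and encode it as UTF-8 bytes."""
    # ensure_ascii=False keeps non-ASCII characters readable in the output,
    # so the result must be encoded explicitly before it is sent.
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


body = build_json_body({"msg": "你好"})
print(body)

# Hypothetical usage with requests (the URL is a placeholder):
# requests.post(
#     "https://example.com/api",
#     data=body,
#     headers={"Content-Type": "application/json; charset=utf-8"},
# )
```

Leaving ensure_ascii at its default is the other option: the output is then pure ASCII, so it survives the latin-1 step even when passed as a str.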