Reading a garbled file with Python 3

Reading garbled content from a file in Python

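The real priority should be to avoid writing garbled logs in the first place. But since the file already exists, the only option is to find a way to deal with it.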

The file content is as follows:

2020-06-04 09:30:21,011 - INFO - ====== setup ======
2020-06-04 09:30:21,016 - INFO - 启动TC003_01_punchGPS
2020-06-04 09:30:35,119 - INFO - APP::TC003_01_punchGPS_
2020-06-04 09:30:35,123 - INFO - APP::鎵撳紑鍔冲姩鍔涚鐞嗛〉闈
2020-06-04 09:30:48,885 - INFO - APP::鐧诲綍鍔冲姩鍔涚鐞咥PP_
2020-06-04 09:31:51,872 - INFO - APP::杩涘叆鐧诲綍鍔冲姩鍔涚鐞嗚彍鍗昣
2020-06-04 09:31:57,813 - INFO - APP::verify:杩涘叆绉诲姩鎵撳崱_
2020-06-04 09:32:01,860 - INFO - APP::verify:閫夋嫨鎵撳崱鏂瑰紡_
2020-06-04 09:32:07,617 - INFO - APP::verify:鑾峰彇鎵撳崱鏃堕棿_
2020-06-04 09:32:08,484 - INFO - APP::punchTime=09:32_

The goal is to extract the value of "punchTime" and strip the trailing "_".
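The extraction step on its own can be sketched with a regular expression. This is only a minimal sketch: the helper name `extract_punch_time` is hypothetical, and the pattern assumes the `punchTime=<value>_` layout seen in the last line of the sample log.

```python
import re

# Assumption: the value follows "punchTime=" and ends at the first "_" or whitespace.
PUNCH_TIME_RE = re.compile(r"punchTime=([^_\s]+)")


def extract_punch_time(line):
    """Return the punchTime value without the trailing '_', or None if absent."""
    match = PUNCH_TIME_RE.search(line)
    return match.group(1) if match else None


sample = "2020-06-04 09:32:08,484 - INFO - APP::punchTime=09:32_"
print(extract_punch_time(sample))  # 09:32
```

Returning None for lines without a match avoids relying on a fixed character offset after `find()`.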

First attempt:

filename = 'E:/py_workstation/tmpfiles/WFM4_old.TC003_01_punchGPS_1_1.log'

res = ''  # stays empty if no line contains 'punchTime'
with open(filename, encoding='utf-8') as f:
    for line in f:
        if 'punchTime' in line:
            res = line
            print(res)

aim_time_index = res.find('punchTime')
# print(aim_time_index)

# Skip past 'punchTime=' (10 characters), then drop '_' and the newline
aim_time = res[aim_time_index+10:].replace('_', '').strip()

print(aim_time)

This then fails with the all-too-familiar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 87: invalid continuation byte
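One possible reading of this traceback: byte offset 87 lines up with the first character of 启动 on the second log line, provided the file uses Windows CRLF line endings (54 bytes for the first line plus the 33-byte prefix of the second). That would mean this one line was written in GBK while the lines that show up garbled were written in UTF-8. This is an assumption inferred from the error message, not something the original file confirms. A minimal reproduction under that assumption:

```python
# Assumption: line 1 is plain ASCII, line 2 was written as GBK, line endings are CRLF.
line1 = "2020-06-04 09:30:21,011 - INFO - ====== setup ======\r\n".encode("utf-8")
line2 = "2020-06-04 09:30:21,016 - INFO - 启动TC003_01_punchGPS\r\n".encode("gbk")
raw = line1 + line2

try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; 'exc' is cleared when the except block ends

print(error)
if error is not None:
    # Offset and value of the byte the decoder rejected
    print(error.start, hex(raw[error.start]))
```

If the printed offset and byte match the traceback from the real file, the mixed-encoding explanation is at least consistent with the evidence.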

After some searching, I found two workarounds:
1. Convert the file to UTF-8 with a tool such as "notepad".
2. Open the file with encoding='unicode_escape'.
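A note on the second workaround: `unicode_escape` decodes bytes as Latin-1 and interprets backslash escapes, so it never raises on high bytes, but it does not restore the Chinese text either. It is enough here only because the `punchTime` value is plain ASCII. A third option, sketched below, is to read the file in binary mode and decode each line separately. The helper names `decode_line` and `find_punch_time` are hypothetical, and the candidate list `("utf-8", "gbk")` is an assumption about how the log was written.

```python
def decode_line(raw_line, encodings=("utf-8", "gbk")):
    """Decode one line of bytes, trying each candidate encoding in order."""
    for encoding in encodings:
        try:
            return raw_line.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Last resort: decode anyway, marking undecodable bytes with U+FFFD.
    return raw_line.decode(encodings[0], errors="replace")


def find_punch_time(raw_lines):
    """Return the text after 'punchTime=' with '_' removed, or None if not found."""
    for raw_line in raw_lines:
        line = decode_line(raw_line).strip()
        if "punchTime=" in line:
            return line.split("punchTime=", 1)[1].replace("_", "")
    return None


# Stand-in for the real file: one GBK line, one UTF-8 line, one ASCII line.
sample = [
    "INFO - 启动TC003_01_punchGPS\r\n".encode("gbk"),
    "INFO - APP::打开劳动力管理页面_\r\n".encode("utf-8"),
    b"INFO - APP::punchTime=09:32_\r\n",
]
print([decode_line(raw).strip() for raw in sample])
print(find_punch_time(sample))
```

With a real file, the same function can be fed the file object directly, since iterating a file opened with `open(filename, 'rb')` yields one bytes object per line. Unlike the script in this post, which keeps the last matching line, this sketch returns the first match.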

Full code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@author: kyle
@time: 2020/6/4 10:17
"""

# filename = 'E:/py_workstation/tmpfiles/punchGPS_1_1.log'

filename = 'E:/py_workstation/tmpfiles/WFM4_old.TC003_01_punchGPS_1_1.log'

res = ''  # stays empty if no line contains 'punchTime'
# unicode_escape does not raise on the non-UTF-8 bytes in this log
with open(filename, encoding='unicode_escape') as f:
    for line in f:
        if 'punchTime' in line:
            res = line
            print(res)

aim_time_index = res.find('punchTime')
# print(aim_time_index)

# Skip past 'punchTime=' (10 characters), then drop '_' and the newline
aim_time = res[aim_time_index+10:].replace('_', '').strip()

print(aim_time)

In addition, two packages are worth considering: codecs and chardet.
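A rough sketch of how the two might be used. `codecs` is part of the standard library; `chardet` is a third-party package, so the import is guarded. The helper names `guess_encoding` and `decode_lossy` are hypothetical, and the sample bytes stand in for the real log.

```python
import codecs

try:
    import chardet  # third-party; install with: pip install chardet
except ImportError:
    chardet = None


def guess_encoding(raw, default="utf-8"):
    """Ask chardet for a guess; fall back to `default` if it is missing or unsure."""
    if chardet is None:
        return default
    guess = chardet.detect(raw)["encoding"]  # may be None for short or odd input
    return guess or default


def decode_lossy(raw, encoding="utf-8"):
    """Decode with codecs, replacing undecodable bytes with U+FFFD instead of raising."""
    return codecs.decode(raw, encoding, errors="replace")


# Stand-in data: GBK bytes followed by ASCII text.
raw = "启动".encode("gbk") + b" APP::punchTime=09:32_"
print(guess_encoding(raw))
print(decode_lossy(raw))
```

Note that chardet returns a single guess for the whole input, so if a file mixes encodings line by line, the guess would have to be made per line, and short lines give it little to work with.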
