2020-12-25

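Reading garbled files in Python 3

Reading garbled content from a file with Python

The real priority should be not writing garbled logs in the first place. But once such a file exists, the only option is to find a way to deal with it.

The file contents are as follows: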

2020-06-04 09:30:21,011 - INFO - ====== setup ======
2020-06-04 09:30:21,016 - INFO - 启动TC003_01_punchGPS
2020-06-04 09:30:35,119 - INFO - APP::TC003_01_punchGPS_
2020-06-04 09:30:35,123 - INFO - APP::鎵撳紑鍔冲姩鍔涚鐞嗛〉闈
2020-06-04 09:30:48,885 - INFO - APP::鐧诲綍鍔冲姩鍔涚鐞咥PP_
2020-06-04 09:31:51,872 - INFO - APP::杩涘叆鐧诲綍鍔冲姩鍔涚鐞嗚彍鍗昣
2020-06-04 09:31:57,813 - INFO - APP::verify:杩涘叆绉诲姩鎵撳崱_
2020-06-04 09:32:01,860 - INFO - APP::verify:閫夋嫨鎵撳崱鏂瑰紡_
2020-06-04 09:32:07,617 - INFO - APP::verify:鑾峰彇鎵撳崱鏃堕棿_
2020-06-04 09:32:08,484 - INFO - APP::punchTime=09:32_

The goal is to extract the value of "punchTime" and strip the trailing "_".

First attempt:

filename = 'E:/py_workstation/tmpfiles/WFM4_old.TC003_01_punchGPS_1_1.log'

res = ''  # stays empty if no line mentions punchTime
with open(filename, encoding='utf-8') as f:
    for line in f:
        # keep the last line that contains punchTime
        if 'punchTime' in line:
            res = line
            print(res)

if res:
    aim_time_index = res.find('punchTime')
    # skip past 'punchTime=' and drop the trailing '_' and newline
    aim_time = res[aim_time_index + len('punchTime='):].replace('_', '').strip()
    print(aim_time)

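This immediately fails with the all-too-familiar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 87: invalid continuation byte. The file is not entirely valid UTF-8, so the decoder stops at the first byte sequence it cannot interpret.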
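The error can be reproduced in miniature. The sketch below assumes, purely for illustration, that the "启动..." log line was written as GBK while the rest of the file is UTF-8; the actual encoding of the file is not confirmed. Under that assumption the first byte of "启" is 0xC6, the same byte the traceback complains about.

```python
# Assumption for illustration: this one log line was encoded as GBK.
prefix = "2020-06-04 09:30:21,016 - INFO - "
gbk_line = (prefix + "启动TC003_01_punchGPS").encode("gbk")

error = None
try:
    gbk_line.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # names the offending byte and its offset within this line

# The same bytes decode cleanly once the matching codec is used.
recovered = gbk_line.decode("gbk")
print(recovered)
```

The offset reported here is relative to this single line, not to the whole file as in the original traceback.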

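After some searching, there are two ways around it:
1. Convert the file to UTF-8 with a tool such as Notepad.
2. Open the file with encoding='unicode_escape'. This codec does not recover the Chinese text: it maps each non-ASCII byte to the Latin-1 character with the same value instead of raising, which is enough here because the punchTime line is plain ASCII.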
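To make the second option less of a black box, here is a small sketch of what 'unicode_escape' does to undecodable bytes, next to one possible alternative (UTF-8 with errors="replace"). The sample bytes are made up for illustration and are not taken from the real log.

```python
# Made-up sample: two GBK-encoded characters followed by an ASCII log fragment.
raw = "启动".encode("gbk") + b" APP::punchTime=09:32_"

# unicode_escape does not fail on high bytes: each one becomes the Latin-1
# character with the same value, so the Chinese text is NOT recovered.
via_escape = raw.decode("unicode_escape")
print(via_escape == raw.decode("latin-1"))  # same result when no backslashes are present

# Caveat: backslash sequences in the data are interpreted as escapes.
path_bytes = b"C:\\new"                        # the six bytes  C : \ n e w
mangled = path_bytes.decode("unicode_escape")  # "\n" turns into a newline
print(repr(mangled))

# Alternative: keep UTF-8 and substitute U+FFFD for undecodable bytes.
via_replace = raw.decode("utf-8", errors="replace")
print(via_replace)
```

Either way the ASCII part of the line, which is all this task needs, survives intact. The errors="replace" variant has the advantage of leaving backslashes in the log untouched.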

Full code:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
@author: kyle
@time: 2020/6/4 10:17
"""

# filename = 'E:/py_workstation/tmpfiles/punchGPS_1_1.log'

filename = 'E:/py_workstation/tmpfiles/WFM4_old.TC003_01_punchGPS_1_1.log'

res = ''  # stays empty if no line mentions punchTime
# unicode_escape avoids the UnicodeDecodeError; non-ASCII text is not recovered
with open(filename, encoding='unicode_escape') as f:
    for line in f:
        # keep the last line that contains punchTime
        if 'punchTime' in line:
            res = line
            print(res)

if res:
    aim_time_index = res.find('punchTime')
    # skip past 'punchTime=' and drop the trailing '_' and newline
    aim_time = res[aim_time_index + len('punchTime='):].replace('_', '').strip()
    print(aim_time)
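The index arithmetic can also be replaced by a regular expression. The following is only a sketch: extract_punch_time is a hypothetical helper, not part of the original script, and it assumes the value never contains "_" or whitespace.

```python
import re

# Hypothetical helper for illustration; assumes the value has no '_' or whitespace.
PUNCH_TIME = re.compile(r"punchTime=([^_\s]+)")


def extract_punch_time(lines):
    """Return the value from the last punchTime line, or None if there is none."""
    result = None
    for line in lines:
        match = PUNCH_TIME.search(line)
        if match:
            result = match.group(1)
    return result


sample = [
    "2020-06-04 09:30:21,011 - INFO - ====== setup ======\n",
    "2020-06-04 09:32:08,484 - INFO - APP::punchTime=09:32_\n",
]
print(extract_punch_time(sample))  # prints 09:32
```

Because the function accepts any iterable of strings, an open file object can be passed in directly in place of the sample list.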

Two packages are also worth considering: codecs (standard library) and chardet (third-party, for guessing an encoding).
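If the log really does mix encodings from line to line, another option that needs only the standard library is to read the file in binary mode and try a short list of candidate encodings per line. This is a sketch under that assumption: decode_line is a hypothetical helper, and the candidate list ("utf-8", then "gbk") is a guess about this particular log rather than something the file confirms.

```python
def decode_line(raw, encodings=("utf-8", "gbk")):
    """Decode raw bytes with the first candidate encoding that succeeds."""
    for name in encodings:
        try:
            return raw.decode(name)
        except UnicodeDecodeError:
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


# Made-up sample lines in different encodings, standing in for the real log.
mixed = [
    "INFO - 启动TC003".encode("gbk"),
    "INFO - APP::打开页面".encode("utf-8"),
    b"INFO - APP::punchTime=09:32_",
]
decoded = [decode_line(chunk) for chunk in mixed]
for text in decoded:
    print(text)
```

For a real file, the same helper could be applied to each line of a file opened with mode 'rb'. Trying UTF-8 first matters: it is strict enough that GBK text rarely passes as valid UTF-8, whereas GBK accepts many byte sequences that were never GBK.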

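Title: Reading garbled files in Python 3

Author: Kyle

Published: 2020-12-25, 13:13:49

Last updated: 2020-12-25, 13:15:21

Original link: https://spindriftks.github.io/2020/12/25/python3%E8%AF%BB%E5%8F%96%E4%B9%B1%E7%A0%81%E6%96%87%E4%BB%B6/

License: "Attribution-NonCommercial-ShareAlike 4.0". Please keep the original link and author credit when reposting.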