Reading garbled content from a file in Python
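Shouldn't the real priority be to avoid writing garbled logs in the first place? Still, since the file already exists, the only option is to find a way to deal with it.
The file content is as follows: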
    2020-06-04 09:30:21,011 - INFO - ====== setup ======
    2020-06-04 09:30:21,016 - INFO - 启动TC003_01_punchGPS
    2020-06-04 09:30:35,119 - INFO - APP::TC003_01_punchGPS_
    2020-06-04 09:30:35,123 - INFO - APP::鎵撳紑鍔冲姩鍔涚鐞嗛〉闈
    2020-06-04 09:30:48,885 - INFO - APP::鐧诲綍鍔冲姩鍔涚鐞咥PP_
    2020-06-04 09:31:51,872 - INFO - APP::杩涘叆鐧诲綍鍔冲姩鍔涚鐞嗚彍鍗昣
    2020-06-04 09:31:57,813 - INFO - APP::verify:杩涘叆绉诲姩鎵撳崱_
    2020-06-04 09:32:01,860 - INFO - APP::verify:閫夋嫨鎵撳崱鏂瑰紡_
    2020-06-04 09:32:07,617 - INFO - APP::verify:鑾峰彇鎵撳崱鏃堕棿_
    2020-06-04 09:32:08,484 - INFO - APP::punchTime=09:32_
The goal is to extract the value of "punchTime" and strip the trailing "_".
First attempt:
    filename = 'E:/py_workstation/tmpfiles/WFM4_old.TC003_01_punchGPS_1_1.log'

    with open(filename, encoding='utf-8') as f:
        for line in f.readlines():
            if 'punchTime' in line:
                res = line
                print(res)
            else:
                pass

    aim_time_index = res.find('punchTime')
    aim_time = res[aim_time_index+10:].replace('_', '')
    print(aim_time)
This fails with the all-too-familiar error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc6 in position 87: invalid continuation byte
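The message is easier to read with a small experiment. The sketch below rests on an assumption that has not been checked against the real file: the readable 启动 on the second log line was written in GBK, while the rest of the file is UTF-8. Under that assumption, and with Windows line endings, the first byte of 启 lands at byte offset 87, which is consistent with the error above.

```python
# Assumption (not verified against the real log): line 2 was written in GBK
# and the file uses Windows line endings (\r\n).
line1 = "2020-06-04 09:30:21,011 - INFO - ====== setup ======\r\n".encode("ascii")
line2 = "2020-06-04 09:30:21,016 - INFO - 启动TC003_01_punchGPS\r\n".encode("gbk")
raw = line1 + line2

# Byte at offset 87 is the first GBK byte of the character 启.
print(hex(raw[87]))

error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # The GBK lead byte looks like the start of a two-byte UTF-8 sequence,
    # but the byte after it is not a valid continuation byte.
    error = exc
    print(exc)
```

If this reading is right, the file mixes two encodings, so no single encoding name will decode every line correctly.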
After some searching, I found two solutions:
1. Use a tool such as Notepad to convert the file to UTF-8.
2. Specify the encoding as encoding='unicode_escape'.
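To see why option 2 makes the error go away, and what it costs, here is a small sketch on in-memory bytes (the sample bytes are made up for illustration, not read from the real log). As far as I understand it, unicode_escape decodes non-ASCII bytes one-to-one, like Latin-1, so nothing raises; the Chinese text is not recovered, though, and backslash sequences in the data are interpreted as escapes.

```python
# Illustrative bytes only; not taken from the real log file.
gbk_bytes = "启动".encode("gbk")

# Each byte becomes one Latin-1 style character, so there is no error,
# but the original Chinese text is not recovered either.
decoded = gbk_bytes.decode("unicode_escape")
print(repr(decoded), len(decoded))

# The ASCII part survives unchanged, which is all the search needs.
ascii_part = b"APP::punchTime=09:32_".decode("unicode_escape")
print(ascii_part)

# Caveat: a backslash followed by 'n' in the data becomes a real newline.
path_like = b"C:\\new".decode("unicode_escape")
print(repr(path_like))
```

So this option is fine for pulling an ASCII value such as punchTime out of the log, but it is probably not a good general-purpose way to read text files, especially ones that contain Windows paths.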
Complete code:
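    """
    @author: kyle
    @time: 2020/6/4 10:17
    """

    filename = 'E:/py_workstation/tmpfiles/WFM4_old.TC003_01_punchGPS_1_1.log'

    res = ''  # stays empty if no line mentions punchTime
    # unicode_escape decodes non-ASCII bytes one-to-one, so the mixed bytes no
    # longer raise UnicodeDecodeError (backslashes would be read as escapes)
    with open(filename, encoding='unicode_escape') as f:
        for line in f:
            if 'punchTime' in line:
                res = line  # keep the last matching line
                print(res)

    # skip past 'punchTime=' (10 characters), then drop '_' and the newline
    aim_time_index = res.find('punchTime')
    aim_time = res[aim_time_index + 10:].replace('_', '').strip()
    print(aim_time)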
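An alternative I would consider, sketched here rather than taken from the original script: keep encoding='utf-8' but pass errors='replace', so undecodable bytes become U+FFFD instead of raising, then pull the value out with a regular expression. The temporary file and the extract_punch_time helper are hypothetical stand-ins for the real log and script.

```python
import os
import re
import tempfile


def extract_punch_time(path):
    """Return the last punchTime value (e.g. '09:32') in the file, or None."""
    pattern = re.compile(r"punchTime=(\d{2}:\d{2})")
    result = None
    # errors='replace' swaps undecodable bytes for U+FFFD instead of raising
    with open(path, encoding="utf-8", errors="replace") as f:
        for line in f:
            match = pattern.search(line)
            if match:
                result = match.group(1)
    return result


# Hypothetical sample: one GBK line mixed into an otherwise UTF-8 file.
sample = (
    "2020-06-04 09:30:21,016 - INFO - 启动TC003_01_punchGPS\n".encode("gbk")
    + "2020-06-04 09:32:08,484 - INFO - APP::punchTime=09:32_\n".encode("utf-8")
)
with tempfile.NamedTemporaryFile(suffix=".log", delete=False) as tmp:
    tmp.write(sample)

punch_time = extract_punch_time(tmp.name)
os.remove(tmp.name)
print(punch_time)
```

Matching the value with a pattern avoids the hard-coded offset of 10 and the separate replace('_', '') step, and the lines that are valid UTF-8 stay readable.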
In addition, two packages are worth considering: codecs and chardet.
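A rough sketch of how the two might fit together: chardet.detect guesses an encoding from raw bytes, and codecs.decode applies it. chardet is a third-party package (pip install chardet), so the sketch falls back to UTF-8 when it is not installed. The decode_guessing helper and the sample bytes are made up for illustration.

```python
import codecs

try:
    import chardet  # third-party; optional in this sketch
except ImportError:
    chardet = None


def decode_guessing(raw, default="utf-8"):
    """Decode bytes with a guessed encoding, replacing undecodable bytes."""
    encoding = default
    if chardet is not None:
        # detect() returns a dict with 'encoding' and 'confidence' keys
        guess = chardet.detect(raw).get("encoding")
        if guess:
            encoding = guess
    try:
        codecs.lookup(encoding)  # make sure Python knows this codec name
    except LookupError:
        encoding = default
    return codecs.decode(raw, encoding, errors="replace")


# Hypothetical sample bytes standing in for the log file contents.
raw = "2020-06-04 09:32:08,484 - INFO - APP::获取打卡时间 punchTime=09:32_\n".encode("utf-8")
text = decode_guessing(raw)
print(text)
```

Detection is only a statistical guess over the whole input, so it may not help much with a file that mixes encodings line by line; in that case decoding each line separately is probably the safer route.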