KeyError: "There is no item named 'word/NULL' in the archive"
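Half a year ago I wrote a small Python script to help analyze the contents of some work documents, using python-docx to parse the .docx files. Today, though, I hit a problem: the script failed as soon as it tried to load one of the files, and the call stack even went down into zipfile: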
Traceback (most recent call last):
  File "c:\util\", line 157, in parse_spec_file
    doc = Document(filepath)
  File "c:\util\venv\lib\site-packages\docx\", line 25, in Document
    document_part =
  File "c:\util\venv\lib\site-packages\docx\opc\", line 128, in open
    pkg_reader = PackageReader.from_file(pkg_file)
  File "c:\util\venv\lib\site-packages\docx\opc\", line 35, in from_file
    sparts = PackageReader._load_serialized_parts(
  File "c:\util\venv\lib\site-packages\docx\opc\", line 69, in _load_serialized_parts
    for partname, blob, reltype, srels in part_walker:
  File "c:\util\venv\lib\site-packages\docx\opc\", line 110, in _walk_phys_parts
    for partname, blob, reltype, srels in next_walker:
  File "c:\util\venv\lib\site-packages\docx\opc\", line 105, in _walk_phys_parts
    blob = phys_reader.blob_for(partname)
  File "c:\util\venv\lib\site-packages\docx\opc\", line 108, in blob_for
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\", line 1475, in read
    with, "r", pwd) as fp:
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\", line 1514, in open
    zinfo = self.getinfo(name)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python38\lib\", line 1441, in getinfo
    raise KeyError(
KeyError: "There is no item named 'word/NULL' in the archive"
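A quick search online turned up nothing very useful. I did find a roundabout workaround on CSDN, which roughly amounts to unzipping the file by hand and then running BeautifulSoup directly over the XML parts you care about.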
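For reference, that workaround can be sketched with the standard library alone. This is a minimal sketch, not the CSDN author's code: it swaps BeautifulSoup for `xml.etree.ElementTree`, assumes the body text lives in `word/document.xml` (the standard location in a .docx), and `read_paragraphs` is a hypothetical helper name. Because it never follows the package relationships, a broken image reference does not stop it.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_paragraphs(path):
    """Hypothetical helper: return the plain text of each <w:p> in word/document.xml,
    reading the zip directly instead of going through python-docx."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Join the text of every <w:t> run inside this paragraph
        text = "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs
```

The trade-off is that you lose everything python-docx gives you (styles, tables as objects, and so on) and have to walk the raw XML yourself.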
Searching further, I came across the following advice about python-docx:
The way I would fix it on a single file would be to extract the package using opc-diag, grep through the relationship files to find NULL with something like grep NULL *.rels, and then just delete the offending relationship line.
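The `grep NULL *.rels` step can also be done without extracting the package first. The sketch below is an illustration, not opc-diag: `find_null_relationships` is a hypothetical name, and it only looks at `<Relationship>` elements whose `Target` attribute is exactly `NULL`, which is an assumption about what the offending line looks like, based on the `'word/NULL'` in the error message.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OPC relationship parts (*.rels)
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def find_null_relationships(path):
    """Hypothetical helper: list (rels_part, Id, Type) for every relationship
    whose Target is the literal string "NULL"."""
    hits = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue  # only relationship parts can hold the bad reference
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(f"{{{REL_NS}}}Relationship"):
                if rel.get("Target") == "NULL":
                    hits.append((name, rel.get("Id"), rel.get("Type")))
    return hits
```

Running it on the failing file should point at a part such as `word/_rels/document.xml.rels` and the `rId` values involved, which narrows down which references in a long document are broken.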
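As for .docx files, apart from knowing that they are really just zip archives, I didn't know much about them... But after reading that comment, and putting it together with the experiments I had done earlier, I figured the problem was most likely in the document's content, so I resigned myself to actually going through it. Unfortunately, this document runs to over 300 pages... So I switched to print preview and zoomed out to skim it, and the problem turned out to be fairly easy to spot: two referenced images did not exist! After deleting those two broken references directly in Word and saving, the script parsed the file without any trouble!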
Later I also opened the file with 7-Zip and was able to find the corresponding problem spot:
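If opening the document in Word is not an option (for example, in a batch job), the "delete the offending relationship line" advice can be scripted. This is a rough sketch under stated assumptions: `strip_null_relationships` is a hypothetical helper, it assumes Word writes the attribute as `Target="NULL"` with double quotes, and it edits the bytes with a regex rather than re-serializing the XML so the rest of each `.rels` part is left untouched. The `r:embed` references in `document.xml` still point at the removed `rId`s, so this is a workaround to get `Document()` past loading, not a real repair of the document.

```python
import re
import zipfile

# One self-closing <Relationship .../> element whose Target is exactly "NULL"
# (assumes double-quoted attributes, as Word normally writes them)
NULL_REL = re.compile(rb'<Relationship\b[^>]*\bTarget="NULL"[^>]*/>')


def strip_null_relationships(src, dst):
    """Hypothetical helper: copy the package from src to dst, dropping any
    Relationship whose Target is "NULL". Returns how many were removed."""
    removed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename.endswith(".rels"):
                data, n = NULL_REL.subn(b"", data)
                removed += n
            # Reuse the original ZipInfo so names and compression settings carry over
            zout.writestr(item, data)
    return removed
```

Writing to a separate `dst` file keeps the original document intact in case the cleaned copy turns out to be worse.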
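I was going to open an Issue, but since one already existed I just replied there. Let's first see whether the maintainers think this kind of problem is worth fixing: fix: accommodate NULL relationship (by skipping).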