Conversion fails when a font name is in Chinese (e.g. “宋体”, SimSun) #286

Open

hlhtddx opened this issue May 1, 2024 · 1 comment

hlhtddx commented May 1, 2024

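As the title says, conversion fails with the following error when it encounters a font whose name is in Chinese (e.g. “宋体”):
bytes must be in range[0 to 255]
The error is raised at
https://github.com/ArtifexSoftware/pdf2docx/blame/master/pdf2docx/common/share.py#L128
When the font name is in Chinese, ord(c) is greater than 255, so building bytes from those values raises an error.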
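The failure can be reproduced outside pdf2docx with a few lines of plain Python. This is a minimal sketch: `naive_to_bytes` is a hypothetical helper that simply repeats the conversion used on the failing line of `decode()`. Every character of “宋体” has a code point far above 255, and `bytes()` only accepts integers in the range 0 to 255, so it raises `ValueError`.

```python
def naive_to_bytes(s: str) -> bytes:
    """Hypothetical helper: same conversion as the failing line in share.py's decode()."""
    return bytes(ord(c) for c in s)

font_name = "宋体"
print([ord(c) for c in font_name])  # both code points are well above 255

error = None
try:
    naive_to_bytes(font_name)
except ValueError as exc:  # bytes() rejects integers outside 0..255
    error = exc
    print("ValueError:", exc)
```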

def decode(s:str):
    '''Try to decode a unicode string.'''
    b = bytes(ord(c) for c in s)  # the error is raised here
    for encoding in ['utf-8', 'gbk', 'gb2312', 'iso-8859-1']:
        try:
            res = b.decode(encoding)
            break
        except:
            continue
    return res
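One possible fix is sketched below; it is not the maintainers' patch. It assumes the purpose of `decode()` is to repair strings that were mis-decoded one byte per character, so a string containing any code point above 255 (such as “宋体”) is treated as already-correct Unicode and returned unchanged. For strings whose code points are all at most 255, `s.encode('latin-1')` produces exactly the same bytes as `bytes(ord(c) for c in s)`; otherwise it raises `UnicodeEncodeError`, which the sketch catches. The bare `except:` is also narrowed to `UnicodeDecodeError`.

```python
def decode(s: str) -> str:
    '''Try to decode a unicode string; return it unchanged if it is already proper Unicode.'''
    try:
        # Same bytes as bytes(ord(c) for c in s), but fails cleanly on code points > 255
        b = s.encode('latin-1')
    except UnicodeEncodeError:
        # e.g. "宋体": not a mis-decoded byte string, so leave it as-is
        return s
    for encoding in ('utf-8', 'gbk', 'gb2312', 'iso-8859-1'):
        try:
            return b.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Defensive fallback; iso-8859-1 can decode any byte sequence, so this is not normally reached
    return s
```

With this version, `decode("宋体")` returns `"宋体"`, while a name that was mis-decoded from UTF-8 or GBK bytes is still repaired as before.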

hlhtddx commented May 1, 2024

One more thing I left out: the problem only occurs when multiprocessing=True is selected; single-process mode works fine.