
Installing PyMuPDF on CentOS 7

By obaby
August 21, 2024, 10:38

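Following up on the previous post: after I finished the code yesterday and it tested OK, I figured I was all set. Then I tried deploying it to the server today and got completely worn down. One problem after another, one pile of errors after another. It was enough to make anyone snap.

I have no particular bias when it comes to Linux distributions, mostly because I haven't used that many of them. Still, I have to say CentOS is really bad, and I don't know why so many people like this lousy system.

Installing straight from pip gives, well, this pile of errors: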

[root@iZbp12k4fwg2euy5kkr9u7Z ~]# pip install PyMuPDF
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF
  Using cached http://mirrors.cloud.aliyuncs.com/pypi/packages/9f/1d/032d24e0c774e67742395fda163a172c60e4d0f9875785d5199eb2956d5e/PyMuPDF-1.19.6.tar.gz (2.3 MB)
  Preparing metadata (setup.py) ... done
Using legacy 'setup.py install' for PyMuPDF, since package 'wheel' is not installed.
Installing collected packages: PyMuPDF
    Running setup.py install for PyMuPDF ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF
         cwd: /tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/
    Complete output (20 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/fitz
    copying fitz/__init__.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/fitz.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/utils.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/__main__.py -> build/lib.linux-x86_64-3.6/fitz
    running build_ext
    building 'fitz._fitz' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/fitz
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/mupdf -I/usr/local/include/mupdf -Imupdf/thirdparty/freetype/include -I/usr/include/freetype2 -I/usr/include/python3.6m -c fitz/fitz_wrap.c -o build/temp.linux-x86_64-3.6/fitz/fitz_wrap.o
    fitz/fitz_wrap.c:2755:18: fatal error: fitz.h: No such file or directory
     #include <fitz.h>
                      ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.

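Judging by the output, gcc is what failed, and the cause is a missing header file. After some searching, https://blog.csdn.net/u012140499/article/details/112798704 pointed to a solution: download the source from https://casper.mupdf.com/releases/ and install it.

I downloaded the latest version and compiled it, and got another pile of errors: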
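Before building anything, it can help to confirm that the header really is missing from the directories gcc searched. The following is a minimal sketch, not part of the original troubleshooting: the directory list is copied from the `-I` flags in the gcc command in the log above, and `find_header` is a hypothetical helper name.

```python
import os

# Include directories copied from the -I flags in the failing gcc command.
MUPDF_INCLUDE_DIRS = ("/usr/include/mupdf", "/usr/local/include/mupdf")


def find_header(name, include_dirs=MUPDF_INCLUDE_DIRS):
    """Return the first directory in include_dirs that contains `name`, or None."""
    for directory in include_dirs:
        if os.path.isfile(os.path.join(directory, name)):
            return directory
    return None


location = find_header("fitz.h")
if location is None:
    print("fitz.h not found; the MuPDF headers are probably not installed")
else:
    print("fitz.h found in", location)
```

If this reports that the header is missing, the pip source build will likely fail the same way until the MuPDF headers are installed.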

source/fitz/util.c: In function ‘fz_new_xhtml_document_from_document’:
source/fitz/util.c:866:2: warning: ‘new_doc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return new_doc;
  ^
    CC build/release/source/fitz/warp.o
    CC build/release/source/fitz/writer.o
source/fitz/writer.c: In function ‘fz_new_document_writer_with_buffer’:
source/fitz/writer.c:305:2: warning: ‘wri’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return wri;
  ^
    CC build/release/source/fitz/xml.o
    CC build/release/source/fitz/xmltext-device.o
    CC build/release/source/fitz/zip.o
    CXX build/release/source/fitz/tessocr.o
/bin/sh: g++: command not found
make: *** [build/release/source/fitz/tessocr.o] Error 127

It says g++ cannot be found. Fine, let's sort out g++ next:

yum search "gcc-c++"

Just one result:

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cloud.aliyuncs.com
 * extras: mirrors.cloud.aliyuncs.com
 * updates: mirrors.cloud.aliyuncs.com
======================================================================================================= N/S matched: gcc-c++ =======================================================================================================
gcc-c++.x86_64 : C++ support for GCC

  Name and summary matches only, use "search all" for everything.

Install g++:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.22.0-source]# yum install "gcc-c++.x86_64" -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cloud.aliyuncs.com
 * extras: mirrors.cloud.aliyuncs.com
 * updates: mirrors.cloud.aliyuncs.com
Resolving Dependencies
--> Running transaction check
---> Package gcc-c++.x86_64 0:4.8.5-44.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================================================================================================
 Package                                                Arch                                                  Version                                                     Repository                                           Size
====================================================================================================================================================================================================================================
Installing:
 gcc-c++                                                x86_64                                                4.8.5-44.el7                                                base                                                7.2 M

Transaction Summary
====================================================================================================================================================================================================================================
Install  1 Package

Total download size: 7.2 M
Installed size: 16 M
Downloading packages:
gcc-c++-4.8.5-44.el7.x86_64.rpm                                                                                                                                                                              | 7.2 MB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : gcc-c++-4.8.5-44.el7.x86_64                                                                                                                                                                                      1/1 
  Verifying  : gcc-c++-4.8.5-44.el7.x86_64                                                                                                                                                                                      1/1 

Installed:
  gcc-c++.x86_64 0:4.8.5-44.el7                                                                                                                                                                                                     

Complete!
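A quick preflight check can catch a missing compiler before `make` has been running for a while. This is a minimal sketch: `missing_tools` is a hypothetical helper, and the default tool list is an assumption based on the tools that show up in the logs above.

```python
import shutil


def missing_tools(tools=("gcc", "g++", "make")):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing build tools:", ", ".join(missing))
else:
    print("All build tools found.")
```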

Once more:

make HAVE_X11=no HAVE_GLUT=no prefix=/usr/local install

The build and install command comes from this link: https://mupdf.readthedocs.io/en/latest/quick-start-guide.html#linux

Out came several hundred lines of errors:

thirdparty/harfbuzz/src/graph/../hb-meta.hh:76:41: note: in definition of macro ‘HB_AUTO_RETURN’
 #define HB_AUTO_RETURN(E) -> decltype ((E)) { return (E); }
                                         ^
In file included from thirdparty/harfbuzz/src/graph/pairpos-graph.hh:32:0,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:31,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/classdef-graph.hh: In constructor ‘graph::class_def_size_estimator_t::class_def_size_estimator_t(It)’:
thirdparty/harfbuzz/src/graph/classdef-graph.hh:155:44: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘keys’
     for (unsigned klass : glyphs_per_class.keys ())
                                            ^
thirdparty/harfbuzz/src/graph/classdef-graph.hh: In member function ‘bool graph::class_def_size_estimator_t::in_error()’:
thirdparty/harfbuzz/src/graph/classdef-graph.hh:200:47: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘values’
     for (const hb_set_t& s : glyphs_per_class.values ())
                                               ^
In file included from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:0:
thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh: In member function ‘void graph::Lookup::fix_existing_subtable_links(graph::gsubgpos_graph_context_t&, unsigned int, hb_vector_t<hb_pair_t<unsigned int, hb_vector_t<unsigned int> > >&)’:
thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:259:28: error: ‘struct hb_serialize_context_t::object_t’ has no member named ‘all_links_writer’
       for (auto& l : v.obj.all_links_writer ())
                            ^
thirdparty/harfbuzz/src/graph/gsubgpos-context.cc: In member function ‘unsigned int graph::gsubgpos_graph_context_t::num_non_ext_subtables()’:
thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:62:25: error: ‘struct hb_hashmap_t<unsigned int, graph::Lookup*>’ has no member named ‘values’
   for (auto l : lookups.values ())
                         ^
In file included from thirdparty/harfbuzz/src/graph/../hb.hh:484:0,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:31,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-vector.hh: In instantiation of ‘Type hb_vector_t<Type, sorted>::pop() [with Type = hb_user_data_array_t::hb_user_data_item_t; bool sorted = false]’:
thirdparty/harfbuzz/src/graph/../hb-object.hh:127:7:   required from ‘void hb_lockable_set_t<item_t, lock_t>::fini(lock_t&) [with item_t = hb_user_data_array_t::hb_user_data_item_t; lock_t = hb_mutex_t]’
thirdparty/harfbuzz/src/graph/../hb-object.hh:185:34:   required from here
thirdparty/harfbuzz/src/graph/../hb-vector.hh:398:43: error: cannot convert ‘std::remove_reference<hb_user_data_array_t::hb_user_data_item_t&>::type {aka hb_user_data_array_t::hb_user_data_item_t}’ to ‘hb_user_data_key_t*’ in initialization
     Type v {std::move (arrayZ[length - 1])};
                                           ^
In file included from thirdparty/harfbuzz/src/graph/../hb.hh:481:0,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:31,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-iter.hh: In instantiation of ‘void hb_copy(S&&, D&&) [with S = const hb_hashmap_t<unsigned int, unsigned int, true>&; D = hb_hashmap_t<unsigned int, unsigned int, true>&]’:
thirdparty/harfbuzz/src/graph/../hb-map.hh:46:100:   required from ‘hb_hashmap_t<K, V, minus_one>::hb_hashmap_t(const hb_hashmap_t<K, V, minus_one>&) [with K = unsigned int; V = unsigned int; bool minus_one = true]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:444:56:   required from here
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: error: no match for call to ‘(const<anonymous struct>) (const hb_hashmap_t<unsigned int, unsigned int, true>&)’
   hb_iter (is) | hb_sink (id);
              ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:156:1: note: candidates are:
 {
 ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note: template<class T> hb_iter_type<T><anonymous struct>::operator()(T&&) const
   operator () (T&& c) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note: template<class Type> hb_array_t<Type><anonymous struct>::operator()(Type*, unsigned int) const
   operator () (Type *array, unsigned int length) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note:   mismatched types ‘Type*’ and ‘hb_hashmap_t<unsigned int, unsigned int, true>’
   hb_iter (is) | hb_sink (id);
              ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note: template<class Type, unsigned int length> hb_array_t<Type><anonymous struct>::operator()(Type (&)[length]) const
   operator () (Type (&array)[length]) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note:   mismatched types ‘Type [length]’ and ‘const hb_hashmap_t<unsigned int, unsigned int, true>’
   hb_iter (is) | hb_sink (id);
              ^
In file included from thirdparty/harfbuzz/src/graph/../hb-serialize.hh:36:0,
                 from thirdparty/harfbuzz/src/graph/../hb-machinery.hh:37,
                 from thirdparty/harfbuzz/src/graph/../hb-bit-set.hh:33,
                 from thirdparty/harfbuzz/src/graph/../hb-bit-set-invertible.hh:32,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:32,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘uint32_t hb_hashmap_t<K, V, minus_one>::hash() const [with K = unsigned int; V = unsigned int; bool minus_one = true; uint32_t = unsigned int]’:
thirdparty/harfbuzz/src/graph/../hb-algs.hh:237:43:   required from ‘constexpr hb_head_t<unsigned int, decltype (hb_deref(v).hash())><anonymous struct>::impl(const T&, hb_priority<1u>) const [with T = hb::shared_ptr<hb_map_t>; hb_head_t<unsigned int, decltype (hb_deref(v).hash())> = unsigned int]’
thirdparty/harfbuzz/src/graph/../hb-algs.hh:245:3:   required by substitution of ‘template<class T> constexpr hb_head_t<unsigned int, decltype (((const<anonymous struct>*)this)-><anonymous struct>::impl(v, hb_priority<16u>()))><anonymous struct>::operator()(const T&) const [with T = hb::shared_ptr<hb_map_t>]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:257:50:   required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36:   required from here
thirdparty/harfbuzz/src/graph/../hb-map.hh:291:19: error: ‘iter_items’ was not declared in this scope
     + iter_items ()
                   ^
thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘bool hb_hashmap_t<K, V, minus_one>::is_equal(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’:
thirdparty/harfbuzz/src/graph/../hb-map.hh:306:78:   required from ‘bool hb_hashmap_t<K, V, minus_one>::operator==(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:96:65:   required from ‘bool hb_hashmap_t<K, V, minus_one>::item_t::operator==(const K&) const [with K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:258:33:   required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36:   required from here
thirdparty/harfbuzz/src/graph/../hb-map.hh:300:28: error: ‘iter’ was not declared in this scope
     for (auto pair : iter ())
                            ^
make: *** [build/release/thirdparty/harfbuzz/src/graph/gsubgpos-context.o] Error 1

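Every version I tried produced the errors above, or complained that the C++17 standard is not supported. Searching for the errors mostly turns up one answer: upgrade the gcc compiler. Damn it, yum doesn't offer a newer one, and installing from source drags in yet another pile of dependencies. Upgrade? Like hell I will.

So I tried downgrading mupdf instead, and after many attempts finally found that version 1.12 installs: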
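The yum log above shows that CentOS 7 ships gcc-c++ 4.8.5, which explains why recent sources fail to compile. The sketch below reads the compiler version and compares it with a minimum. The GCC 7 floor is an assumption (a rough line for usable C++17 support), not a requirement stated by MuPDF, and the function names are hypothetical.

```python
import subprocess

# Assumption: treat GCC 7 as the floor for usable C++17 support.
MIN_GCC = (7,)


def parse_version(text):
    """Turn a dotted version string such as '4.8.5' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def gcc_version():
    """Return the installed gcc version as a tuple, or None if gcc cannot be run."""
    try:
        result = subprocess.run(
            ["gcc", "-dumpversion"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_version(result.stdout)


def is_new_enough(version, minimum=MIN_GCC):
    """True if `version` is known and not older than `minimum`."""
    return version is not None and version >= minimum


print("gcc version:", gcc_version(), "new enough:", is_new_enough(gcc_version()))
```

The check relies on plain tuple comparison, so `(4, 8, 5)` sorts below `(7,)`.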

install -d /usr/local/include/mupdf
install -d /usr/local/include/mupdf/fitz
install -d /usr/local/include/mupdf/pdf
install include/mupdf/*.h /usr/local/include/mupdf
install include/mupdf/fitz/*.h /usr/local/include/mupdf/fitz
install include/mupdf/pdf/*.h /usr/local/include/mupdf/pdf
install -d /usr/local/lib
install build/release/libmupdf.a build/release/libmupdfthird.a /usr/local/lib
install -d /usr/local/bin
install build/release/mutool    build/release/muraster   build/release/mujstest build/release/mjsgen /usr/local/bin
install -d /usr/local/share/man/man1
install docs/man/*.1 /usr/local/share/man/man1
install -d /usr/local/share/doc/mupdf
install -d /usr/local/share/doc/mupdf/examples
install README COPYING CHANGES /usr/local/share/doc/mupdf
install docs/*.html docs/*.css docs/*.png /usr/local/share/doc/mupdf
install docs/examples/* /usr/local/share/doc/mupdf/examples

Back to pip. Damn, is this thing trying to blow up? Look at these thousands of lines of errors:

    fitz/fitz_wrap.c: In function ‘JM_rect_from_py’:
    fitz/fitz_wrap.c:4042:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_include_point_in_rect’:
    fitz/fitz_wrap.c:3447:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_transform_point’:
    fitz/fitz_wrap.c:3461:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_union_rect’:
    fitz/fitz_wrap.c:3468:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_concat_matrix’:
    fitz/fitz_wrap.c:3475:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_matrix_from_py’:
    fitz/fitz_wrap.c:4131:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_derotate_page_matrix’:
    fitz/fitz_wrap.c:5193:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_irect_from_py’:
    fitz/fitz_wrap.c:4071:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-b8m2p6nm/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.

Try a lower version:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
ERROR: Could not find a version that satisfies the requirement PyMuPDF==1.12 (from versions: 1.11.2, 1.12.5, 1.13.20, 1.14.19.post2, 1.14.19.2, 1.14.20, 1.14.21, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.16.7, 1.16.8, 1.16.8.1, 1.16.9, 1.16.10, 1.16.11, 1.16.12, 1.16.13, 1.16.14, 1.16.15, 1.16.16, 1.16.17, 1.16.18, 1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.17.6, 1.17.7, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4, 1.18.5, 1.18.6, 1.18.7, 1.18.8, 1.18.9, 1.18.10, 1.18.11, 1.18.12, 1.18.13, 1.18.14, 1.18.15, 1.18.16, 1.18.17, 1.18.18, 1.18.19, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.19.6)
ERROR: No matching distribution found for PyMuPDF==1.12

It says there is no 1.12, so 1.12.5 it is:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12.5
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.12.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/c1/4a/f6424f019bbc3ac70b55fd589f6b3eb777e13d1a3600dbdb726575d5f5df/PyMuPDF-1.12.5-cp36-cp36m-manylinux1_x86_64.whl (3.4 MB)
     |████████████████████████████████| 3.4 MB 1.2 MB/s            
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.12.5
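pip treats `==1.12` as an exact version, so it does not match 1.12.5; a wildcard specifier such as `PyMuPDF==1.12.*` would. Picking a concrete release from the list in the error message can be sketched as follows. The `available` list is a shortened copy of the versions pip printed above, and `versions_in_series` is a hypothetical helper.

```python
def versions_in_series(available, series):
    """Return the versions whose leading components equal `series`, e.g. '1.12'."""
    prefix = series.split(".")
    return [v for v in available if v.split(".")[: len(prefix)] == prefix]


# A shortened copy of the version list from the pip error message.
available = ["1.11.2", "1.12.5", "1.13.20", "1.14.19.post2", "1.18.19", "1.19.6"]

print(versions_in_series(available, "1.12"))  # ['1.12.5']
```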

Nice, it finally installed. I started the service and tried merging files, and it immediately threw this error:

'Document' object has no attribute 'new_page'

WTF, is there no end to this?

Try a newer version:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.18.19
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.18.19
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/d8/b6/59c001fa851ec4ad216232bc256b9aaff67ff9cf1c4bb542f68f1ad5fcd8/PyMuPDF-1.18.19-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
     |████████████████████████████████| 6.4 MB 1.4 MB/s            
Installing collected packages: PyMuPDF
  Attempting uninstall: PyMuPDF
    Found existing installation: PyMuPDF 1.12.5
    Uninstalling PyMuPDF-1.12.5:
      Successfully uninstalled PyMuPDF-1.12.5
Successfully installed PyMuPDF-1.18.19
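The `new_page` error was a method-name mismatch between the code and the old release, and upgrading is what fixed it here. If upgrading were not an option, a small compatibility shim is one alternative. This is only a sketch under an assumption: that the older release exposes a camelCase spelling such as `newPage`, which was not verified for 1.12.5. `call_first_available` and `new_page` are hypothetical helper names.

```python
def call_first_available(obj, names, *args, **kwargs):
    """Call the first method listed in `names` that exists on `obj`."""
    for name in names:
        method = getattr(obj, name, None)
        if callable(method):
            return method(*args, **kwargs)
    raise AttributeError(
        "%s has none of: %s" % (type(obj).__name__, ", ".join(names))
    )


def new_page(doc, width, height):
    """Create a page using whichever spelling this release provides.

    Assumption: older releases use the camelCase name `newPage`.
    """
    return call_first_available(doc, ("new_page", "newPage"), width=width, height=height)
```

With this in place the business code would call `new_page(pdf_document, w, h)` instead of `pdf_document.new_page(...)`.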

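Peace at last.

Summary:

1. For the mupdf source install, choose mupdf-1.12.0: https://mupdf.com/downloads/archive/mupdf-1.12.0-source.tar.gz
2. For pip, choose 1.18.19: pip install PyMuPDF==1.18.19

Postscript:

I just tried upgrading Python on CentOS to 3.8.6, and after that a recent PyMuPDF release appears to install without trouble. Damn this pile of outdated junk that ships with the system: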

Successfully installed Babel-2.14.0 Jinja2-3.1.3 MarkupSafe-2.1.5 PyMuPDF-1.24.9 PyMuPDFb-1.24.9 PyPDF2-3.0.1 Pygments-2.18.0 SecretStorage-3.3.3 SimpleWebSocketServer-0.1.2 aliyun-python-sdk-core-2.14.0 aliyun-python-sdk-imm-1.24.0 aliyun-python-sdk-kms-2.16.2 backports.tarfile-1.2.0 certifi-2024.2.2 cffi-1.17.0 charset-normalizer-3.3.2 ci-info-0.3.0 click-8.1.7 configobj-5.0.8 configparser-7.1.0 contourpy-1.1.1 crcmod-1.7 cryptography-42.0.4 cycler-0.12.1 docutils-0.20.1 docxcompose-1.4.0 docxtpl-0.16.7 etelemetry-0.3.1 filelock-3.15.4 fonttools-4.53.1 fsspec-2024.6.1 httplib2-0.22.0 idna-3.6 importlib-metadata-8.4.0 importlib-resources-6.4.3 isodate-0.6.1 jaraco.classes-3.4.0 jaraco.context-6.0.1 jaraco.functools-4.0.2 jeepney-0.8.0 jmespath-0.10.0 keyring-25.3.0 kiwisolver-1.4.5 looseversion-1.3.0 lxml-5.1.0 markdown-it-py-3.0.0 matplot-0.1.9 matplotlib-3.7.5 mdurl-0.1.2 more-itertools-10.4.0 mpmath-1.3.0 networkx-3.1 nh3-0.2.18 numpy-1.24.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.6.20 nvidia-nvtx-cu12-12.1.105 opencv-python-4.10.0.84 oss2-2.18.4 packaging-24.1 pandas-2.0.3 pathlib-1.0.1 pillow-10.4.0 pkginfo-1.11.1 pycparser-2.21 pycryptodome-3.20.0 pydot-3.0.1 pyloco-0.0.139 pyparsing-3.1.2 python-dateutil-2.9.0.post0 python-docx-1.1.0 pytz-2024.1 pyxnat-1.6.2 rdflib-6.3.2 readme-renderer-43.0 requests-2.32.3 requests-toolbelt-1.0.0 rfc3986-2.0.0 rich-13.7.1 scipy-1.10.1 simplejson-3.19.3 six-1.16.0 sympy-1.13.2 torch-2.4.0 traits-6.3.2 triton-3.0.0 twine-5.1.1 typing-3.7.4.3 typing-extensions-4.9.0 tzdata-2024.1 urllib3-2.2.2 ushlex-0.99.1 websocket-client-1.8.0 zipp-3.20.0

 

Merging Multiple Images into a PDF

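By obaby
August 20, 2024, 15:23

One of our business workflows requires users to download a file, stamp it, and upload the stamped version. The catch is that nearly everything is done on mobile, and uploading a PDF from a phone really is a hassle. So the current approach is to upload photos of the stamped document.

However, users find this approach a bit of a pain. Some have to upload dozens of images, and managing those stamped images after downloading them again is a problem too: you simply cannot tell which is which, and organizing the stamped files by business number is also difficult.

So I came up with a plan: convert the uploaded images back into a PDF.

The images are stored on OSS, and OSS itself does provide an image-to-PDF method (https://help.aliyun.com/zh/imm/user-guide/convert-an-image-to-pdf):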

# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import sys
import os
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Initialize the account Client with an AccessKey ID and AccessKey Secret.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Set the IMM endpoint to access.
        config.endpoint = f'imm.cn-zhangjiakou.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
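        # An Alibaba Cloud account AccessKey has access to all APIs. Using a RAM user for API access and routine operations is recommended.
        # It is strongly recommended not to store the AccessKey ID and AccessKey Secret in project code; doing so may leak the AccessKey and threaten the security of all resources under your account.
        # This sample reads the AccessKey from environment variables to authenticate API access. To configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.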
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # If you copy and run this code, print the API return value yourself.
            client.create_image_to_pdftask_with_options(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
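        # An Alibaba Cloud account AccessKey has access to all APIs. Using a RAM user for API access and routine operations is recommended.
        # It is strongly recommended not to store the AccessKey ID and AccessKey Secret in project code; doing so may leak the AccessKey and threaten the security of all resources under your account.
        # This sample reads the AccessKey from environment variables to authenticate API access. To configure environment variables, see https://help.aliyun.com/document_detail/2361894.html.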
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # If you copy and run this code, print the API return value yourself.
            await client.create_image_to_pdftask_with_options_async(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])

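However, the project already pulls in a fairly old version of the aliyun SDK. Adding this new one would mean modifying the existing code, which is a pain.

I searched online and found plenty of code, but none of it worked well. Damn, has nobody written anything reliable?

In the end I solved it with PyMuPDF: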

import fitz  # PyMuPDF

# Open an existing PDF or create a new one
pdf_document = fitz.open()  # Creates a new PDF

# Define the image file path
image_path = "path/to/your/image.jpg"

# Get the dimensions of the image
img = fitz.open(image_path)
img_rect = img[0].rect  # Get the rectangle of the first page of the image

# Create a new page with the same dimensions as the image
pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)

# Insert the image into the new page
pdf_page.insert_image(pdf_page.rect, filename=image_path)

# Save the PDF to a file
pdf_document.save("output.pdf")
pdf_document.close()

The actual business code:

import os

import fitz  # PyMuPDF


def converImageToPdf(img_list):
    """Merge the images at the given URLs into one PDF, one image per page."""
    # download_image and random_file_name are project helpers defined elsewhere.
    pdf_document = fitz.open()  # Creates a new, empty PDF

    for img_url in img_list:
        img_local_file = download_image(img_url, 'confirmd_images')
        img = fitz.open(img_local_file)
        img_rect = img[0].rect  # Get the rectangle of the first page of the image

        # Create a new page with the same dimensions as the image
        pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)

        # Insert the image into the new page
        pdf_page.insert_image(pdf_page.rect, filename=img_local_file)
        img.close()
    file_name = random_file_name('pdf')
    # Create the output directory if it does not exist yet
    os.makedirs('confirmd_receipt', exist_ok=True)
    pdf_document.save(os.path.join('confirmd_receipt', file_name))
    pdf_document.close()
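The business code calls two helpers, `download_image` and `random_file_name`, that the post does not show. The sketch below is one guess at what they might look like, written with the standard library only. Both bodies are assumptions, including the `.jpg` fallback extension; the real project helpers may behave differently.

```python
import os
import uuid
from urllib.parse import urlparse
from urllib.request import urlopen


def random_file_name(extension):
    """Return a random file name with the given extension, e.g. '<32 hex chars>.pdf'."""
    return "%s.%s" % (uuid.uuid4().hex, extension.lstrip("."))


def download_image(url, target_dir):
    """Download `url` into `target_dir` and return the local file path."""
    os.makedirs(target_dir, exist_ok=True)
    # Keep the original extension; fall back to .jpg (an assumption) if the URL has none.
    extension = os.path.splitext(urlparse(url).path)[1] or ".jpg"
    local_path = os.path.join(target_dir, random_file_name(extension))
    with urlopen(url) as response, open(local_path, "wb") as out:
        out.write(response.read())
    return local_path
```

Random names avoid collisions when several users upload files with the same name, at the cost of losing the original file name.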

The actual result:

Dependencies:

PyMuPDFb      ==      1.24.9

 
