Goodbye, Juliet

Author: obaby
2024-09-02 11:08

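Loving by association

For a while now I've been playing about an hour of Black Myth every evening, and my kid has gone from not caring at all to being my gaming buddy. She is also getting more and more interested in the story of Journey to the West. All of this is thanks to the little piggy, Bajie. Because Bajie looks so goofy and cute, she started playing along with me: at first she just sat on my lap and watched, later I let her move the character, run, and jump, and as of last night we have the somersault cloud.

After that she started doing things in combat, such as Immobilize and the clone spell. Every time we immobilize an enemy the piggy says, "Nicely frozen!", and she asks why the piggy says that every time.

Since liking the piggy led her to like Journey to the West, whenever we meet a monster I give her a rough version of its story, even though I no longer remember many of them clearly. On Saturday I bought a hundred-chapter edition of Journey to the West. When I started reading it to her last night I worried the wording was too classical for her, but with my own plain-language paraphrasing she took to it quite well. We finished chapter one last night, and by the end she was full of questions: where did his somersault cloud come from, where did his golden staff come from, how did he join the pilgrimage, and so on. She hasn't read or watched much Journey to the West before, but she already has a rough idea of the characters, partly from Black Myth and partly from bits picked up here and there.

So at this point I think Black Myth: Wukong was more than worth the money. I had seen a comment reporting that the Red Boy suicide scene left a 9-year-old with psychological trauma, and I worried the same would happen to my kid. It turned out I was overthinking: she asked only two questions, why he looked like that before he died and kept getting uglier, and why he killed himself. Too many people just parrot others and demand perfection. For me, if something meets part of my needs and returns more than it cost, that's enough. Tell a story well, get my kid to love classical literature and the story of the monkey and the piggy: for a bit over 200 yuan, that far exceeds my expectations, especially when it comes to nurturing a child's interests.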

 

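Goodbye, Juliet

Only now do I get to the title. I didn't sleep well last night: ideas rose and fell in my head, countless thoughts wrestled with each other, and I had drunk a lot of water, so I woke up roughly every hour. To be blunt, there is no such thing as trouble choosing; it's just being poor.

I mentioned earlier that I was looking for a used Alfa Romeo. The person I asked never got back to me, and thinking about it, there most likely isn't one to be found. On Saturday morning I figured that since the used car was going nowhere, I'd go look at a new one. I took a taxi to the Qingdao dealer on Chongqing South Road. After I had asked for a quote online, a saleswoman had already added me on WeChat. Talking to her went smoothly enough. Alfa Romeo and Maserati share the same lot, so I assume it's the same dealer. Years ago there was even a buy-a-Maserati-get-an-Alfa-Romeo promotion, but I can't afford a Maserati either.

Seeing the car in person, it felt quite a bit smaller than it looks online, and the rear seat is tight: at 1.7 m I'm short, and my hair still nearly brushed the roof.

The instrument cluster isn't big, but the two lights put on a nice show at startup. The drive really was exhilarating, though admittedly I've never driven anything fancy.

The price, though, is depressing: 330,000 yuan with not a cent of discount. And because the car is so niche, maintenance costs later on are a concern too. The saleswoman said they sell two or three a month, but I haven't seen one on the road in a long time.

The BBA brands (Benz, BMW, Audi) are nice, but at my budget I couldn't get a high trim, only some entry-level model. New-energy vehicles I won't even consider; I'm just not that buyer. Mainly I have no strong desire for a "fridge, TV, and big sofa" on wheels. Most of the time I drive alone, so why haul the household appliances around with me?

At noon I left Alfa Romeo and headed straight to Cadillac. The CT5 also has a list price of about 330,000, but after discounts it comes to around 190,000. So with a budget of 150,000 for a used Juliet I was wavering, since a few tens of thousands more would buy a brand-new CT5. The drive felt fine overall, but the infotainment system is truly dumb: the salesman said "set seat ventilation to level two" five times in a row and it never recognized it. I guess the "Bathhouse Emperor" only cares about bathing and ignores everything else.

Gas cars are now getting a big connected screen too. I'm not that attached to it: nice to have, fine without. But Juliet's center screen is only 8 inches, barely bigger than my phone, which really is a bit stingy.

Not far along the same road is Jaguar Land Rover, so I went straight to Jaguar. Again a list price of 300,000-plus that can be had for around 200,000, and at that price I suddenly loved Juliet a little less (not really). Is a 0-100 km/h time about two seconds quicker really worth the extra hundred thousand or so? That was the inner struggle. I still like it just as much, but the price made me chicken out, really chicken out. The XEL drives well enough; compared with the CT5 it seemed to lack seat massage, though the CT5 I test-drove was the top trim. As for voice assistants, I don't need them that much.

That's exactly why I tossed and turned at night. This morning, when I had a moment, I went back to Jaguar Land Rover and placed the order. Better to cut the knot than keep dithering.

Xiao Liuzi has been sent in for servicing too, and the time to say goodbye to it is getting close.

My love for Juliet is still there; I'm just too poor. I hope I get the chance to own one someday.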

  •  

Beating Up the Scumbag, Yay ✌️ (Black Myth: Wukong)

Author: obaby
2024-08-28 09:39

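It all started with the fox-and-scholar story in Black Myth: Wukong. A scholar kindly rescues a little white fox caught in a trap and takes it home to look after on a snowy day. That night he dreams: the fox turns into a beautiful woman, they fall in love, have children, his career takes off, and life is blissful. But it doesn't last. The fox can't change its nature and, in the dream, bites the whole family. When he wakes and sees the fox asleep, his kindness turns to hatred in an instant; he kills it and makes it into a scarf.

I thought it would be a heartwarming tale, and it turned out to be a damn dark one. The tone is on par with the dark version of Grimm's Fairy Tales.

Last night I kept playing and reached Xiaoxitian (if I remember right). On the way I saw a body lying by the road that you can talk to, and after talking it turned out to be that fox. Back when I watched the cutscene I already felt the little fox died for no reason: the scholar had a dream and then just killed it. Once I learned the scholar is now in the side hall, of course I had to go beat the crap out of him, or it would never sit right with me.

But as someone with no sense of direction I was in despair: on the way up the mountain I somehow blundered into a jump to a different map. Damn it, the scumbag goes unbeaten and I'm supposed to put up with that? Absolutely not. I figured I'd reload a save, only to find there was just one, the latest.

After some digging I found the save file path:

The save files are under savegames:

Below are the backups kept per hour and per day:

Find the last save from the 26th and copy it over the current one:

Reload the game. This time I headed straight for the side hall, skipping even the bosses on the way, and still couldn't find it. In the end I had beaten the damn big boss and never found the scholar, which is truly depressing:

Roll back the save and continue tonight. This scholar is getting beaten, no matter what!

Update: beat the crap out of him.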

 

  •  

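The Autumn Wind Rises

Author: obaby
2024-08-27 14:50

After two straight days of rain, I felt a hint of chill when I went out at noon to buy a Monster. Looks like autumn really is here.

Although it's already getting a bit cool, I still wanted to buy a dress, and I finally placed the order just now (that's a model below, not me~~).

I used to write whenever I felt like it and skip it when I didn't. Now readers are actually nagging me for updates. It's absurd: I suddenly feel like a Destined Wage Worker, with work to do and posts to write.

Honestly, I haven't had much to write about these two days, mostly because life has been uneventful and there's nothing to record. The other reason is that I've been busy with another project: a system needs to be deployed standalone, but it is licensed for an annual fee, so the original system needs new functionality, mainly license management and verification.

The previous few posts were all in service of this, including compiling the code. Once code is deployed standalone, you are basically at the other side's mercy, and I can't hand over the entire source just to deploy one standalone system; that would be a very bad deal.

With the code encrypted, the other piece is license management. Since it's a server deployment, the server's license has to be managed. I originally wanted to manage it with a local file, but local licensing raises the problem of how to update the license, and distributing license files for every renewal is a hassle. So I finally decided to manage licenses from a server, and over the past few days I built a license management system.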
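To make the idea of an expiring, verifiable license concrete, here is a minimal sketch that assumes an HMAC-signed token. It is not the scheme used in this project (that part is deliberately left out of the post); `SECRET`, `issue_license`, and `verify_license` are hypothetical names chosen only for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical shared secret; a real deployment would not keep it in plain source.
SECRET = b"demo-secret"


def issue_license(customer: str, expires_at: float, secret: bytes = SECRET) -> str:
    """Server side: sign a small JSON payload and return 'payload.signature'."""
    payload = json.dumps(
        {"customer": customer, "expires_at": expires_at}, sort_keys=True
    ).encode()
    signature = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(payload).decode() + "." + signature


def verify_license(token: str, now: Optional[float] = None,
                   secret: bytes = SECRET) -> bool:
    """Client side: accept the token only if it is untampered and not expired."""
    try:
        encoded, signature = token.rsplit(".", 1)
        payload = base64.urlsafe_b64decode(encoded.encode())
    except ValueError:  # malformed token or invalid base64
        return False
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    # Constant-time comparison of the two signatures.
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return False
    data = json.loads(payload)
    current = time.time() if now is None else now
    return current < data["expires_at"]
```

Renewal then only means issuing a new token with a later `expires_at`, which is the part that is awkward with hand-distributed license files.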

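As for the local license-checking strategy, I won't write about it; any more would give away too much. The overall goal is not to expose the source and not to let the licensing module be easily modified or bypassed. For now the main protection is compiling py to so. To go further I would probably have to add a packer. I haven't touched that area for years and don't know which protection tools work well on Linux these days. The simplest option is UPX compression, which isn't a bad choice: as a veteran compression packer, its compatibility on Linux should be OK. I'll test that at actual deployment time.

But some things I'm waiting for now feel like they may never come:

Harvest season, please come soon.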

  •  

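PIP Chill: a leaner tool for exporting dependencies

Author: obaby
2024-08-26 16:57

Make requirements with only the packages you need

The more modules a project imports, the more dependencies get exported, especially since many libraries that came with the system get exported along with them.

Output of pip freeze: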

asgiref==3.3.4
async-timeout==4.0.3
certifi==2021.5.30
chardet==4.0.0
coreapi==2.3.3
coreschema==0.0.4
Django==3.2.3
django-admin-lightweight-date-hierarchy==1.1.0
django-comment-migrate==0.1.5
django-cors-headers==3.10.1
django-crontab==0.7.1
django-export-xls==0.1.1
django-filter==21.1
django-ranged-response==0.2.0
django-redis==5.2.0
django-restql==0.15.4
django-simple-captcha==0.5.14
django-simpleui==2022.7.29
django-timezone-field==4.2.3
djangorestframework==3.12.4
djangorestframework-simplejwt==5.1.0
drf-yasg==1.20.0
et-xmlfile==1.1.0
idna==2.10
inflection==0.5.1
itypes==1.2.0
Jinja2==3.0.1
MarkupSafe==2.0.1
openpyxl==3.0.9
packaging==20.9
paho-mqtt==1.6.1
Pillow==8.3.1
PyJWT==2.1.0
PyMySQL==1.0.2
pyparsing==2.4.7
pyPEG2==2.15.2
pypinyin==0.46.0
pypng==0.20220715.0
pytz==2021.1
qrcode==7.4.2
redis==5.0.8
requests==2.25.1
ruamel.yaml==0.18.6
ruamel.yaml.clib==0.2.8
simplejson==3.18.4
six==1.16.0
smmap==4.0.0
sqlparse==0.4.1
typing-extensions==3.10.0.0
tzlocal==2.1
ua-parser==0.10.0
uritemplate==3.0.1
urllib3==1.26.6
user-agents==2.2.0
whitenoise==5.3.0
xlwt==1.3.0

Output of pip-chill:

django-admin-lightweight-date-hierarchy==1.1.0
django-comment-migrate==0.1.5
django-cors-headers==3.10.1
django-crontab==0.7.1
django-export-xls==0.1.1
django-filter==21.1
django-redis==5.2.0
django-restql==0.15.4
django-simple-captcha==0.5.14
django-simpleui==2022.7.29
django-timezone-field==4.2.3
djangorestframework-simplejwt==5.1.0
drf-yasg==1.20.0
encryptpy==1.0.5
openpyxl==3.0.9
paho-mqtt==1.6.1
pip-chill==1.0.3
pycryptodome==3.20.0
pymysql==1.0.2
pypinyin==0.46.0
qrcode==7.4.2
simplejson==3.18.4
tzlocal==2.1
user-agents==2.2.0
whitenoise==5.3.0

 

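Overall it cuts the list by a bit more than half, which also means fewer packages that can go wrong when building an environment, especially when installing across platforms.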
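The difference between the two lists comes down to keeping only packages that nothing else depends on. The sketch below shows that idea with the standard library's `importlib.metadata`; it is an approximation for illustration, not pip-chill's actual implementation, and the helper names (`normalize`, `requirement_name`, `top_level`, `installed_requirements`) are my own.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    """Normalize a distribution name (lowercase, runs of -_. become '-')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement: str) -> str:
    """Extract the distribution name from a string such as 'idna<4,>=2.5'."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    return normalize(match.group(1)) if match else ""


def top_level(requires_by_package: dict) -> list:
    """Return the packages that no other package in the mapping depends on.

    Simplification: requirements guarded by extras or environment markers
    are treated as ordinary dependencies.
    """
    needed = set()
    for package, requirements in requires_by_package.items():
        for requirement in requirements:
            dep = requirement_name(requirement)
            if dep and dep != normalize(package):
                needed.add(dep)
    return sorted(p for p in map(normalize, requires_by_package) if p not in needed)


def installed_requirements() -> dict:
    """Collect {distribution name: requirement strings} for this environment."""
    result = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:
            result[name] = list(dist.requires or [])
    return result


if __name__ == "__main__":
    for name in top_level(installed_requirements()):
        print(name)
```

Run inside a virtualenv, this prints names only, without versions; the pure `top_level` function can also be tried on a hand-written mapping.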

Official usage:

Suppose you have installed in your virtualenv a couple packages. When you run pip freeze, you'll get a list of all packages installed, with all dependencies. If one of the packages you installed ceases to depend on an already installed package, you have to manually remove it from the list. The list also makes no distinction about the packages you actually care about and packages your packages care about, making the requirements file bloated and, ultimately, inaccurate.

On your terminal, run:

$ pip-chill
bandit==1.7.0
bumpversion==0.6.0
click==7.1.2
coverage==5.3.1
flake8==3.8.4
nose==1.3.7
pip-chill==1.0.1
pytest==6.2.1
...

Or, if you want it without version numbers:

$ pip-chill --no-version
bandit
bumpversion
click
coverage
flake8
nose
pip-chill
pytest
...

Or, if you want it without pip-chill:

$ pip-chill --no-chill
bandit==1.7.0
bumpversion==0.6.0
click==7.1.2
coverage==5.3.1
flake8==3.8.4
nose==1.3.7
pytest==6.2.1
...

Or, if you want to list package dependencies too:

$ pip-chill -v
bandit==1.7.0
bumpversion==0.6.0
click==7.1.2
coverage==5.3.1
flake8==3.8.4
nose==1.3.7
pip-chill==1.0.1
pytest==6.2.1
sphinx==3.4.3
tox==3.21.1
twine==3.3.0
watchdog==1.0.2
# alabaster==0.7.12 # Installed as dependency for sphinx
# appdirs==1.4.4 # Installed as dependency for virtualenv
# attrs==20.3.0 # Installed as dependency for pytest
# babel==2.9.0 # Installed as dependency for sphinx

 

 

  •  

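Black Myth: Wukong trainer + map

Author: obaby
2024-08-24 21:42

These past few days I've been watching game streamers on Bilibili, and it's exhausting to watch. One Black Bear fight took three hours, three bosses took nine hours in total, and then the streamer turned on cheats to see the story. Hahaha.

I'm not really into Souls-like games either, mainly because I'm clumsy and can't play them. This time I mostly wanted to see the story, so I bought it. I tried one of the cracked versions floating around online; it's fake, a trial build from years ago, so don't bother. So far the kid and I have played up to chapter two.

Given how clumsy I am, I went with a trainer. Hahaha.

Game screenshot:

Trainer screenshot:

Trainer [风灵月影] (from 3dm, not developed by me) download:

Note: the hidden content here can only be viewed after you post a comment and it passes moderation.
(When commenting, tick "Save my name, email, and website in this browser for the next time I comment.")
(Please double-check your nickname and comment so they aren't flagged as spam and held up in moderation.)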

https://www.123pan.com/s/ucY7Vv-27dAA?提取码:A2JM

Black Myth map:

https://www.gamersky.com/tools/map/wukong/?mapId=48

  •  

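Sis, you don't want anyone to learn your secret, do you? A brief look at Python code encryption

Author: obaby
2024-08-23 11:13

A non-compiled language like Python is at an inherent disadvantage when it comes to protecting code. Java compiled into a jar is still fairly easy to decompile, but it does raise the bar a little, and with obfuscation on top it is generally enough to fend off entry-level cracking.

With Python, if you don't want people to read your code directly when you ship it, the simplest approach is to package it into a binary. The usual way is py2exe.

Official site: https://www.py2exe.org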

py2exe

py2exe is a Python Distutils extension which converts Python scripts into executable Windows programs, able to run without requiring a Python installation.

Development is hosted on GitHub. You can find the mailing list, svn, and downloads for Python 2 there. Downloads for Python 3 are on PyPI.

py2exe was originally developed by Thomas Heller who still makes contributions. Jimmy Retzlaff, Mark Hammond, and Alberto Sottile have also made contributions. Code contributions are always welcome from the community and many people provide invaluable help on the mailing list and the Wiki.

py2exe is used by BitTorrent, SpamBayes, and thousands more – py2exe averages over 5,000 downloads per month.

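The various beauty-photo crawlers I released earlier were mostly packaged with py2exe. The output is large, but overall it works well enough.

But how do you package a web framework such as flask or django? That takes a bit more effort.

A search turns up some tools, for example https://github.com/amchii/encryptpy. Under the hood it is still built on cython, so if you'd rather not use the tool you can use cython directly. As for the principle, it essentially compiles the py code into binary files.

Here it is done directly with cython:

pip install cython

Write a build script. The name doesn't matter; mine is called cython_build.py: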

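# cython_build.py: compile the listed modules into C extensions with Cython.
# setuptools is used instead of distutils, which was removed in Python 3.12.
from setuptools import setup
from Cython.Build import cythonize

# Paths are relative to the project root, where this script lives.
MODULES = [
    "application/settings.py",
    "PowerManagement/models.py",
    "PowerManagement/views/meter.py",
    "PowerManagement/views/meter_remote.py",
    "PowerManagement/views/substation_picture.py",
    "PowerManagement/views/circuit.py",
]

setup(ext_modules=cythonize(MODULES))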

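I suggest putting the script above in the project root and giving the modules to process as relative paths.

Compile the py files with the following command:

python3 cython_build.py build_ext --inplace

One problem with the above, though: --inplace did not put all the so files back into their original directories. After compilation, some files ended up in the project root:

The files with the so extension are the compiled binaries. If you run the project at this point it will complain that various components can't be found, so the generated files still need to be copied back to their original directories: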

mv *.so PowerManagement/views/

The last step is to delete the original py files:

cd "PowerManagement/views/"
rm  *.py

That completes the whole compile flow; now you can try restarting the service.
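The manual `mv` and `rm` steps are easy to get wrong: `mv *.so` sends every compiled file to one directory, and `rm *.py` also removes files such as `__init__.py` that were never compiled. A small script can do the same job per module instead. This is only a sketch under stated assumptions: `relocate_compiled` is a hypothetical helper, it expects the compiled files to sit in the project root with names like `<module>.so` or `<module>.<abi-tag>.so`, and it assumes module names are unique across the list.

```python
import glob
import os
import shutil


def relocate_compiled(sources, root="."):
    """Move each compiled module next to its source, then delete that source.

    Only the listed sources are deleted, so files like __init__.py survive.
    """
    moved = []
    for source in sources:
        module = os.path.splitext(os.path.basename(source))[0]
        target_dir = os.path.dirname(os.path.join(root, source))
        candidates = glob.glob(os.path.join(root, module + ".so"))
        candidates += glob.glob(os.path.join(root, module + ".*.so"))
        if not candidates:
            continue  # nothing was compiled for this source; keep the .py
        destination = os.path.join(target_dir, os.path.basename(candidates[0]))
        shutil.move(candidates[0], destination)
        os.remove(os.path.join(root, source))
        moved.append(destination)
    return moved


if __name__ == "__main__":
    # Same relative paths as in the build script above.
    print(relocate_compiled([
        "PowerManagement/views/meter.py",
        "PowerManagement/views/meter_remote.py",
    ]))
```

Keeping an untouched copy of the source tree elsewhere before running anything that deletes py files is still a good idea.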

After all, sis, you don't want your code casually copied away, do you?

  •  

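How to locate a process's binary path on linux

Author: obaby
2024-08-22 17:41

On the company servers everyone deploys their environment differently. Even a plain nginx gets set up in endlessly creative ways, and you can spend ages failing to find the executable. If all else fails there is the find command, of course.

But find is too slow; it takes forever to search.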

ps xua | grep nginx

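Looking at the process info: damn, ./nginx. The dot means someone changed into the directory and ran it from there. Maybe the shell history has a trace, so straight to history: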

history | grep nginx

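Nice. From this you can even tell that nginx was built and installed from source, which is very much the centos way.

What if history has nothing?

Then on to the next step. Since everything is a file on linux, just look it up under the process; the number is the process pid.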

ls -la /proc/22935/exe

That gives the path of the nginx binary: /usr/local/nginx/sbin/nginx.
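The same lookup can be scripted, since /proc/<pid>/exe is a symbolic link to the running binary. A minimal sketch follows; `exe_path` is a name I made up, and reading the link for another user's process usually requires root.

```python
import os
from typing import Optional


def exe_path(pid: int) -> Optional[str]:
    """Return the executable path of a running process via /proc/<pid>/exe.

    Returns None when the link cannot be read (no such process, no /proc
    on this platform, or insufficient permissions).
    """
    try:
        return os.readlink(f"/proc/{pid}/exe")
    except OSError:
        return None


if __name__ == "__main__":
    # Demonstrate on the current Python process.
    print(exe_path(os.getpid()))
```

For the nginx case in this post, the call would be `exe_path(22935)`.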

Why not use the which command? Because running nginx directly by name doesn't work here, so which can't locate the file either:

  •  

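Advanced PDF: stamp detection

Author: obaby
2024-08-22 15:22

It's called pdf stamp detection, but strictly speaking it is stamp detection in images. This continues the earlier topic of workflow automation: in short, after a user has uploaded images of all the stamped documents, the images need to be stitched into a pdf, and the uploaded images also need to be checked for a stamp. The reason for automatic detection: looking at what users upload today, many images aren't stamped at all, in the hope of slipping through. There is a review step later, but rather than add to the reviewers' workload it is better to block this at the source: no stamp, no finishing the workflow.

Searching github for stamp detection does turn up some projects. But, and here comes the "however": many open-source projects are only half open, which is absurd. For example this one:

I pulled the code and laboriously set up the environment, only for it to complain at run time that the data directory doesn't exist. In other words, the trained weights are missing and what's provided is a pile of useless code.

"To be released soon": that "soon" has now lasted four months, and still nothing has been released. Great.

I found another codebase: https://github.com/lian112233/OCR-seal

This one shows a bit more good faith than the previous one: at least it publishes a download link for the weights file, which is the biggest improvement.

The whole project has 3 files. From the code I first assumed it was built on torch, then found it also integrates PaddlePaddle and yolov5 functionality, and crucially it doesn't state its virtual-environment requirements. That's a pain. Besides, I only need to detect whether a stamp is present and don't care about the text on it, so there is no need to bring in PaddlePaddle's ocr.

As for yolov5, I've written a few posts on it before; search for them if you're interested. The reason I don't want to train a model myself this time is mostly laziness.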

https://github.com/ultralytics/yolov5

After cloning the three-file project, clone yolov5:

git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install

At this point the yolov5 dependencies are ok.

For the remaining environment dependencies, refer to the requirements below:

aliyun-python-sdk-core==2.14.0
aliyun-python-sdk-imm==1.24.0
aliyun-python-sdk-kms==2.16.2
Babel==2.14.0
backports.tarfile==1.2.0
certifi==2024.2.2
cffi==1.16.0
charset-normalizer==3.3.2
ci-info==0.3.0
click==8.1.7
configobj==5.0.8
configparser==7.1.0
contourpy==1.2.1
crcmod==1.7
cryptography==42.0.4
cycler==0.12.1
docutils==0.21.2
docxcompose==1.4.0
docxtpl==0.16.7
etelemetry==0.3.1
filelock==3.15.4
fonttools==4.53.1
fsspec==2024.6.1
gitdb==4.0.11
GitPython==3.1.43
httplib2==0.22.0
idna==3.6
importlib_metadata==8.4.0
importlib_resources==6.4.3
isodate==0.6.1
jaraco.classes==3.4.0
jaraco.context==6.0.1
jaraco.functools==4.0.2
Jinja2==3.1.3
jmespath==0.10.0
keyring==25.3.0
kiwisolver==1.4.5
looseversion==1.3.0
lxml==5.1.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplot==0.1.9
matplotlib==3.9.2
mdurl==0.1.2
more-itertools==10.4.0
mpmath==1.3.0
networkx==3.2.1
nh3==0.2.18
nibabel==5.2.1
nipype==1.8.6
numpy==1.26.4
opencv-python==4.10.0.84
oss2==2.18.4
packaging==24.1
pandas==2.2.2
pathlib==1.0.1
pillow==10.4.0
pkginfo==1.10.0
prov==2.0.1
psutil==6.0.0
py-cpuinfo==9.0.0
pycparser==2.21
pycryptodome==3.20.0
pydot==3.0.1
Pygments==2.18.0
pyloco==0.0.139
PyMuPDF==1.24.9
PyMuPDFb==1.24.9
pyparsing==3.1.2
PyPDF2==3.0.1
python-dateutil==2.9.0.post0
python-docx==1.1.0
pytz==2024.1
pyxnat==1.6.2
PyYAML==6.0.2
rdflib==6.3.2
readme_renderer==44.0
requests==2.31.0
requests-toolbelt==1.0.0
rfc3986==2.0.0
rich==13.7.1
scipy==1.13.1
seaborn==0.13.2
simplejson==3.19.3
SimpleWebSocketServer==0.1.2
six==1.16.0
smmap==5.0.1
sympy==1.13.2
thop==0.1.1.post2209072238
torch==2.4.0
torchvision==0.19.0
tqdm==4.66.5
traits==6.3.2
twine==5.1.1
typing==3.7.4.3
typing_extensions==4.9.0
tzdata==2024.1
ultralytics==8.2.79
ultralytics-thop==2.0.5
urllib3==2.2.1
ushlex==0.99.1
websocket-client==1.8.0
zipp==3.20.0

As for the detection part, it doesn't need to be that complicated either; just write a new function:

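import os

import cv2
import torch

# repo, model_path, IMG_FORMATS and resize_img are not shown in this post;
# they are presumably defined earlier in the same script (the local yolov5
# checkout, the weights file, the accepted extensions, a resize helper).
model = torch.hub.load(repo, 'custom', path=model_path,
                       source='local')  # load custom weights from the local repo


def predict(source='train',
            repo=repo,
            img_size=640):
    """Run stamp detection on a single image or on every image in a directory."""
    files = []

    if os.path.isdir(source):
        files = sorted([os.path.join(source, x) for x in os.listdir(source)])  # dir
    elif os.path.isfile(source):
        files = [source]

    # keep only files whose extension is a known image format
    images = [x for x in files if x.split('.')[-1].lower() in IMG_FORMATS]

    for path in images:
        print("Current pic: " + path)
        img = resize_img(cv2.imread(path), img_size)
        result = model(img)
        result_pd = result.pandas()

        xyxy = result_pd.xyxy[0]  # one DataFrame row per detection
        print('result=', result)
        print(result_pd.names)
        print('xy=', xyxy)
        print('count=', len(xyxy))  # number of stamps found in this image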

Actual detection results:

/Users/zhongling/PycharmProjects/djangoProject/LaoshanReport/venv/bin/Python /Users/zhongling/PycharmProjects/djangoProject/LaoshanReport/stamp_detection.py 
/Users/zhongling/PycharmProjects/djangoProject/LaoshanReport/venv/lib/python3.9/site-packages/urllib3/__init__.py:35: NotOpenSSLWarning: urllib3 v2 only supports OpenSSL 1.1.1+, currently the 'ssl' module is compiled with 'LibreSSL 2.8.3'. See: https://github.com/urllib3/urllib3/issues/3020
  warnings.warn(
YOLOv5 🚀 v7.0-356-g2070b303 Python-3.9.6 torch-2.4.0 CPU

Fusing layers... 
YOLOv5m summary: 308 layers, 21037638 parameters, 0 gradients
Adding AutoShape... 
/Users/zhongling/PycharmProjects/djangoProject/LaoshanReport/yolov5/models/common.py:869: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with amp.autocast(autocast):
Current pic: /Users/zhongling/PycharmProjects/djangoProject/LaoshanReport/yolov5/test/20240103-182329.jpeg
result= image 1/1: 640x1137 (no detections)
Speed: 1.8ms pre-process, 153.8ms inference, 0.2ms NMS per image at shape (1, 3, 384, 640)
{0: 'stamp'}
count= 0
Current pic: /Users/zhongling/PycharmProjects/djangoProject/LaoshanReport/yolov5/test/WechatIMG5.jpg
/Users/zhongling/PycharmProjects/djangoProject/LaoshanReport/yolov5/models/common.py:869: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with amp.autocast(autocast):
result= image 1/1: 640x905 1 stamp
Speed: 1.8ms pre-process, 202.5ms inference, 0.6ms NMS per image at shape (1, 3, 480, 640)
{0: 'stamp'}
count= 1
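The per-image count printed above is all the workflow gate needs. The sketch below shows one way to turn those counts into a pass/fail decision; the policy itself is an assumption (the post does not say whether every page or only some page must be stamped), and `unstamped_images` and `can_finish` are hypothetical helpers.

```python
from typing import Dict, List


def unstamped_images(stamp_counts: Dict[str, int]) -> List[str]:
    """Return the images for which the detector found no stamp."""
    return sorted(path for path, count in stamp_counts.items() if count < 1)


def can_finish(stamp_counts: Dict[str, int], require_all: bool = False) -> bool:
    """Decide whether an upload may complete the workflow.

    Assumed policy: by default at least one image must carry a stamp;
    with require_all=True every image must carry one.
    """
    if not stamp_counts:
        return False  # nothing uploaded, nothing to approve
    missing = unstamped_images(stamp_counts)
    if require_all:
        return not missing
    return len(missing) < len(stamp_counts)


if __name__ == "__main__":
    # Counts taken from the two images in the log above.
    counts = {"20240103-182329.jpeg": 0, "WechatIMG5.jpg": 1}
    print(can_finish(counts), unstamped_images(counts))
```

Returning the list of unstamped files as well makes it possible to tell the user which image to re-upload.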

 

  •  

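Installing PyMuPDF on Centos 7

Author: obaby
2024-08-21 10:38

Picking up from the previous post: yesterday, once the code was written and tested ok, I thought I was done. But deploying to the server today wrecked me. One problem after another, pile after pile of errors. It broke me.

I have no particular bias about linux distributions, mostly because I haven't used many. But I have to say centos is really bad, and I don't know why so many people like using this lousy system.

Install directly with pip and, well, here is a pile of errors: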

[root@iZbp12k4fwg2euy5kkr9u7Z ~]# pip install PyMuPDF
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF
  Using cached http://mirrors.cloud.aliyuncs.com/pypi/packages/9f/1d/032d24e0c774e67742395fda163a172c60e4d0f9875785d5199eb2956d5e/PyMuPDF-1.19.6.tar.gz (2.3 MB)
  Preparing metadata (setup.py) ... done
Using legacy 'setup.py install' for PyMuPDF, since package 'wheel' is not installed.
Installing collected packages: PyMuPDF
    Running setup.py install for PyMuPDF ... error
    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF
         cwd: /tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/
    Complete output (20 lines):
    running install
    running build
    running build_py
    creating build
    creating build/lib.linux-x86_64-3.6
    creating build/lib.linux-x86_64-3.6/fitz
    copying fitz/__init__.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/fitz.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/utils.py -> build/lib.linux-x86_64-3.6/fitz
    copying fitz/__main__.py -> build/lib.linux-x86_64-3.6/fitz
    running build_ext
    building 'fitz._fitz' extension
    creating build/temp.linux-x86_64-3.6
    creating build/temp.linux-x86_64-3.6/fitz
    gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -I/usr/include/mupdf -I/usr/local/include/mupdf -Imupdf/thirdparty/freetype/include -I/usr/include/freetype2 -I/usr/include/python3.6m -c fitz/fitz_wrap.c -o build/temp.linux-x86_64-3.6/fitz/fitz_wrap.o
    fitz/fitz_wrap.c:2755:18: fatal error: fitz.h: No such file or directory
     #include <fitz.h>
                      ^
    compilation terminated.
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"'; __file__='"'"'/tmp/pip-install-8aqc9a1k/pymupdf_d5ebd12caf9445ab82d6a5af68229d72/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-7fhkepkr/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.

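According to the output, it is gcc that failed, and the cause is a missing header file. After some searching, https://blog.csdn.net/u012140499/article/details/112798704 offered a way forward: download the source from https://casper.mupdf.com/releases/ and install it.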
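Before retrying pip, it is cheap to check whether the header gcc complained about is actually present in the MuPDF include directories passed with -I in the log. A minimal sketch, with `find_header` as a hypothetical helper:

```python
import os
from typing import Iterable, Optional


def find_header(name: str, include_dirs: Iterable[str]) -> Optional[str]:
    """Return the first path where the header `name` exists, or None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # The two MuPDF -I directories from the gcc command in the log above.
    dirs = ["/usr/include/mupdf", "/usr/local/include/mupdf"]
    found = find_header("fitz.h", dirs)
    print(found or "fitz.h not found: the MuPDF headers are not installed")
```

If it prints a path, the header problem is solved and any remaining failure lies elsewhere.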

I downloaded the latest version and compiled it, and got another pile of errors:

source/fitz/util.c: In function ‘fz_new_xhtml_document_from_document’:
source/fitz/util.c:866:2: warning: ‘new_doc’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return new_doc;
  ^
    CC build/release/source/fitz/warp.o
    CC build/release/source/fitz/writer.o
source/fitz/writer.c: In function ‘fz_new_document_writer_with_buffer’:
source/fitz/writer.c:305:2: warning: ‘wri’ may be used uninitialized in this function [-Wmaybe-uninitialized]
  return wri;
  ^
    CC build/release/source/fitz/xml.o
    CC build/release/source/fitz/xmltext-device.o
    CC build/release/source/fitz/zip.o
    CXX build/release/source/fitz/tessocr.o
/bin/sh: g++: command not found
make: *** [build/release/source/fitz/tessocr.o] Error 127

It says g++ can't be found. Fine, let's sort out g++:

yum search "gcc-c++"

Just one result:

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cloud.aliyuncs.com
 * extras: mirrors.cloud.aliyuncs.com
 * updates: mirrors.cloud.aliyuncs.com
======================================================================================================= N/S matched: gcc-c++ =======================================================================================================
gcc-c++.x86_64 : C++ support for GCC

  Name and summary matches only, use "search all" for everything.

Install g++:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.22.0-source]# yum install "gcc-c++.x86_64" -y
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.cloud.aliyuncs.com
 * extras: mirrors.cloud.aliyuncs.com
 * updates: mirrors.cloud.aliyuncs.com
Resolving Dependencies
--> Running transaction check
---> Package gcc-c++.x86_64 0:4.8.5-44.el7 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

====================================================================================================================================================================================================================================
 Package                                                Arch                                                  Version                                                     Repository                                           Size
====================================================================================================================================================================================================================================
Installing:
 gcc-c++                                                x86_64                                                4.8.5-44.el7                                                base                                                7.2 M

Transaction Summary
====================================================================================================================================================================================================================================
Install  1 Package

Total download size: 7.2 M
Installed size: 16 M
Downloading packages:
gcc-c++-4.8.5-44.el7.x86_64.rpm                                                                                                                                                                              | 7.2 MB  00:00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : gcc-c++-4.8.5-44.el7.x86_64                                                                                                                                                                                      1/1 
  Verifying  : gcc-c++-4.8.5-44.el7.x86_64                                                                                                                                                                                      1/1 

Installed:
  gcc-c++.x86_64 0:4.8.5-44.el7                                                                                                                                                                                                     

Complete!

Once more:

make HAVE_X11=no HAVE_GLUT=no prefix=/usr/local install

The build and install command comes from this link: https://mupdf.readthedocs.io/en/latest/quick-start-guide.html#linux

Out came a few hundred lines of errors:

thirdparty/harfbuzz/src/graph/../hb-meta.hh:76:41: note: in definition of macro ‘HB_AUTO_RETURN’
 #define HB_AUTO_RETURN(E) -> decltype ((E)) { return (E); }
                                         ^
In file included from thirdparty/harfbuzz/src/graph/pairpos-graph.hh:32:0,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:31,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/classdef-graph.hh: In constructor ‘graph::class_def_size_estimator_t::class_def_size_estimator_t(It)’:
thirdparty/harfbuzz/src/graph/classdef-graph.hh:155:44: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘keys’
     for (unsigned klass : glyphs_per_class.keys ())
                                            ^
thirdparty/harfbuzz/src/graph/classdef-graph.hh: In member function ‘bool graph::class_def_size_estimator_t::in_error()’:
thirdparty/harfbuzz/src/graph/classdef-graph.hh:200:47: error: ‘struct hb_hashmap_t<unsigned int, hb_set_t>’ has no member named ‘values’
     for (const hb_set_t& s : glyphs_per_class.values ())
                                               ^
In file included from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:0:
thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh: In member function ‘void graph::Lookup::fix_existing_subtable_links(graph::gsubgpos_graph_context_t&, unsigned int, hb_vector_t<hb_pair_t<unsigned int, hb_vector_t<unsigned int> > >&)’:
thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:259:28: error: ‘struct hb_serialize_context_t::object_t’ has no member named ‘all_links_writer’
       for (auto& l : v.obj.all_links_writer ())
                            ^
thirdparty/harfbuzz/src/graph/gsubgpos-context.cc: In member function ‘unsigned int graph::gsubgpos_graph_context_t::num_non_ext_subtables()’:
thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:62:25: error: ‘struct hb_hashmap_t<unsigned int, graph::Lookup*>’ has no member named ‘values’
   for (auto l : lookups.values ())
                         ^
In file included from thirdparty/harfbuzz/src/graph/../hb.hh:484:0,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:31,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-vector.hh: In instantiation of ‘Type hb_vector_t<Type, sorted>::pop() [with Type = hb_user_data_array_t::hb_user_data_item_t; bool sorted = false]’:
thirdparty/harfbuzz/src/graph/../hb-object.hh:127:7:   required from ‘void hb_lockable_set_t<item_t, lock_t>::fini(lock_t&) [with item_t = hb_user_data_array_t::hb_user_data_item_t; lock_t = hb_mutex_t]’
thirdparty/harfbuzz/src/graph/../hb-object.hh:185:34:   required from here
thirdparty/harfbuzz/src/graph/../hb-vector.hh:398:43: error: cannot convert ‘std::remove_reference<hb_user_data_array_t::hb_user_data_item_t&>::type {aka hb_user_data_array_t::hb_user_data_item_t}’ to ‘hb_user_data_key_t*’ in initialization
     Type v {std::move (arrayZ[length - 1])};
                                           ^
In file included from thirdparty/harfbuzz/src/graph/../hb.hh:481:0,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:31,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-iter.hh: In instantiation of ‘void hb_copy(S&&, D&&) [with S = const hb_hashmap_t<unsigned int, unsigned int, true>&; D = hb_hashmap_t<unsigned int, unsigned int, true>&]’:
thirdparty/harfbuzz/src/graph/../hb-map.hh:46:100:   required from ‘hb_hashmap_t<K, V, minus_one>::hb_hashmap_t(const hb_hashmap_t<K, V, minus_one>&) [with K = unsigned int; V = unsigned int; bool minus_one = true]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:444:56:   required from here
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: error: no match for call to ‘(const<anonymous struct>) (const hb_hashmap_t<unsigned int, unsigned int, true>&)’
   hb_iter (is) | hb_sink (id);
              ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:156:1: note: candidates are:
 {
 ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note: template<class T> hb_iter_type<T><anonymous struct>::operator()(T&&) const
   operator () (T&& c) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:158:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note: template<class Type> hb_array_t<Type><anonymous struct>::operator()(Type*, unsigned int) const
   operator () (Type *array, unsigned int length) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:164:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note:   mismatched types ‘Type*’ and ‘hb_hashmap_t<unsigned int, unsigned int, true>’
   hb_iter (is) | hb_sink (id);
              ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note: template<class Type, unsigned int length> hb_array_t<Type><anonymous struct>::operator()(Type (&)[length]) const
   operator () (Type (&array)[length]) const
   ^
thirdparty/harfbuzz/src/graph/../hb-iter.hh:168:3: note:   template argument deduction/substitution failed:
thirdparty/harfbuzz/src/graph/../hb-iter.hh:1016:14: note:   mismatched types ‘Type [length]’ and ‘const hb_hashmap_t<unsigned int, unsigned int, true>’
   hb_iter (is) | hb_sink (id);
              ^
In file included from thirdparty/harfbuzz/src/graph/../hb-serialize.hh:36:0,
                 from thirdparty/harfbuzz/src/graph/../hb-machinery.hh:37,
                 from thirdparty/harfbuzz/src/graph/../hb-bit-set.hh:33,
                 from thirdparty/harfbuzz/src/graph/../hb-bit-set-invertible.hh:32,
                 from thirdparty/harfbuzz/src/graph/../hb-set.hh:32,
                 from thirdparty/harfbuzz/src/graph/graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-graph.hh:27,
                 from thirdparty/harfbuzz/src/graph/gsubgpos-context.cc:27:
thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘uint32_t hb_hashmap_t<K, V, minus_one>::hash() const [with K = unsigned int; V = unsigned int; bool minus_one = true; uint32_t = unsigned int]’:
thirdparty/harfbuzz/src/graph/../hb-algs.hh:237:43:   required from ‘constexpr hb_head_t<unsigned int, decltype (hb_deref(v).hash())><anonymous struct>::impl(const T&, hb_priority<1u>) const [with T = hb::shared_ptr<hb_map_t>; hb_head_t<unsigned int, decltype (hb_deref(v).hash())> = unsigned int]’
thirdparty/harfbuzz/src/graph/../hb-algs.hh:245:3:   required by substitution of ‘template<class T> constexpr hb_head_t<unsigned int, decltype (((const<anonymous struct>*)this)-><anonymous struct>::impl(v, hb_priority<16u>()))><anonymous struct>::operator()(const T&) const [with T = hb::shared_ptr<hb_map_t>]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:257:50:   required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36:   required from here
thirdparty/harfbuzz/src/graph/../hb-map.hh:291:19: error: ‘iter_items’ was not declared in this scope
     + iter_items ()
                   ^
thirdparty/harfbuzz/src/graph/../hb-map.hh: In instantiation of ‘bool hb_hashmap_t<K, V, minus_one>::is_equal(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’:
thirdparty/harfbuzz/src/graph/../hb-map.hh:306:78:   required from ‘bool hb_hashmap_t<K, V, minus_one>::operator==(const hb_hashmap_t<K, V, minus_one>&) const [with K = unsigned int; V = unsigned int; bool minus_one = true]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:96:65:   required from ‘bool hb_hashmap_t<K, V, minus_one>::item_t::operator==(const K&) const [with K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-map.hh:258:33:   required from ‘bool hb_hashmap_t<K, V, minus_one>::has(K, VV**) const [with VV = unsigned int; K = hb::shared_ptr<hb_map_t>; V = unsigned int; bool minus_one = false]’
thirdparty/harfbuzz/src/graph/../hb-ot-layout-common.hh:3034:36:   required from here
thirdparty/harfbuzz/src/graph/../hb-map.hh:300:28: error: ‘iter’ was not declared in this scope
     for (auto pair : iter ())
                            ^
make: *** [build/release/thirdparty/harfbuzz/src/graph/gsubgpos-context.o] Error 1

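I tried several versions and every one of them hit the error above, or complained that the C++17 standard is not supported. Searching for the error, most of the suggested fixes boil down to upgrading the gcc compiler. Damn it: yum doesn't offer a newer one, and building from source drags in a pile of dependencies. Upgrade? Like hell I will.

So I tried downgrading mupdf instead, and after many attempts finally found that version 1.12 can be installed.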

install -d /usr/local/include/mupdf
install -d /usr/local/include/mupdf/fitz
install -d /usr/local/include/mupdf/pdf
install include/mupdf/*.h /usr/local/include/mupdf
install include/mupdf/fitz/*.h /usr/local/include/mupdf/fitz
install include/mupdf/pdf/*.h /usr/local/include/mupdf/pdf
install -d /usr/local/lib
install build/release/libmupdf.a build/release/libmupdfthird.a /usr/local/lib
install -d /usr/local/bin
install build/release/mutool    build/release/muraster   build/release/mujstest build/release/mjsgen /usr/local/bin
install -d /usr/local/share/man/man1
install docs/man/*.1 /usr/local/share/man/man1
install -d /usr/local/share/doc/mupdf
install -d /usr/local/share/doc/mupdf/examples
install README COPYING CHANGES /usr/local/share/doc/mupdf
install docs/*.html docs/*.css docs/*.png /usr/local/share/doc/mupdf
install docs/examples/* /usr/local/share/doc/mupdf/examples

On to pip again. Just look at these thousands of lines of errors. Damn, are you trying to blow up?

    fitz/fitz_wrap.c: In function ‘JM_rect_from_py’:
    fitz/fitz_wrap.c:4042:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_include_point_in_rect’:
    fitz/fitz_wrap.c:3447:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_transform_point’:
    fitz/fitz_wrap.c:3461:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_union_rect’:
    fitz/fitz_wrap.c:3468:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘util_concat_matrix’:
    fitz/fitz_wrap.c:3475:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_matrix_from_py’:
    fitz/fitz_wrap.c:4131:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_derotate_page_matrix’:
    fitz/fitz_wrap.c:5193:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    fitz/fitz_wrap.c: In function ‘JM_irect_from_py’:
    fitz/fitz_wrap.c:4071:1: warning: control reaches end of non-void function [-Wreturn-type]
     }
     ^
    error: command 'gcc' failed with exit status 1
    ----------------------------------------
ERROR: Command errored out with exit status 1: /usr/bin/python3 -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kfa4_6i0/pymupdf_d444a7b2e89d4aa38ac652587530e9a2/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-b8m2p6nm/install-record.txt --single-version-externally-managed --compile --install-headers /usr/local/include/python3.6m/PyMuPDF Check the logs for full command output.

Try a lower version:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
ERROR: Could not find a version that satisfies the requirement PyMuPDF==1.12 (from versions: 1.11.2, 1.12.5, 1.13.20, 1.14.19.post2, 1.14.19.2, 1.14.20, 1.14.21, 1.16.0, 1.16.1, 1.16.2, 1.16.3, 1.16.4, 1.16.5, 1.16.6, 1.16.7, 1.16.8, 1.16.8.1, 1.16.9, 1.16.10, 1.16.11, 1.16.12, 1.16.13, 1.16.14, 1.16.15, 1.16.16, 1.16.17, 1.16.18, 1.17.0, 1.17.1, 1.17.2, 1.17.3, 1.17.4, 1.17.5, 1.17.6, 1.17.7, 1.18.0, 1.18.1, 1.18.2, 1.18.3, 1.18.4, 1.18.5, 1.18.6, 1.18.7, 1.18.8, 1.18.9, 1.18.10, 1.18.11, 1.18.12, 1.18.13, 1.18.14, 1.18.15, 1.18.16, 1.18.17, 1.18.18, 1.18.19, 1.19.0, 1.19.1, 1.19.2, 1.19.3, 1.19.4, 1.19.5, 1.19.6)
ERROR: No matching distribution found for PyMuPDF==1.12

It says there is no 1.12, so 1.12.5 it is:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.12.5
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.12.5
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/c1/4a/f6424f019bbc3ac70b55fd589f6b3eb777e13d1a3600dbdb726575d5f5df/PyMuPDF-1.12.5-cp36-cp36m-manylinux1_x86_64.whl (3.4 MB)
     |████████████████████████████████| 3.4 MB 1.2 MB/s            
Installing collected packages: PyMuPDF
Successfully installed PyMuPDF-1.12.5

Nice, it finally installed. I started the service, tried merging files, and immediately got the following error:

'Document' object has no attribute 'new_page'

WTF, is it still not going to let me live?
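A likely cause, as far as I remember: older PyMuPDF releases used camelCase method names (`newPage` rather than `new_page`), and the snake_case names only arrived somewhere in the 1.18 series. If upgrading were not an option, a small shim along these lines could paper over the difference. `add_page` is a made-up helper name, and the sketch only assumes the document object has one of the two methods.

```python
def add_page(doc, width, height):
    """Add a blank page using whichever method name this release exposes."""
    # Prefer the snake_case name, fall back to the older camelCase spelling
    maker = getattr(doc, "new_page", None) or getattr(doc, "newPage", None)
    if maker is None:
        raise AttributeError("document object has neither new_page nor newPage")
    return maker(width=width, height=height)
```

With this in place the calling code would use `add_page(pdf_document, w, h)` instead of calling `pdf_document.new_page(...)` directly.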

Try upgrading the version:

[root@iZbp12k4fwg2euy5kkr9u7Z mupdf-1.12.0-source]# pip install PyMuPDF==1.18.19
Looking in indexes: http://mirrors.cloud.aliyuncs.com/pypi/simple/
Collecting PyMuPDF==1.18.19
  Downloading http://mirrors.cloud.aliyuncs.com/pypi/packages/d8/b6/59c001fa851ec4ad216232bc256b9aaff67ff9cf1c4bb542f68f1ad5fcd8/PyMuPDF-1.18.19-cp36-cp36m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (6.4 MB)
     |████████████████████████████████| 6.4 MB 1.4 MB/s            
Installing collected packages: PyMuPDF
  Attempting uninstall: PyMuPDF
    Found existing installation: PyMuPDF 1.12.5
    Uninstalling PyMuPDF-1.12.5:
      Successfully uninstalled PyMuPDF-1.12.5
Successfully installed PyMuPDF-1.18.19

The world is finally at peace.

Summary:

1. For the mupdf source install, choose mupdf-1.12.0: https://mupdf.com/downloads/archive/mupdf-1.12.0-source.tar.gz
2. For pip, choose 1.18.19: pip install PyMuPDF==1.18.19
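One way to catch this earlier, sketched under assumptions: compare the installed PyMuPDF version against the one that worked here before the service starts. `MIN_VERSION`, `parse_version` and `version_ok` are made-up names, and the parsing simply keeps the leading integers of each dot-separated piece, so it is not a full PEP 440 parser.

```python
import re

MIN_VERSION = (1, 18, 19)  # the release that worked in this post


def parse_version(text):
    """Turn '1.18.19' or '1.14.19.post2' into a tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at suffixes such as 'post2'
        parts.append(int(match.group()))
    return tuple(parts)


def version_ok(text, minimum=MIN_VERSION):
    """Return True when the version string is at least `minimum`."""
    return parse_version(text) >= minimum
```

At startup, something like `version_ok(fitz.VersionBind)` could feed it the installed version; as far as I know `VersionBind` is the string PyMuPDF exposes for that, but treat it as an assumption.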

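Postscript:

I just tried upgrading Python on CentOS to 3.8.6, and after that pymupdf seems to install newer versions without trouble. Damn, all this outdated junk that ships with the system: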

Successfully installed Babel-2.14.0 Jinja2-3.1.3 MarkupSafe-2.1.5 PyMuPDF-1.24.9 PyMuPDFb-1.24.9 PyPDF2-3.0.1 Pygments-2.18.0 SecretStorage-3.3.3 SimpleWebSocketServer-0.1.2 aliyun-python-sdk-core-2.14.0 aliyun-python-sdk-imm-1.24.0 aliyun-python-sdk-kms-2.16.2 backports.tarfile-1.2.0 certifi-2024.2.2 cffi-1.17.0 charset-normalizer-3.3.2 ci-info-0.3.0 click-8.1.7 configobj-5.0.8 configparser-7.1.0 contourpy-1.1.1 crcmod-1.7 cryptography-42.0.4 cycler-0.12.1 docutils-0.20.1 docxcompose-1.4.0 docxtpl-0.16.7 etelemetry-0.3.1 filelock-3.15.4 fonttools-4.53.1 fsspec-2024.6.1 httplib2-0.22.0 idna-3.6 importlib-metadata-8.4.0 importlib-resources-6.4.3 isodate-0.6.1 jaraco.classes-3.4.0 jaraco.context-6.0.1 jaraco.functools-4.0.2 jeepney-0.8.0 jmespath-0.10.0 keyring-25.3.0 kiwisolver-1.4.5 looseversion-1.3.0 lxml-5.1.0 markdown-it-py-3.0.0 matplot-0.1.9 matplotlib-3.7.5 mdurl-0.1.2 more-itertools-10.4.0 mpmath-1.3.0 networkx-3.1 nh3-0.2.18 numpy-1.24.4 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.6.20 nvidia-nvtx-cu12-12.1.105 opencv-python-4.10.0.84 oss2-2.18.4 packaging-24.1 pandas-2.0.3 pathlib-1.0.1 pillow-10.4.0 pkginfo-1.11.1 pycparser-2.21 pycryptodome-3.20.0 pydot-3.0.1 pyloco-0.0.139 pyparsing-3.1.2 python-dateutil-2.9.0.post0 python-docx-1.1.0 pytz-2024.1 pyxnat-1.6.2 rdflib-6.3.2 readme-renderer-43.0 requests-2.32.3 requests-toolbelt-1.0.0 rfc3986-2.0.0 rich-13.7.1 scipy-1.10.1 simplejson-3.19.3 six-1.16.0 sympy-1.13.2 torch-2.4.0 traits-6.3.2 triton-3.0.0 twine-5.1.1 typing-3.7.4.3 typing-extensions-4.9.0 tzdata-2024.1 urllib3-2.2.2 ushlex-0.99.1 websocket-client-1.8.0 zipp-3.20.0

 


Merging Multiple Images into a PDF

Author: obaby
August 20, 2024 15:23

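One of our business flows requires users to download a file, stamp it, and upload the stamped version again. The catch is that almost everything happens on mobile, and uploading a PDF from a phone is genuinely awkward. So the current approach is to upload photos of the stamped document.

However, users find this a bit painful. Some of them need to upload dozens of images, and managing those stamped images once they are downloaded again is a problem of its own: you can't tell which is which, and organizing the stamped files by business number is just as hard.

So the plan is to convert the uploaded images back into a PDF.

Since the images are stored on OSS, and OSS itself does come with an image-to-PDF method (https://help.aliyun.com/zh/imm/user-guide/convert-an-image-to-pdf):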

# -*- coding: utf-8 -*-
# This file is auto-generated, don't edit it. Thanks.
import sys
import os
from typing import List

from alibabacloud_imm20200930.client import Client as imm20200930Client
from alibabacloud_tea_openapi import models as open_api_models
from alibabacloud_imm20200930 import models as imm_20200930_models
from alibabacloud_tea_util import models as util_models
from alibabacloud_tea_util.client import Client as UtilClient


class Sample:
    def __init__(self):
        pass

    @staticmethod
    def create_client(
        access_key_id: str,
        access_key_secret: str,
    ) -> imm20200930Client:
        """
        Initialize the account Client with an AccessKey ID and AccessKey Secret.
        @param access_key_id:
        @param access_key_secret:
        @return: Client
        @throws Exception
        """
        config = open_api_models.Config(
            access_key_id=access_key_id,
            access_key_secret=access_key_secret
        )
        # Set the IMM endpoint to access.
        config.endpoint = f'imm.cn-zhangjiakou.aliyuncs.com'
        return imm20200930Client(config)

    @staticmethod
    def main(
        args: List[str],
    ) -> None:
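        # An Alibaba Cloud account AccessKey has access to every API; using a RAM user for API access and day-to-day operations is recommended.
        # Storing the AccessKey ID and AccessKey Secret in project code is strongly discouraged: it can leak the AccessKey and endanger every resource under your account.
        # This sample authenticates API access by reading the AccessKey from environment variables. To configure them, see https://help.aliyun.com/document_detail/2361894.html.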
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # If you copy and run this code, print the API response yourself.
            client.create_image_to_pdftask_with_options(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)

    @staticmethod
    async def main_async(
        args: List[str],
    ) -> None:
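        # An Alibaba Cloud account AccessKey has access to every API; using a RAM user for API access and day-to-day operations is recommended.
        # Storing the AccessKey ID and AccessKey Secret in project code is strongly discouraged: it can leak the AccessKey and endanger every resource under your account.
        # This sample authenticates API access by reading the AccessKey from environment variables. To configure them, see https://help.aliyun.com/document_detail/2361894.html.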
        imm_access_key_id = os.getenv("AccessKeyId")
        imm_access_key_secret = os.getenv("AccessKeySecret")
        client = Sample.create_client(imm_access_key_id, imm_access_key_secret)
        sources_0 = imm_20200930_models.CreateImageToPDFTaskRequestSources(
            uri='oss://test-bucket/test-object.jpg'
        )
        create_image_to_pdftask_request = imm_20200930_models.CreateImageToPDFTaskRequest(
            project_name='test-project',
            target_uri='oss://test-bucket/test-target-object.pdf',
            sources=[
                sources_0
            ]
        )
        runtime = util_models.RuntimeOptions()
        try:
            # If you copy and run this code, print the API response yourself.
            await client.create_image_to_pdftask_with_options_async(create_image_to_pdftask_request, runtime)
        except Exception as error:
            # Print the error message if needed.
            UtilClient.assert_as_string(error.message)


if __name__ == '__main__':
    Sample.main(sys.argv[1:])

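However, the project already pulls in a fairly old aliyun SDK. Bringing in this new one would mean modifying the existing code, which is a pain.

I searched online and found plenty of code, but none of it works well. Seriously, has nobody written anything reliable?

In the end PyMuPDF solved the problem: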

import fitz  # PyMuPDF

# Open an existing PDF or create a new one
pdf_document = fitz.open()  # Creates a new PDF

# Define the image file path
image_path = "path/to/your/image.jpg"

# Get the dimensions of the image
img = fitz.open(image_path)
img_rect = img[0].rect  # Get the rectangle of the first page of the image

# Create a new page with the same dimensions as the image
pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)

# Insert the image into the new page
pdf_page.insert_image(pdf_page.rect, filename=image_path)

# Save the PDF to a file
pdf_document.save("output.pdf")
pdf_document.close()

The actual business code:

import os

import fitz  # PyMuPDF


def converImageToPdf(img_list):
    # download_image and random_file_name are project helpers defined elsewhere
    pdf_document = fitz.open()  # Creates a new, empty PDF

    for img_url in img_list:
        img_local_file = download_image(img_url, 'confirmd_images')
        img = fitz.open(img_local_file)
        img_rect = img[0].rect  # Get the rectangle of the first page of the image

        # Create a new page with the same dimensions as the image
        pdf_page = pdf_document.new_page(width=img_rect.width, height=img_rect.height)

        # Insert the image into the new page
        pdf_page.insert_image(pdf_page.rect, filename=img_local_file)
        img.close()

    file_name = random_file_name('pdf')
    # Create the output directory on first use
    os.makedirs('confirmd_receipt', exist_ok=True)
    pdf_document.save(os.path.join('confirmd_receipt', file_name))
    pdf_document.close()
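`download_image` and `random_file_name` are helpers from the project that are not shown here. A hypothetical stand-in, assuming plain URLs and only the standard library, might look like the sketch below. The real helpers may well differ (for OSS they might go through the SDK instead).

```python
import os
import uuid
import urllib.request
from urllib.parse import urlparse


def random_file_name(ext):
    """Return a random file name with the given extension."""
    return f"{uuid.uuid4().hex}.{ext}"


def download_image(url, target_dir):
    """Download `url` into `target_dir` and return the local file path."""
    os.makedirs(target_dir, exist_ok=True)
    # Keep the original extension; fall back to jpg when the URL has none
    ext = os.path.splitext(urlparse(url).path)[1].lstrip(".") or "jpg"
    local_path = os.path.join(target_dir, random_file_name(ext))
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        out.write(response.read())
    return local_path
```

Random names keep two uploads with the same original file name from overwriting each other, at the cost of losing the original name.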

Actual result:

Dependencies:

PyMuPDFb      ==      1.24.9

 
