20230808在WIN10下使用python3将TXT文件转换为DOCX

20230808在WIN10下使用python3将TXT文件转换为DOCX
2023/8/8 19:30

缘起，由于google的文档翻译不支持SRT/TXT格式的字幕，因此需要将SRT格式的字幕转为DOCX。
Ch4.Unreported.World.2022.Mexicos.Psychedelic.Toads.1080p.HDTV.x265.AAC.MVGroup.org.mkv

1、ANSI编码的TXT文件转DOCX：
Ch4.Unreported.World.2022.Mexicos.Psychedelic.Toads.1080p.HDTV.x265.AAC.MVGroup.org_track3_eng.srt
直接使用记事本另存为ANSI编码的：ansi.txt
完成之后可以确认的！

2、
python docx utf8 读写

https://deepinout.com/python/python-qa/t_how-to-read-and-write-unicode-utf-8-files-in-python.html
如何在Python中读写Unicode（UTF-8）文件？

如何在Python中读写Unicode（UTF-8）文件？
Unicode是一种字符编码标准，用于表示各种语言的字符。UTF-8是Unicode编码的一种实现方式，由于它的兼容性和可读性比较优秀，现在已经成为了互联网上的常用编码方式。因此，在Python中读写Unicode（UTF-8）文件是非常重要的，接下来我们就来介绍如何操作。

如何读取Unicode（UTF-8）文件
在Python中，我们可以使用open函数打开文件，然后通过read()方法来读取数据。UTF-8的编码方式需要加上参数encoding="UTF-8"，代码如下：

with open('file.txt', 'r', encoding="UTF-8") as f:
data = f.read()

with语句可以更加安全地打开文件，即使发生异常也会自动关闭文件。读取到的数据会保存在data中。如果我们想分行读取数据，可以使用readlines()方法，这个方法返回一个列表，列表中包含文件的所有行。

with open('file.txt', 'r', encoding="UTF-8") as f:
lines = f.readlines()

要注意的是，当读取包含多个字节的Unicode字符时，需要注意读取的字节数。

3、
I:\Downloads\2005[红眼航班]Red Eye[BT下载迅雷下载]-云下载\Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT-52.77GB\UTF8>python
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import docx
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'docx'
>>> exit()

I:\Downloads\2005[红眼航班]Red Eye[BT下载迅雷下载]-云下载\Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT-52.77GB\UTF8>pip install python-docx
Collecting python-docx
Downloading python-docx-0.8.11.tar.gz (5.6 MB)
---------------------------------------- 5.6/5.6 MB 858.3 kB/s eta 0:00:00
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting lxml>=2.3.2 (from python-docx)
Downloading lxml-4.9.3-cp39-cp39-win_amd64.whl (3.9 MB)
---------------------------------------- 3.9/3.9 MB 316.3 kB/s eta 0:00:00
Building wheels for collected packages: python-docx
Building wheel for python-docx (pyproject.toml) ... done
Created wheel for python-docx: filename=python_docx-0.8.11-py3-none-any.whl size=184516 sha256=cfcdeb6d53a59e9d49a21d93f77a3979e9a6a2f37748a1417dcc93c8fbc5640d
Stored in directory: c:\users\administrator\appdata\local\pip\cache\wheels\83\8b\7c\09ae60c42c7ba4ed2dddaf2b8b9186cb105255856d6ed3dba5
Successfully built python-docx
Installing collected packages: lxml, python-docx
Successfully installed lxml-4.9.3 python-docx-0.8.11

[notice] A new release of pip is available: 23.1.2 -> 23.2.1
[notice] To update, run: python.exe -m pip install --upgrade pip

I:\Downloads\2005[红眼航班]Red Eye[BT下载迅雷下载]-云下载\Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT-52.77GB\UTF8>
I:\Downloads\2005[红眼航班]Red Eye[BT下载迅雷下载]-云下载\Red.Eye.2005.2160p.BluRay.REMUX.HEVC.DTS-HD.MA.5.1-FGT-52.77GB\UTF8>python
Python 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

import docx

with open('utf8.txt', 'r', encoding="UTF-8") as f:
data = f.read()

docment = docx.Document()
docment.add_paragraph(data)
docment.save('utf8.docx')

4、将同目录的全部SRT格式的字幕转换为DOCX文档，以便通过chrome上传到google翻译成为简体中文！

srt2docx.py

# coding=utf-8
import os

import docx

# 获取当前目录
path = os.getcwd()
# 查看当前目录下所有文件
files = os.listdir(path)

# 遍历所有文件
for file in files:
# 判断文件是否为 txt 文件
#if file.endswith('.txt'):
if file.endswith('.srt'):
# 构建新的文件名
#new_file = file.replace('.txt', '.json')
#new_file = file.replace('.txt', '.srt')
#new_file = file.replace('.txt', '.cn.srt')
new_file = file.replace('.srt', '.docx')
# 重命名文件
#os.rename(os.path.join(path, file), os.path.join(path, new_file))


f2=open(new_file,"wb")


#f_path=r'C:\Users\Admin\Desktop\shapenetcore_partanno_segmentation_benchmark_v0_normal_2\00000001\0.txt'
#f_path=r'1.txt'
#f_path=file

temp = 1
xuhao = 1;

#with open(f_path) as f:
#with open(file) as f:
#with open('utf8.txt', 'r', encoding="UTF-8") as f:
with open(file, 'r', encoding="UTF-8") as f:
#lines = f.readlines()
data = f.read()

docment = docx.Document()
docment.add_paragraph(data)
#docment.save('utf8.docx')
docment.save(new_file)

参考资料：

！！！！
https://wenku.csdn.net/answer/2cd4b739868911ed9a8700163e34985e
将txt文本转换成doc格式文件,直接写python代码

可以使用 python-docx 库来将 txt 文件转换为 docx 格式文件。
首先，需要安装 python-docx 库：
pip install python-docx

然后，可以使用以下代码将 txt 文件转换为 docx 格式文件：
import docx
# 读取 txt 文件with open('input.txt', 'r') as f:
text = f.read()
# 创建一个新的 docx 文档
document = docx.Document()
# 将 txt 文本写入文档
document.add_paragraph(text)
# 保存文档
document.save('output.docx')
上面的代码将会读取名为 input.txt 的 txt 文件，并将其内容写入一个新的 docx 文档中，最后保存为 output.docx 文件。

import docx

with open('ansi.txt', 'r') as f:
text = f.read()

#docment = docx.Docment()
docment = docx.Document()

docment.add_paragraph(text)
docment.save('ansi.docx')

【注意，如果 txt 文件中包含中文字符，可能会出现乱码问题。这时可以使用 codecs 库来解决这个问题：】
【20230807在WIN10下使用python3将TXT文件转换为DOCX（在UTF8编码下转换为DOCX有多一行的瑕疵）】

http://huijobs.cn/article/article-detail/11600/
python实现txt转word
2022年11月17日 23:06

def TXTRead_Writeline():
ms=open("./file/test.txt") #读取文件
for line in ms.readlines(): #逐行写入
with open("./file/test.doc","a") as mon:
mon.write(line)
TXTRead_Writeline()

LOG：
Type "help", "copyright", "credits" or "license" for more information.
>>> import docx
>>> ms=open("ansi.txt")
>>> print(ms)
<_io.TextIOWrapper name='ansi.txt' mode='r' encoding='cp936'>
>>> for line in ms.readlines():
... with open("test.docx","a") as mon:
... mon.write(line)
...
2
30
27

test.py【将ANSI编码的TXT可以转换为DOC或者DOCX】
import docx
ms=open("ansi.txt")
#print(ms)

for line in ms.readlines():
#with open("test.doc","a") as mon:
with open("test.docx","a") as mon:
mon.write(line)

参考资料：
https://blog.51cto.com/u_16175451/6829720
python怎么给txt文档添加换行符

https://www.zhihu.com/question/29948454/answer/2774476613?utm_id=0
请问python怎么做到在写入的TXT中换行?

line = line.strip('\n')

https://blog.csdn.net/u010565244/article/details/19193635
关于python 的line.strip()方法

python utf-8 txt 转 DOCX 多一个换行
【貌似有道理，但是没有实现】
https://www.jianshu.com/p/7307262a6197
使用python批量转换编码时多余换行的问题

最近使用python批量将项目中的GBK编码文件转换为UTF8时遇到了会自动给每一行结尾多添加一个换行符的问题这样会导致多行宏命令失效

原因是使用文本读写模式 ‘w’ ‘r’

修改为使用 ‘wb’ ‘rb’ 使用二进制接收在使用utf8编码为str然后以二进制方式写入就可以了

python write 换行
python txt 转 DOCX
Python TXT 转 DOCX 多换行
python utf8转gbk
https://blog.csdn.net/qq_40845077/article/details/124872708
Python代码——实现txt转docx

https://blog.csdn.net/qq_40837206/article/details/130323856
python实现txt与docx互转

https://codeleading.com/article/62046304563/
Python代码——实现txt转docx

https://blog.csdn.net/qq_33005553/article/details/124755791
python 去除 txt文本换行

python 递归读取
https://blog.51cto.com/love51/6389966
python递归获取文件 python 递归文件夹

https://www.bilibili.com/read/cv13745103/
Python代码——实现txt转docx

https://zhuanlan.zhihu.com/p/564678085
Python txt文件转word 格式

https://pythonjishu.com/nwbuyryewwscpxl/
使用Python对文件进行批量改名的方法