Sanger Sequence Assembly: An Example of What Not to Do

news2026/3/13 5:11:42

Contents

Purpose
Steps

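Purpose

NGS gave me a large number of reads, some of which are paired reads that I want to join based on their overlap. Because I had trimmed the sequence IDs, the reads would not run through pandaseq.

In short, the goal is simple: take reads that share an overlapping region and join them into one longer sequence.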
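To make the goal concrete, here is a minimal sketch of overlap-based joining for a single read pair. It is a toy under stated assumptions, not what pandaseq or SPAdes actually do: it only accepts exact matches, ignores quality scores, and expects uppercase A/C/G/T. The function names `revcomp` and `merge_pair` and the minimum-overlap parameter are made up for illustration.

```shell
# Reverse-complement a DNA string (uppercase A/C/G/T only).
revcomp() {
    printf '%s\n' "$1" | awk '{
        out = ""
        for (i = length($0); i >= 1; i--) out = out substr($0, i, 1)
        print out
    }' | tr 'ACGT' 'TGCA'
}

# Join a forward read with its mate: reverse-complement the mate, then look
# for the longest suffix of the forward read that equals a prefix of it.
# $3 is a hypothetical minimum overlap length (default 5) that guards
# against short chance matches. Returns non-zero if no overlap is found.
merge_pair() {
    awk -v fwd="$1" -v mate="$(revcomp "$2")" -v min="${3:-5}" 'BEGIN {
        n = length(fwd)
        if (length(mate) < n) n = length(mate)
        for (len = n; len >= min; len--) {
            if (substr(fwd, length(fwd) - len + 1) == substr(mate, 1, len)) {
                print fwd substr(mate, len + 1)
                exit 0
            }
        }
        exit 1
    }'
}

# Both reads below come from the made-up fragment ACGTACGTTTGACCAGGT.
merge_pair ACGTACGTTTGA ACCTGGTCAAAC   # prints ACGTACGTTTGACCAGGT
```

Real read mergers tolerate mismatches in the overlap and use base qualities to choose between conflicting bases, so this sketch only shows the underlying idea.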

Steps

conda install -c bioconda spades

spades.py -h

# Take a look at the input files
head all_reads_map_to_plasmid_1.fq all_reads_map_to_plasmid_2.fq
# Some reads may have no mate, so drop the unpaired ones first
seqkit pair -1 all_reads_map_to_plasmid_1.fq  -2 all_reads_map_to_plasmid_2.fq  --id-regexp '^(\S+)\/[12]'

# This produced the following files
# [INFO] 254 paired-end reads saved to all_reads_map_to_plasmid_1.paired.fq and all_reads_map_to_plasmid_2.paired.fq
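Before handing the paired files to an assembler, it can be worth confirming that they really are in sync. The sketch below is one way to do that with awk; it assumes plain four-line FASTQ records and IDs that end in /1 or /2, as the `--id-regexp` above implies. The function names are made up for illustration.

```shell
# Count FASTQ records, assuming every record spans exactly four lines.
count_records() {
    awk 'END { print NR / 4 }' "$1"
}

# Print the read IDs of a FASTQ file with the leading @ and a trailing
# /1 or /2 removed. Header lines are lines 1, 5, 9, ...
read_ids() {
    awk 'NR % 4 == 1 {
        id = $1
        sub(/^@/, "", id)
        sub(/\/[12]$/, "", id)
        print id
    }' "$1"
}

# Succeed only if both files list the same IDs in the same order.
pairs_in_sync() {
    [ "$(read_ids "$1")" = "$(read_ids "$2")" ]
}
```

For the files above this would be called as `count_records all_reads_map_to_plasmid_1.paired.fq` and `pairs_in_sync all_reads_map_to_plasmid_1.paired.fq all_reads_map_to_plasmid_2.paired.fq`.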

# Run SPAdes to do the assembly
spades.py -1 all_reads_map_to_plasmid_1.paired.fq -2 all_reads_map_to_plasmid_2.paired.fq -o ./test
# The final result is shown below
# It is clearly not the result I wanted
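One quick way to judge whether an assembly is what you wanted is to list how many contigs came out and how long each one is. The sketch below does that for any FASTA file; the function name `contig_lengths` is made up for illustration, and the path in the usage note assumes the `-o ./test` output directory used above.

```shell
# Print "name<TAB>length" for every sequence in a FASTA file.
# Sequences may be wrapped across several lines.
contig_lengths() {
    awk '
        /^>/ {
            if (name != "") print name "\t" len
            name = substr($0, 2)   # header without the leading ">"
            len = 0
            next
        }
        { len += length($0) }
        END { if (name != "") print name "\t" len }
    ' "$1"
}
```

SPAdes writes its contigs to `contigs.fasta` inside the output directory, so here the call would be `contig_lengths ./test/contigs.fasta`. Many short contigs instead of a few long merged sequences would suggest that the assembler did not do the pairwise joining described in the Purpose section.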

[Image: screenshot of the result]

SPAdes genome assembler v3.13.1

Usage: /opt/miniconda3/bin/spades.py [options] -o <output_dir>

Basic options:
-o      <output_dir>    directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data # assemble single-cell sequencing data
--meta                  this flag is required for metagenomic sample data # assemble metagenomic sequencing data
--rna                   this flag is required for RNA-Seq data # assemble transcriptome sequencing data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection # assemble plasmids
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version

Input data:
--12    <filename>      file with interlaced forward and reverse paired-end reads # interleaved paired-end reads (fastq)
-1      <filename>      file with forward paired-end reads # forward paired-end reads (fastq)
-2      <filename>      file with reverse paired-end reads # reverse paired-end reads (fastq)
-s      <filename>      file with unpaired reads # unpaired reads (fastq)
--merged        <filename>      file with merged forward and reverse paired-end reads # merged paired-end reads (fastq)
(19 options omitted here)
--sanger        <filename>      file with Sanger reads # hybrid assembly together with Sanger reads
--pacbio        <filename>      file with PacBio reads # hybrid assembly together with PacBio reads
--nanopore      <filename>      file with Nanopore reads # hybrid assembly together with Nanopore reads

Pipeline options:
--only-error-correction runs only read error correction (without assembling) # error correction only
--only-assembler        runs only assembling (without read error correction) # assembly only
--careful               tries to reduce number of mismatches and short indels
# Runs the MismatchCorrector module to fix mismatches and short indels in the assembled genome. Using this option is recommended.
--continue              continue run from the last available check-point
(3 options omitted here)

Advanced options:
--dataset       <filename>      file with dataset description in YAML format
-t/--threads    <int>           number of threads  [default: 16] # number of CPU cores/threads
-m/--memory     <int>           RAM limit for SPAdes in Gb (terminates if exceeded) [default: 250]
                                # SPAdes needs a lot of memory! If the hardware allows, set -m 500 or even higher.
--tmp-dir       <dirname>       directory for temporary files [default: <output_dir>/tmp]
-k      <int,int,...>   comma-separated list of k-mer sizes (must be odd and less than 128) [default: 'auto']
                                # k-mer length; several values can be given: -k 33,43,55,63,73,89
--cov-cutoff    <float>         coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset  <33 or 64>      PHRED quality offset in the input reads (33 or 64)  [default: auto-detect]
