Contents
- Purpose
- Steps
 
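Purpose
My NGS run produced a lot of reads, and I want to join some of the paired reads based on their overlap. Because I had trimmed the sequence IDs, the reads would no longer run through PANDAseq.
In short, the goal is simple: take reads that share an overlapping region and join them into one longer sequence.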
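To make the goal concrete, here is a minimal sketch of what "joining on overlap" means for one read pair. It is a toy under stated assumptions, not what PANDAseq or SPAdes do internally: the reads are made up, only exact matches count, base qualities are ignored, and the helper names `revcomp` and `merge_overlap` are hypothetical. The reverse read is reverse-complemented first, then the longest suffix of the forward read that equals a prefix of it is used as the join point.

```shell
#!/usr/bin/env sh
set -eu

# Reverse-complement a DNA string (plain A/C/G/T only).
revcomp() {
  awk -v s="$1" 'BEGIN {
    comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
    out = ""
    for (i = length(s); i >= 1; i--) out = out comp[substr(s, i, 1)]
    print out
  }'
}

# Join string a and string b on their longest exact suffix/prefix overlap
# of at least min_len bases. Prints nothing when no such overlap exists.
merge_overlap() {
  awk -v a="$1" -v b="$2" -v min_len="$3" 'BEGIN {
    max = (length(a) < length(b)) ? length(a) : length(b)
    for (n = max; n >= min_len; n--) {
      if (substr(a, length(a) - n + 1) == substr(b, 1, n)) {
        print a substr(b, n + 1)
        exit
      }
    }
  }'
}

# Hypothetical toy pair sequenced from the fragment ACGTACGTTGCAAGGCT.
fwd=ACGTACGTTGCA
rev_read=AGCCTTGCAACG

merged=$(merge_overlap "$fwd" "$(revcomp "$rev_read")" 4)
echo "$merged"
```

With these toy reads the two ends share a 7-base overlap (CGTTGCA), so the script prints the 17-base fragment ACGTACGTTGCAAGGCT. Real read mergers also have to tolerate sequencing errors inside the overlap, which this sketch does not attempt.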
Steps
conda install -c bioconda spades
spades.py -h
# Take a look at what my files look like
head all_reads_map_to_plasmid_1.fq all_reads_map_to_plasmid_2.fq
# Some reads in my files may have no mate, so I remove them up front
seqkit pair -1 all_reads_map_to_plasmid_1.fq -2 all_reads_map_to_plasmid_2.fq --id-regexp '^(\S+)\/[12]'
# This produced the following files
# [INFO] 254 paired-end reads saved to all_reads_map_to_plasmid_1.paired.fq and all_reads_map_to_plasmid_2.paired.fq
# Run SPAdes to do the assembly
spades.py -1 all_reads_map_to_plasmid_1.paired.fq -2 all_reads_map_to_plasmid_2.paired.fq -o ./test
# The final result is as follows
# It is definitely not the result I wanted
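To see what the --id-regexp pattern in the seqkit pair call above is keying on, the pairing can be approximated by hand. The sketch below is only an illustration and not how seqkit works internally: it writes two toy FASTQ files (the read names readA, readB and readC are made up), strips the /1 or /2 suffix from each header line, and lists the IDs that occur in both files. It assumes plain four-line FASTQ records; `pair_ids` is a hypothetical helper name.

```shell
#!/usr/bin/env sh
set -eu
export LC_ALL=C   # keep sort and comm on the same byte-wise ordering

workdir=$(mktemp -d)

# Toy inputs: readA has a mate, readB and readC do not.
cat > "$workdir/toy_1.fq" <<'EOF'
@readA/1
ACGT
+
IIII
@readB/1
GGCC
+
IIII
EOF

cat > "$workdir/toy_2.fq" <<'EOF'
@readA/2
TTGA
+
IIII
@readC/2
CCAA
+
IIII
EOF

# Print the pairing key of every record: keep the header lines (line 1 of
# each 4-line record), then drop the leading '@' and the /1 or /2 suffix.
pair_ids() {
  awk 'NR % 4 == 1' "$1" | sed -E 's|^@([^[:space:]]+)/[12].*$|\1|' | sort -u
}

pair_ids "$workdir/toy_1.fq" > "$workdir/ids_1.txt"
pair_ids "$workdir/toy_2.fq" > "$workdir/ids_2.txt"

# IDs present in both lists are the reads that still have a mate.
shared=$(comm -12 "$workdir/ids_1.txt" "$workdir/ids_2.txt")
echo "$shared"

rm -r "$workdir"
```

On the toy files this prints readA only. Running the same `pair_ids` helper on the real files is a quick way to confirm that the headers really end in /1 and /2 before relying on the regular expression.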
 


SPAdes genome assembler v3.13.1
Usage: /opt/miniconda3/bin/spades.py [options] -o <output_dir>
Basic options:
-o      <output_dir>    directory to store all the resulting files (required)
--sc                    this flag is required for MDA (single-cell) data # for assembling single-cell sequencing data
--meta                  this flag is required for metagenomic sample data # for assembling metagenomic sequencing data
--rna                   this flag is required for RNA-Seq data # for assembling transcriptome (RNA-Seq) data
--plasmid               runs plasmidSPAdes pipeline for plasmid detection # for assembling plasmids
--iontorrent            this flag is required for IonTorrent data
--test                  runs SPAdes on toy dataset
-h/--help               prints this usage message
-v/--version            prints version
Input data:
--12    <filename>      file with interlaced forward and reverse paired-end reads # interleaved paired-end reads (FASTQ)
-1      <filename>      file with forward paired-end reads # forward reads of the pairs (FASTQ)
-2      <filename>      file with reverse paired-end reads # reverse reads of the pairs (FASTQ)
-s      <filename>      file with unpaired reads # unpaired reads (FASTQ)
--merged        <filename>      file with merged forward and reverse paired-end reads # already-merged paired-end reads (FASTQ)
(19 options omitted here)
--sanger        <filename>      file with Sanger reads # hybrid assembly together with Sanger reads
--pacbio        <filename>      file with PacBio reads # hybrid assembly together with PacBio reads
--nanopore      <filename>      file with Nanopore reads # hybrid assembly together with Nanopore reads
Pipeline options:
--only-error-correction runs only read error correction (without assembling) # error correction only
--only-assembler        runs only assembling (without read error correction) # assembly only
--careful               tries to reduce number of mismatches and short indels
# Runs the MismatchCorrector module to fix mismatches and short indels in the assembled genome. Using this option is recommended.
--continue              continue run from the last available check-point
(3 options omitted here)
Advanced options:
--dataset       <filename>      file with dataset description in YAML format
-t/--threads    <int>           number of threads  [default: 16] # number of CPU cores/threads to use
-m/--memory     <int>           RAM limit for SPAdes in Gb (terminates if exceeded) [default: 250]
                                # SPAdes needs a lot of memory! If the hardware allows, set -m 500 or even higher.
--tmp-dir       <dirname>       directory for temporary files [default: <output_dir>/tmp]
-k              <int,int,...>   comma-separated list of k-mer sizes (must be odd and less than 128) [default: 'auto']
                                # k-mer lengths; several can be given at once, e.g. -k 33,43,55,63,73,89
--cov-cutoff    <float>         coverage cutoff value (a positive float number, or 'auto', or 'off') [default: 'off']
--phred-offset  <33 or 64>      PHRED quality offset in the input reads (33 or 64)  [default: auto-detect]
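As a sketch of how the options listed above combine on one command line, the script below assembles a spades.py call with --careful, -t, -m and -k. The input file names follow the commands earlier in this post; the output directory, thread count, memory limit and k-mer list are illustrative assumptions, not values the original run used, and the k-mer sizes would have to be shorter than the actual reads. By default the script only prints the command; `DRY_RUN` is a hypothetical switch of this sketch, not a SPAdes feature.

```shell
#!/usr/bin/env sh
set -eu

# Inputs produced by the seqkit pair step earlier in the post.
r1=all_reads_map_to_plasmid_1.paired.fq
r2=all_reads_map_to_plasmid_2.paired.fq

# Assumed, illustrative settings (adjust to the machine and the reads).
outdir=./test_careful   # hypothetical output directory
threads=4               # -t: number of threads
mem_gb=16               # -m: RAM limit in Gb
kmers=21,33,55          # -k: odd values below 128

cmd="spades.py --careful -t $threads -m $mem_gb -k $kmers -1 $r1 -2 $r2 -o $outdir"

# Show the command; execute it only when DRY_RUN=0 is set explicitly.
echo "$cmd"
if [ "${DRY_RUN:-1}" = "0" ]; then
  $cmd
fi
```

Printing the command first makes it easy to check the flags against the help text before committing to a long run; calling the script with DRY_RUN=0 in the environment then starts the assembly.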