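1. Download:
https://github.com/lh3/wgsim
Either clone the repository with git or download the zip archive.
2. Install:
gcc -g -O2 -Wall -o wgsim wgsim.c -lz -lm
3. Download the data. The reference can be fetched with bwakit:
https://github.com/lh3/bwa/tree/master/bwakit
Download command:
bwa.kit/run-gen-ref hs38DH
4. Usage and default settings: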
hadoop@Master:~/cloud/spark-1.5.2/examples/src/main/resources$ wgsim

Program: wgsim (short read simulator)
Version: 0.3.2
Contact: Heng Li <lh3@sanger.ac.uk>

Usage:   wgsim [options] <in.ref.fa> <out.read1.fq> <out.read2.fq>

Options: -e FLOAT      base error rate [0.020]
         -d INT        outer distance between the two ends [500]
         -s INT        standard deviation [50]
         -N INT        number of read pairs [1000000]
         -1 INT        length of the first read [70]
         -2 INT        length of the second read [70]
         -r FLOAT      rate of mutations [0.0010]
         -R FLOAT      fraction of indels [0.15]
         -X FLOAT      probability an indel is extended [0.30]
         -S INT        seed for random generator [0,use the current time]
         -A FLOAT      discard if the fraction of ambiguous bases higher than FLOAT [0.05]
         -h            haplotype mode
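When many simulations are scripted, it can help to keep the defaults from the help text in one place and pass only the options that differ. The following is a minimal sketch, not part of wgsim: the helper name build_wgsim_command and the table WGSIM_DEFAULTS are hypothetical, and the default values are simply copied from the 0.3.2 help output above (the -h switch is left out because it takes no value).

```python
import shlex

# Defaults copied from the wgsim 0.3.2 help text; this table is an
# illustration, not something wgsim itself provides.
WGSIM_DEFAULTS = {
    "-e": 0.020,    # base error rate
    "-d": 500,      # outer distance between the two ends
    "-s": 50,       # standard deviation
    "-N": 1000000,  # number of read pairs
    "-1": 70,       # length of the first read
    "-2": 70,       # length of the second read
    "-r": 0.0010,   # rate of mutations
    "-R": 0.15,     # fraction of indels
    "-X": 0.30,     # probability an indel is extended
    "-S": 0,        # seed (0 means: use the current time)
    "-A": 0.05,     # max fraction of ambiguous bases
}


def build_wgsim_command(ref, out1, out2, options=None):
    """Return an argv list for wgsim (hypothetical helper).

    Only options whose value differs from the documented default are
    emitted; unknown flags raise ValueError.
    """
    argv = ["wgsim"]
    for flag, value in (options or {}).items():
        if flag not in WGSIM_DEFAULTS:
            raise ValueError(f"unknown wgsim option: {flag}")
        if value != WGSIM_DEFAULTS[flag]:
            argv += [flag, str(value)]
    argv += [ref, out1, out2]
    return argv


# Print the command line for 10000 read pairs with a 10 bp first read.
cmd = build_wgsim_command("hs38DH.fa", "r1.fq", "r2.fq", {"-N": 10000, "-1": 10})
print(shlex.join(cmd))
```

The returned list can be handed to subprocess.run directly; shlex.join is used here only to display it as a shell command.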
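5. Examples:
(1) Default settings, paired-end:
wgsim hs38DH.fa PE/hs38DHPE1LallF1.fq PE/hs38DHPE1LallF2.fq
(2) Default settings, single-end (the second read file is discarded via /dev/null):
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ wgsim hs38DH.fa hs38DHSELallF1V2.fq /dev/null
(3) -N: number of read pairs to generate
-N 10000 requests 10000 read pairs:
wgsim -N 10000 hs38DH.fa PE/hs38DHPE1L10000F1.fq PE/hs38DHPE1L10000F2.fq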
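Check the result.
File length:
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat PE/hs38DHPE1L10000F1.fq |wc -l
39740
A FASTQ file stores each read as four lines, so 39740 lines correspond to 9935 reads, slightly fewer than the 10000 requested.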
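Because each FASTQ record spans four lines, the read count can be checked directly rather than by dividing the wc -l result by hand. The sketch below is a minimal example under that four-lines-per-record assumption (no wrapped sequences); the function name count_fastq_records is hypothetical.

```python
def count_fastq_records(lines):
    """Count records in a 4-line-per-record FASTQ stream.

    Raises ValueError if a record lacks the '@' or '+' marker lines,
    if sequence and quality lengths differ, or if the input is truncated.
    """
    count = 0
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed record {count + 1}")
            if len(seq) != len(qual):
                raise ValueError(f"length mismatch in record {count + 1}")
            count += 1
            record = []
    if record:
        raise ValueError("truncated final record")
    return count


# Usage on a real file (path taken from the example above):
# with open("PE/hs38DHPE1L10000F1.fq") as handle:
#     print(count_fastq_records(handle))
```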
File contents:
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat PE/hs38DHPE1L10000F1.fq |head -20
@chrUn_KN707606v1_decoy_29_523_2:0:0_1:0:0_0/1
ATGCCCAGCTGGTTTCTGATACTTCTAATCAAATGTCTTATCCCCCAAATTAGCCCTGGGAGTGAGAATA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707606v1_decoy_657_1222_1:0:0_1:0:0_1/1
GTGGTGCACACCTGTAGTGCCTGTTCCTTGGGAGGCTGAGGCCGGAGGATCCCTTGAGCCCAGGAGTTCA
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707606v1_decoy_1052_1588_2:0:0_1:1:0_2/1
GTCCAAACACCACGTGACAAGCCCATTCTTCCATTTTCTCAGACCATAAACTGCACTGTCCTCTAACTGC
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707607v1_decoy_1123_1686_1:0:0_2:0:0_0/1
GAGGATATTTTGTTTAGTCACTAGGATTTCTTAACATTCTGAAATTCTATTCACCTCTGATTTTGTCTAT
+
2222222222222222222222222222222222222222222222222222222222222222222222
@chrUn_KN707607v1_decoy_877_1369_0:0:0_0:0:0_1/1
TATAGTTAACATAACATGGTCTATCTTTAGATAATCTCCATGCACAGTAAGATAATATTTTTTCTAGGAC
+
2222222222222222222222222222222222222222222222222222222222222222222222
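The read names in this output carry the simulation coordinates, which is handy when checking an aligner against the known origin of each read. The sketch below parses them under an assumed layout, contig_start_end_a:b:c_a:b:c_index/mate, inferred from the names shown here and from my reading of the wgsim source; the interpretation of the two colon-separated triples as per-end counts (errors, substitutions, indels) and of the index as hexadecimal is an assumption, and parse_wgsim_name is a hypothetical helper.

```python
import re

# Assumed layout (not documented in the text above):
# @<contig>_<start>_<end>_<e1>:<s1>:<i1>_<e2>:<s2>:<i2>_<index>/<mate>
WGSIM_NAME = re.compile(
    r"^@?(?P<contig>.+)_(?P<start>\d+)_(?P<end>\d+)"
    r"_(?P<err1>\d+):(?P<sub1>\d+):(?P<indel1>\d+)"
    r"_(?P<err2>\d+):(?P<sub2>\d+):(?P<indel2>\d+)"
    r"_(?P<index>[0-9a-fA-F]+)/(?P<mate>[12])$"
)

INT_FIELDS = ("start", "end", "err1", "sub1", "indel1",
              "err2", "sub2", "indel2", "mate")


def parse_wgsim_name(name):
    """Split a wgsim read name into its fields (assumed layout)."""
    match = WGSIM_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a wgsim-style read name: {name!r}")
    fields = match.groupdict()
    parsed = {"contig": fields["contig"]}
    for key in INT_FIELDS:
        parsed[key] = int(fields[key])
    # The pair index is assumed to be written in hexadecimal.
    parsed["index"] = int(fields["index"], 16)
    # Span covered by the pair on the contig, both ends inclusive.
    parsed["span"] = parsed["end"] - parsed["start"] + 1
    return parsed


info = parse_wgsim_name("@chrUn_KN707606v1_decoy_29_523_2:0:0_1:0:0_0/1")
print(info["contig"], info["start"], info["end"], info["span"])
```

For the first read above this gives a span of 495, which is in line with the default outer distance of 500 and standard deviation of 50.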
(4) -1: length of the first read
-1 10 sets the reads in the first FASTQ file to a length of 10:
wgsim -N 10000 -1 10 hs38DH.fa SE/hs38DHSE1N10000L10F1.fq /dev/null
Check the output:
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat SE/hs38DHSE1N10000L10F1.fq |wc -l
39740
hadoop@Mcnode1:~/cloud/adam/xubo/data/hs38DH$ cat SE/hs38DHSE1N10000L10F1.fq |head -20
@chrUn_KN707606v1_decoy_216_790_0:0:0_2:0:0_0/1
CATGTCTTTC
+
2222222222
@chrUn_KN707606v1_decoy_1191_1728_0:0:0_1:0:0_1/1
TTAACCTTAA
+
2222222222
@chrUn_KN707606v1_decoy_792_1284_1:0:0_0:0:0_2/1
CAGAACAAAA
+
2222222222
@chrUn_KN707607v1_decoy_1925_2441_0:0:0_1:0:0_0/1
TGCAGGTTTG
+
2222222222
@chrUn_KN707607v1_decoy_2305_2757_1:0:0_3:0:0_1/1
GGACAAGGGA
+
2222222222
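Looking at the first 20 lines only confirms the read length for five reads. To check that the -1 setting held for the whole file, the sequence lengths can be tallied. This is a minimal sketch assuming four lines per record; read_length_histogram is a hypothetical helper name.

```python
from collections import Counter
from itertools import islice


def read_length_histogram(lines):
    """Tally sequence lengths in a 4-line-per-record FASTQ stream.

    The sequence is the second line of every record, so we take every
    fourth line starting at offset 1.
    """
    return Counter(len(line.rstrip("\n")) for line in islice(lines, 1, None, 4))


# Usage on a real file (path taken from the example above):
# with open("SE/hs38DHSE1N10000L10F1.fq") as handle:
#     print(read_length_histogram(handle))
```

If the option was applied as intended, the histogram should have a single key equal to the requested length.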
6. Other:
(1) Alignment:
Build the index with BWA:
hadoop@Master:~/cloud/adam/xubo/data/wgsim/hs38DH$ ll -h
total 22M
drwxrwxr-x 4 hadoop hadoop 4.0K 4月 15 15:48 ./
drwxrwxr-x 7 hadoop hadoop 4.0K 4月 11 17:10 ../
-rw-rw-r-- 1 hadoop hadoop 8.0M 4月 11 17:08 hs38DH.fa
-rw-r--r-- 1 hadoop hadoop 477K 4月 11 17:08 hs38DH.fa.alt
-rw-rw-r-- 1 hadoop hadoop   15 4月 11 17:10 hs38DH.fa.amb
-rw-rw-r-- 1 hadoop hadoop 365K 4月 11 17:10 hs38DH.fa.ann
-rw-rw-r-- 1 hadoop hadoop 7.6M 4月 11 17:10 hs38DH.fa.bwt
-rw-rw-r-- 1 hadoop hadoop 1.9M 4月 11 17:10 hs38DH.fa.pac
-rw-rw-r-- 1 hadoop hadoop 3.8M 4月 11 17:10 hs38DH.fa.sa
drwxrwxr-x 2 hadoop hadoop 4.0K 4月 15 16:23 PE/
drwxrwxr-x 2 hadoop hadoop 4.0K 4月 15 15:48 SE/
(2) Convert to ADAM format:
hadoop@Master:~/cloud$ adam-submit fasta2adam /xubo/adam/hs38DH/hs38DH.fa /xubo/adam/hs38DH/adam/hs38DH.adam
Using ADAM_MAIN=org.bdgenomics.adam.cli.ADAMMain
Using SPARK_SUBMIT=/home/hadoop/cloud/spark-1.5.2//bin/spark-submit
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.