Some FASTA files contain all chromosomes of a genome in a single file, and users sometimes need to split these chromosomes into separate files according to their number label. The msqchr package helps with this, so that the FASTA file for a specific chromosome can be used in downstream analysis.
devtools::install_github("MSQ-123/chromseq")
Replace verbose chromosome identifiers with a simpler format, so that the extracted IDs are easier to work with.
data("id")
simpleID <- replaceText(type = "text", input = id)
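To make the idea concrete, here is a minimal base-R sketch of this kind of identifier simplification. It is not how replaceText() is implemented; the helper name simplify_header() and the NCBI-style header format are assumptions chosen for illustration.

```r
# Hypothetical helper, not part of the package: shorten verbose FASTA
# headers such as ">NC_000001.11 Homo sapiens chromosome 1, ..." to ">chr1".
simplify_header <- function(headers) {
  # Capture the label after the word "chromosome" (digits, X or Y);
  # headers that do not match are returned unchanged.
  sub("^>.*chromosome ([0-9]+|[XY])\\b.*$", ">chr\\1", headers, perl = TRUE)
}

headers <- c(
  ">NC_000001.11 Homo sapiens chromosome 1, GRCh38.p14 Primary Assembly",
  ">NC_000023.11 Homo sapiens chromosome X, GRCh38.p14 Primary Assembly"
)
simplified <- simplify_header(headers)
print(simplified)
```

A regular expression keyed on the word "chromosome" keeps the sketch short; real assemblies use many header conventions, so the pattern would need adjusting to the data at hand.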
Extract the chromosome IDs from a FASTA file:
data("text")
text <- replaceText(type = "text", input = text)
id <- subFasID(text = text)
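Conceptually, extracting IDs amounts to picking out the header lines (those starting with ">") and keeping the identifier. The following base-R sketch illustrates that idea under the assumption that the FASTA content is held as a character vector with one line per element; extract_ids() is a hypothetical name and does not reproduce subFasID().

```r
# Hypothetical sketch, not the package's subFasID(): pull sequence IDs
# out of FASTA lines stored in a character vector.
extract_ids <- function(lines) {
  headers <- lines[startsWith(lines, ">")]           # header lines start with ">"
  sub("^>(\\S+).*$", "\\1", headers, perl = TRUE)    # keep the first word after ">"
}

fasta_lines <- c(">chr1", "ACGTACGT", "ACGT", ">chr2 some description", "GGCCTTAA")
ids <- extract_ids(fasta_lines)
print(ids)
```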
Transform the large character object into a chromosome list:
fil <- tempfile(fileext = ".data")
write(text, file = fil)
con0 <- file(fil, "r")
tex <- readToList(id, text = text, con = con0)
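One way to picture this step is grouping the FASTA lines so that each header and its sequence lines form one list element, named after the ID. The sketch below does this in base R by numbering records with cumsum() over the header flags and then calling split(). It assumes the first line is a header, and fasta_to_list() is a hypothetical name, not the package's readToList().

```r
# Hypothetical sketch, not the package's readToList(): group FASTA lines
# into a named list with one element per record (header + sequence lines).
fasta_to_list <- function(lines) {
  is_header <- startsWith(lines, ">")
  if (!isTRUE(is_header[1])) stop("the first line must be a FASTA header")
  # Every header starts a new record; cumsum() numbers the records 1, 2, ...
  groups <- split(lines, cumsum(is_header))
  names(groups) <- sub("^>(\\S+).*$", "\\1", lines[is_header], perl = TRUE)
  groups
}

# Mirror the workflow above: write to a temporary file, read it back
# through a connection, then build the list.
tmp <- tempfile(fileext = ".fa")
writeLines(c(">chr1", "ACGT", "TTGA", ">chr2", "GGCC"), tmp)
con <- file(tmp, "r")
records <- fasta_to_list(readLines(con))
close(con)
str(records)
```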
Sort the chromosome list by chromosome number. Note: the "single" chromosomes (one-character labels, i.e. 1-9, X and Y) and the "double" chromosomes (two-digit labels, i.e. 10-22) must be sorted separately. Note: this data is already sorted; the step is shown for illustration only.
tex2 <- sortList(id = id, tex = tex, chrsig = "single")
tex3 <- sortList(id = id, tex = tex, chrsig = "double")
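A likely reason for sorting the two groups separately is that plain string sorting places "chr10" before "chr2". The base-R sketch below shows one way to handle this: keep only labels of one width, order numeric labels by value, and put X and Y last. The function sort_records() and the "chr" prefix are assumptions for illustration, not the package's sortList().

```r
# Hypothetical sketch, not the package's sortList(): order records by
# chromosome label, treating one-character labels (1-9, X, Y) as "single"
# and two-character labels (10-22) as "double".
sort_records <- function(records, chrsig = c("single", "double")) {
  chrsig <- match.arg(chrsig)
  labels <- sub("^chr", "", names(records))
  keep   <- nchar(labels) == if (chrsig == "single") 1L else 2L
  labels <- labels[keep]
  # Numeric labels sort by value; non-numeric ones (X, Y) go last, alphabetically
  num <- suppressWarnings(as.integer(labels))
  records[keep][order(is.na(num), num, labels)]
}

records <- list(chr10 = ">chr10", chr2 = ">chr2", chrX = ">chrX",
                chr1 = ">chr1", chr22 = ">chr22", chrY = ">chrY")
single <- sort_records(records, "single")
double <- sort_records(records, "double")
print(names(single))
print(names(double))
```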
Now the chromosome FASTA file can be split into separate files according to chromosome number.
outdir <- tempdir()
splitChr(tex = tex2, chr = seq(1, 9), sex = TRUE, outdir = outdir)
# Chromosomes X and Y are "single", so when sex = TRUE
# tex must be a "single" chromosome list.
# For the "double" list below, sex should be FALSE.
splitChr(tex = tex3, chr = seq(10, 22), sex = FALSE, outdir = outdir)
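The final step writes each selected chromosome to its own file. The base-R sketch below illustrates this with writeLines(), naming each output file after the record ID. The helper write_chromosomes() and the ".fa" naming scheme are assumptions, not the behaviour of splitChr().

```r
# Hypothetical sketch, not the package's splitChr(): write each selected
# record to its own FASTA file named after its ID, e.g. "chr1.fa".
write_chromosomes <- function(records, ids, outdir = tempdir()) {
  missing <- setdiff(ids, names(records))
  if (length(missing) > 0) stop("unknown IDs: ", paste(missing, collapse = ", "))
  dir.create(outdir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(outdir, paste0(ids, ".fa"))
  for (i in seq_along(ids)) {
    writeLines(records[[ids[i]]], paths[i])  # header + sequence lines
  }
  invisible(paths)
}

records <- list(chr1 = c(">chr1", "ACGT"), chr2 = c(">chr2", "GGCC"),
                chrX = c(">chrX", "TTAA"))
paths <- write_chromosomes(records, ids = c("chr1", "chrX"))
print(basename(paths))
```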