| Title: | Basic Sequence Processing Tool for Biological Data |
| Version: | 0.2.0 |
| Description: | Primarily created as an easy-to-understand way to perform basic sequence processing tasks surrounding the central dogma of molecular biology. |
| License: | GPL-3 |
| URL: | https://github.com/ambuvjyn/baseq |
| BugReports: | https://github.com/ambuvjyn/baseq/issues |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.3 |
| Imports: | ggplot2 |
| Suggests: | testthat (≥ 3.0.0), rmarkdown, knitr, Biostrings |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| LazyData: | true |
| NeedsCompilation: | no |
| Author: | Ambu Vijayan |
| Maintainer: | Ambu Vijayan <ambuvjyn@gmail.com> |
| Packaged: | 2026-03-11 22:11:40 UTC; ambuv |
| Depends: | R (≥ 3.5.0) |
| Repository: | CRAN |
| Date/Publication: | 2026-03-11 22:30:18 UTC |
Bioconductor Bridge
Description
Converts baseq sequences to Biostrings format.
Usage
as_Biostrings(s)
Arguments
s |
A character vector or list of sequences |
Value
A DNAStringSet object
S3 DNA Class
Description
Creates an S3 object of class baseq_dna.
Usage
as_baseq_dna(s)
Arguments
s |
A character string containing the sequence |
Value
A baseq_dna object
S3 RNA Class
Description
Creates an S3 object of class baseq_rna.
Usage
as_baseq_rna(s)
Arguments
s |
A character string containing the sequence |
Value
A baseq_rna object
Assembly Stats
Description
Computes N50, L50, and other assembly statistics.
Usage
calculate_assembly_stats(seqs)
Arguments
seqs |
A character vector or list of sequences (contigs) |
Value
A named numeric vector of statistics
Examples
contigs <- c("ATGC", "ATGCATGC", "ATGCATGCATGC")
calculate_assembly_stats(contigs)
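N50 is the length of the contig at which the running total of contig lengths, sorted longest first, reaches half the assembly size. L50 is the number of contigs needed to reach that point. The base-R sketch below shows these two definitions. It is not the baseq implementation: the helper name `assembly_stats_sketch` and the exact set of returned statistics are assumptions made for illustration.

```r
# Hypothetical helper: a minimal N50/L50 sketch, not baseq's own code
assembly_stats_sketch <- function(seqs) {
  # Contig lengths, longest first (names dropped so output names stay clean)
  lens <- sort(unname(nchar(unlist(seqs))), decreasing = TRUE)
  total <- sum(lens)
  # L50: fewest contigs whose combined length reaches half the total
  l50 <- which(cumsum(lens) >= total / 2)[1]
  c(n_contigs = length(lens),
    total_length = total,
    max_length = lens[1],
    N50 = lens[l50],  # length of the contig that crosses the halfway mark
    L50 = l50)
}

contigs <- c("ATGC", "ATGCATGC", "ATGCATGCATGC")
assembly_stats_sketch(contigs)
```

The example contigs have lengths 12, 8 and 4, for a total of 24. The longest contig alone covers half of that total, so N50 is 12 and L50 is 1.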
Protein Net Charge
Description
Calculates the net electrical charge of a protein at a given pH.
Usage
calculate_charge(s, ph = 7.4)
Arguments
s |
A character string containing the protein sequence |
ph |
Numeric pH value (default: 7.4) |
Value
Numeric net charge
Codon Usage RSCU
Description
Calculates Relative Synonymous Codon Usage (RSCU).
Usage
calculate_codon_usage(s)
Arguments
s |
A character string containing the coding DNA sequence |
Value
A dataframe with codon statistics
Examples
data(sars_fragment)
calculate_codon_usage(sars_fragment)
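RSCU divides a codon's observed count by the mean count of all codons that encode the same amino acid. A value of 1 therefore means the codon is used exactly as often as expected under uniform synonymous usage. The sketch below makes four assumptions: the standard genetic code (NCBI table 1), reading in frame 1, discarding any incomplete trailing codon, and returning `NA` for amino acids that never occur. The helper `rscu_sketch` is hypothetical and is not how `calculate_codon_usage` is implemented.

```r
# Standard genetic code (NCBI table 1), codons enumerated in TCAG order
bases <- c("T", "C", "A", "G")
codons <- paste0(rep(bases, each = 16), rep(rep(bases, each = 4), 4), rep(bases, 16))
aa <- strsplit("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG", "")[[1]]

# Hypothetical helper: RSCU sketch, not baseq's implementation
rscu_sketch <- function(s) {
  s <- gsub("U", "T", toupper(s))
  n <- nchar(s) %/% 3  # complete codons only, frame 1
  starts <- seq(1, by = 3, length.out = n)
  observed <- substring(s, starts, starts + 2)
  counts <- as.numeric(table(factor(observed, levels = codons)))
  # Mean count over all synonymous codons of the same amino acid
  expected <- ave(counts, aa, FUN = mean)
  data.frame(codon = codons, aa = aa, count = counts,
             rscu = ifelse(expected > 0, counts / expected, NA))
}

res <- rscu_sketch("ATGGCTGCTGCC")
res[res$aa == "A", ]
```

In this toy sequence, alanine is encoded twice by GCT and once by GCC. The mean over its four codons is 0.75, so GCT has an RSCU of about 2.67 and GCC about 1.33.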
Sequence Identity
Description
Compares two sequences of equal length position by position.
Usage
calculate_identity(s1, s2)
Arguments
s1 |
First sequence |
s2 |
Second sequence |
Value
A list containing the percent identity and the Hamming distance (number of mismatched positions)
Examples
calculate_identity("ATGC", "ATGG")
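For two sequences of equal length, the Hamming distance counts the positions at which they differ, and percent identity is the share of positions that match. The sketch below computes both in base R. The helper name `identity_sketch` and the element names of the returned list are assumptions, not the names `calculate_identity` uses.

```r
# Hypothetical helper: identity and Hamming distance, not baseq's implementation
identity_sketch <- function(s1, s2) {
  if (nchar(s1) != nchar(s2)) stop("sequences must have equal length")
  a <- strsplit(toupper(s1), "")[[1]]
  b <- strsplit(toupper(s2), "")[[1]]
  hamming <- sum(a != b)  # number of mismatched positions
  list(identity = 100 * (1 - hamming / length(a)), hamming = hamming)
}

identity_sketch("ATGC", "ATGG")
```

The example sequences differ only at the last position, which gives a Hamming distance of 1 and an identity of 75%.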
Protein MW
Description
Calculates the molecular weight of a protein sequence.
Usage
calculate_mw(s)
Arguments
s |
A character string containing the protein sequence |
Value
Numeric molecular weight in Daltons
Protein pI
Description
Estimates the isoelectric point of a protein sequence.
Usage
calculate_pi(s)
Arguments
s |
A character string containing the protein sequence |
Value
Numeric pI value
Primer Tm
Description
Calculates the melting temperature of a primer sequence.
Usage
calculate_tm(s, salt = 50)
Arguments
s |
A character string containing the sequence |
salt |
Numeric salt concentration in mM (default: 50) |
Value
Numeric Tm in Celsius
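This manual does not state which formula `calculate_tm` uses. One commonly cited salt-adjusted estimate is Tm = 81.5 + 16.6 log10([Na+]) + 0.41 (%GC) - 675 / N, where [Na+] is in mol/L and N is the primer length. The sketch below implements that formula as an assumption, to show how the `salt` argument (in mM) can enter the calculation. The helper `tm_sketch` is hypothetical.

```r
# Hypothetical helper: salt-adjusted Tm estimate (an assumed formula, not
# necessarily the one used by calculate_tm)
tm_sketch <- function(s, salt = 50) {
  b <- strsplit(toupper(s), "")[[1]]
  n <- length(b)
  gc_pct <- 100 * sum(b %in% c("G", "C")) / n
  na_molar <- salt / 1000  # convert mM to mol/L
  81.5 + 16.6 * log10(na_molar) + 0.41 * gc_pct - 675 / n
}

tm_sketch("AGCGTACGATCGATCGATGC")  # a 20-mer primer at 50 mM Na+
```

Under this formula, doubling the salt concentration raises the estimate by 16.6 x log10(2), or about 5 degrees Celsius, regardless of sequence.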
Batch File Cleaner
Description
Cleans all sequences in a FASTA or FASTQ file.
Usage
clean_file(input_file, type = "auto", output_dir = "")
Arguments
input_file |
Path to input file |
type |
Sequence type ("DNA", "RNA", or "auto") |
output_dir |
Optional output directory |
Value
Path to the cleaned file
Universal Sequence Cleaner
Description
Removes non-standard characters from DNA or RNA sequences.
Usage
clean_seq(sequence, type = "auto")
Arguments
sequence |
A character string containing the sequence |
type |
A string "DNA", "RNA", or "auto" |
Value
A character string of the cleaned sequence
Count Bases
Description
Returns a frequency table of the bases in a sequence.
Usage
count_bases(s)
Arguments
s |
A character string containing the sequence |
Value
A table object with base counts
Examples
data(sars_fragment)
count_bases(sars_fragment)
K-mer Counting
Description
Counts all overlapping substrings of length k in a sequence.
Usage
count_kmers(s, k = 3)
Arguments
s |
A character string containing the sequence |
k |
Integer k-mer length (default: 3) |
Value
A table of k-mer counts
Examples
data(sars_fragment)
count_kmers(sars_fragment, k = 3)
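A sequence of length L contains L - k + 1 overlapping k-mers, one starting at each position. The base-R sketch below extracts them with vectorised `substring()` calls and tabulates the result. It illustrates the idea only. The helper name `kmer_sketch` is hypothetical, and it may differ from `count_kmers` in details such as case handling.

```r
# Hypothetical helper: overlapping k-mer counts, not baseq's implementation
kmer_sketch <- function(s, k = 3) {
  s <- toupper(s)
  n <- nchar(s) - k + 1  # number of overlapping windows
  if (n < 1) return(table(character(0)))
  # Window i spans positions i .. i + k - 1
  table(substring(s, 1:n, k:nchar(s)))
}

kmer_sketch("ATGATG", k = 3)
```

For "ATGATG" the four windows are ATG, TGA, GAT and ATG, so ATG is counted twice.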
Count Pattern
Description
Counts the occurrences of a specific pattern in a sequence.
Usage
count_pattern(s, p)
Arguments
s |
A character string containing the sequence |
p |
A character string containing the pattern to count |
Value
Integer count of occurrences
Examples
data(sars_fragment)
count_pattern(sars_fragment, "ATTA")
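This manual does not say whether `count_pattern` counts overlapping occurrences, and the two conventions can give different answers. For example, "ATTATTA" contains "ATTA" twice if overlaps are allowed but only once otherwise. The hypothetical helpers below show both conventions in base R, so you can check which one matches the package's output on your data.

```r
# Hypothetical helpers, not baseq functions
# Non-overlapping: scanning resumes after the end of each match
count_nonoverlapping <- function(s, p) {
  m <- gregexpr(p, s, fixed = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

# Overlapping: a zero-width lookahead matches at every start position.
# Note that p is interpreted as a regular expression here.
count_overlapping <- function(s, p) {
  m <- gregexpr(paste0("(?=", p, ")"), s, perl = TRUE)[[1]]
  if (m[1] == -1) 0L else length(m)
}

count_overlapping("ATTATTA", "ATTA")
count_nonoverlapping("ATTATTA", "ATTA")
```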
Translate DNA to Protein
Description
Translates a DNA sequence into protein in all 6 reading frames.
Usage
dna_to_protein(s, table = 1)
Arguments
s |
A character string containing the DNA sequence |
table |
Integer indicating the NCBI genetic code table (default: 1) |
Value
A list of translated protein sequences
DNA to RNA
Description
Transcribes a DNA sequence into RNA.
Usage
dna_to_rna(s)
Arguments
s |
A character string containing the DNA sequence |
Value
A character string of the RNA sequence
Convert FASTQ to FASTA
Description
Converts a FASTQ file to FASTA format.
Usage
fastq_to_fasta(fastq_file)
Arguments
fastq_file |
Path to input FASTQ |
Value
Path to output FASTA
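A FASTQ record has four lines: an `@` header, the sequence, a `+` separator and a quality string. Converting to FASTA keeps only the first two lines and replaces `@` with `>`. The sketch below assumes uncompressed input with exactly four lines per record and no wrapped sequences. It also assumes the output is written next to the input with a `.fasta` extension. The helper and its `fasta_file` argument are hypothetical, and the naming rule `fastq_to_fasta` actually uses is not documented here.

```r
# Hypothetical helper: assumes uncompressed FASTQ with exactly four lines per
# record (header, sequence, "+", quality) and no wrapped sequence lines
fastq_to_fasta_sketch <- function(fastq_file,
                                  fasta_file = paste0(tools::file_path_sans_ext(fastq_file), ".fasta")) {
  lines <- readLines(fastq_file)
  if (length(lines) %% 4 != 0) stop("expected four lines per FASTQ record")
  idx <- seq(1, by = 4, length.out = length(lines) %/% 4)
  headers <- sub("^@", ">", lines[idx])  # '@id' becomes '>id'
  seqs <- lines[idx + 1]
  # Interleave header and sequence lines, dropping '+' and quality lines
  writeLines(as.vector(rbind(headers, seqs)), fasta_file)
  invisible(fasta_file)
}
```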
Quality Filter FASTQ
Description
Filters FASTQ reads based on average quality score.
Usage
filter_fastq_quality(
input_file,
output_file,
min_avg_quality = 20,
phred_offset = 33
)
Arguments
input_file |
Path to input FASTQ |
output_file |
Path to output FASTQ |
min_avg_quality |
Minimum average Phred score (default: 20) |
phred_offset |
Phred offset (default: 33) |
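Each character in a FASTQ quality string encodes a Phred score as its ASCII code minus `phred_offset`, which is 33 for modern Sanger/Illumina 1.8+ files. A read's average quality is the mean of these scores. The sketch below applies a threshold in the spirit of `min_avg_quality`. It assumes four-line records, and the helper name and its return value (the number of reads kept) are hypothetical.

```r
# Hypothetical helper: average-quality filter sketch, not baseq's implementation
filter_fastq_sketch <- function(input_file, output_file,
                                min_avg_quality = 20, phred_offset = 33) {
  lines <- readLines(input_file)
  if (length(lines) %% 4 != 0) stop("expected four lines per FASTQ record")
  starts <- seq(1, by = 4, length.out = length(lines) %/% 4)
  # Decode each quality string: Phred score = ASCII code - offset
  avg_q <- vapply(lines[starts + 3],
                  function(q) mean(utf8ToInt(q) - phred_offset),
                  numeric(1), USE.NAMES = FALSE)
  keep <- starts[!is.na(avg_q) & avg_q >= min_avg_quality]
  # Write all four lines of every retained record, in input order
  writeLines(lines[as.vector(outer(0:3, keep, "+"))], output_file)
  invisible(length(keep))
}
```

With offset 33, the character `I` (ASCII 73) decodes to Phred 40 and `#` (ASCII 35) to Phred 2. A read of all `I` passes the default threshold of 20, and a read of all `#` fails it.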
CpG Island Detection
Description
Identifies candidate CpG islands in a DNA sequence.
Usage
find_cpg_islands(s, window = 200)
Arguments
s |
A character string containing the DNA sequence |
window |
Sliding window size (default: 200) |
Value
A dataframe with start and end positions
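A widely used definition of a CpG island comes from Gardiner-Garden and Frommer (1987). It requires a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count x window length) / (C count x G count). This manual does not state which thresholds `find_cpg_islands` applies. The sketch below therefore uses those classic values as assumptions and reports each qualifying window without merging overlaps. The helper and its `step`, `min_gc` and `min_oe` parameters are hypothetical.

```r
# Hypothetical helper: window-level CpG island screen, not baseq's implementation
cpg_windows_sketch <- function(s, window = 200, step = 1, min_gc = 0.5, min_oe = 0.6) {
  b <- strsplit(toupper(s), "")[[1]]
  n <- length(b)
  if (n < window) return(data.frame(start = integer(0), end = integer(0)))
  starts <- seq(1, n - window + 1, by = step)
  passes <- vapply(starts, function(i) {
    w <- b[i:(i + window - 1)]
    c_n <- sum(w == "C")
    g_n <- sum(w == "G")
    cpg <- sum(w[-window] == "C" & w[-1] == "G")  # C immediately followed by G
    gc <- (c_n + g_n) / window
    oe <- if (c_n > 0 && g_n > 0) cpg * window / (c_n * g_n) else 0
    gc > min_gc && oe > min_oe
  }, logical(1))
  data.frame(start = starts[passes], end = starts[passes] + window - 1)
}
```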
Find Longest ORF
Description
Scans a DNA sequence in all 6 reading frames to find the longest open reading frame.
Usage
find_longest_orf(s)
Arguments
s |
A character string containing the DNA sequence |
Value
A character string of the longest translated protein sequence
GC Content
Description
Calculates the percentage of G and C bases in a DNA sequence.
Usage
gc_content(s)
Arguments
s |
A character string containing the sequence |
Value
Numeric percentage of GC content
Examples
data(sars_fragment)
gc_content(sars_fragment)
Get Genetic Code
Description
Returns a mapping of codons to amino acids.
Usage
get_genetic_code(table = 1)
Arguments
table |
Integer NCBI genetic code table index (default: 1) |
Value
A named character vector
Plot AA Composition
Description
Visualizes the amino acid composition categorized by biochemical properties.
Usage
plot_aa_composition(s)
Arguments
s |
A character string containing the protein sequence |
Value
A ggplot object
Examples
prot <- "MKFLVLALAL"
plot_aa_composition(prot)
Plot Dot Plot
Description
Generates a dot plot comparison of two sequences.
Usage
plot_dotplot(s1, s2, window = 1)
Arguments
s1 |
First sequence |
s2 |
Second sequence |
window |
Integer word size for matching (default: 1) |
Value
A ggplot object
Examples
s1 <- "ATGCATGCATGC"
s2 <- "ATGCGTGCATGC"
plot_dotplot(s1, s2, window = 3)
Plot GC Skew
Description
Generates a sliding window plot of GC skew (G-C)/(G+C).
Usage
plot_gc_skew(s, window = 100)
Arguments
s |
A character string containing the DNA sequence |
window |
Integer window size (default: 100) |
Value
A ggplot object
Examples
data(sars_fragment)
plot_gc_skew(sars_fragment, window = 100)
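GC skew for a window is (G - C) / (G + C). Positive values indicate an excess of G over C on the given strand. The sketch below computes the values such a plot would show. It makes three assumptions: windows advance by a hypothetical `step` that defaults to the window size, each value is reported at the window midpoint, and windows with no G or C return `NA`. The step `plot_gc_skew` actually uses is not documented here.

```r
# Hypothetical helper: GC skew values per window, not baseq's implementation
gc_skew_sketch <- function(s, window = 100, step = window) {
  b <- strsplit(toupper(s), "")[[1]]
  if (length(b) < window) return(data.frame(position = numeric(0), skew = numeric(0)))
  starts <- seq(1, length(b) - window + 1, by = step)
  skew <- vapply(starts, function(i) {
    w <- b[i:(i + window - 1)]
    g <- sum(w == "G")
    cc <- sum(w == "C")
    if (g + cc == 0) NA_real_ else (g - cc) / (g + cc)
  }, numeric(1))
  data.frame(position = starts + (window - 1) / 2, skew = skew)
}

gc_skew_sketch(paste0(strrep("G", 100), strrep("C", 100)), window = 100)
```

The result can be passed to ggplot2 (for example with `geom_line()`) to produce a skew profile.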
Plot Hydrophobicity
Description
Generates a sliding window plot of protein hydrophobicity using the Kyte-Doolittle scale.
Usage
plot_hydrophobicity(s, window = 9)
Arguments
s |
A character string containing the protein sequence |
window |
Integer window size (default: 9) |
Value
A ggplot object
Examples
prot <- "MKFLVLALAL"
plot_hydrophobicity(prot, window = 3)
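A Kyte-Doolittle profile replaces each residue with its hydropathy value and then averages over a sliding window. Stretches of high positive scores suggest hydrophobic segments. The sketch below uses the published Kyte-Doolittle values with a step of one residue and reports each mean at the window centre. Residues outside the 20 standard amino acids become `NA`. The helper name `kd_profile_sketch` is hypothetical and is not part of baseq.

```r
# Kyte-Doolittle hydropathy scale
kd <- c(A = 1.8, R = -4.5, N = -3.5, D = -3.5, C = 2.5, Q = -3.5, E = -3.5,
        G = -0.4, H = -3.2, I = 4.5, L = 3.8, K = -3.9, M = 1.9, F = 2.8,
        P = -1.6, S = -0.8, T = -0.7, W = -0.9, Y = -1.3, V = 4.2)

# Hypothetical helper: sliding-window hydropathy, not baseq's implementation
kd_profile_sketch <- function(s, window = 9) {
  vals <- unname(kd[strsplit(toupper(s), "")[[1]]])  # NA for non-standard letters
  n <- length(vals) - window + 1
  if (n < 1) return(data.frame(position = numeric(0), score = numeric(0)))
  score <- vapply(seq_len(n), function(i) mean(vals[i:(i + window - 1)]), numeric(1))
  data.frame(position = seq_len(n) + (window - 1) / 2, score = score)
}

kd_profile_sketch("MKFLVLALAL", window = 3)
```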
Universal Sequence Reader
Description
Reads a FASTA or FASTQ file and returns it as a dataframe or list.
Usage
read_seq(file, format = "df")
Arguments
file |
Path to the input sequence file |
format |
A string indicating "df" (dataframe) or "list" (default: "df") |
Value
A dataframe or list of the sequence data.
Universal Reverse Complement
Description
Generates the reverse complement of a DNA or RNA sequence.
Usage
rev_comp(sequence)
Arguments
sequence |
A character string containing the sequence |
Value
A character string of the reverse complement
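A reverse complement swaps each base for its Watson-Crick partner and then reverses the order, so the result reads 5' to 3' on the opposite strand. The sketch below treats the input as RNA if it contains U and as DNA otherwise. Other characters, such as N or IUPAC ambiguity codes, are left unchanged. This is an assumption made for illustration, and the helper name `rev_comp_sketch` is hypothetical.

```r
# Hypothetical helper: reverse complement sketch, not baseq's implementation
rev_comp_sketch <- function(sequence) {
  s <- toupper(sequence)
  # Treat the input as RNA if it contains U, otherwise as DNA
  comp <- if (grepl("U", s)) chartr("ACGU", "UGCA", s) else chartr("ACGT", "TGCA", s)
  paste(rev(strsplit(comp, "")[[1]]), collapse = "")
}

rev_comp_sketch("ATGC")
rev_comp_sketch("AUGC")
```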
Reverse Translation
Description
Converts a protein sequence back into DNA using common codons.
Usage
reverse_translate(s)
Arguments
s |
A character string containing the protein sequence |
Value
A character string of the resulting DNA sequence
RNA to DNA
Description
Reverse transcribes an RNA sequence into DNA.
Usage
rna_to_dna(s)
Arguments
s |
A character string containing the RNA sequence |
Value
A character string of the DNA sequence
Translate RNA to Protein
Description
Translates an RNA sequence into protein in all 6 reading frames.
Usage
rna_to_protein(s, table = 1)
Arguments
s |
A character string containing the RNA sequence |
table |
Integer indicating the NCBI genetic code table (default: 1) |
Value
A list of translated protein sequences
SARS-CoV-2 Genome Fragment
Description
A small fragment of the SARS-CoV-2 genome used for examples and testing.
Usage
sars_fragment
Format
A character string.
Source
NCBI GenBank
Motif Searching
Description
Finds all occurrences of a motif in a sequence.
Usage
search_motif(s, p)
Arguments
s |
A character string containing the sequence |
p |
A character string containing the motif (regex) |
Value
A dataframe with the Start, End, and Match string
Shuffle Sequence
Description
Randomly permutes the characters of a sequence.
Usage
shuffle_sequence(s)
Arguments
s |
A character string containing the sequence |
Value
A character string of the shuffled sequence
Virtual Digestion
Description
Simulates restriction enzyme digestion.
Usage
simulate_digestion(s, p)
Arguments
s |
A character string containing the DNA sequence |
p |
A character string containing the restriction site (regex) |
Value
A numeric vector of fragment lengths
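To predict fragment lengths, locate every recognition site, convert each site to a cut position, and take the distances between consecutive cuts and the molecule ends. This manual does not state where `simulate_digestion` places the cut within a site. The sketch below therefore exposes a hypothetical `cut_offset`, the number of bases into the site at which the cut falls; EcoRI, which cuts G^AATTC, has an offset of 1. The sketch also assumes linear DNA, top-strand cuts only, and non-overlapping regex matches.

```r
# Hypothetical helper: linear-DNA digestion sketch, not baseq's implementation
digest_sketch <- function(s, p, cut_offset = 0) {
  s <- toupper(s)
  n <- nchar(s)
  m <- gregexpr(p, s, perl = TRUE)[[1]]
  if (m[1] == -1) return(n)  # no site: the molecule stays intact
  # Cut after base (site start - 1 + cut_offset)
  cuts <- as.integer(m) - 1 + cut_offset
  bounds <- sort(unique(c(0, cuts[cuts > 0 & cuts < n], n)))
  diff(bounds)
}

digest_sketch("AAAGAATTCAAA", "GAATTC", cut_offset = 1)
```

The EcoRI site in the example starts at position 4. Cutting after the G at that position leaves fragments of 4 and 8 bases.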
Simulate FASTA File
Description
Generates a dummy FASTA dataset.
Usage
simulate_fasta(n_seq = 5, seq_len = 100, gc = NULL, type = "DNA", file = NULL)
Arguments
n_seq |
Number of sequences (default: 5) |
seq_len |
Length of each sequence (default: 100) |
gc |
Optional numeric target GC content |
type |
Sequence type, "DNA" or "RNA" (default: "DNA") |
file |
Optional file path to save the output |
Value
A dataframe of simulated sequences
Simulate FASTQ File
Description
Generates a dummy FASTQ dataset.
Usage
simulate_fastq(
n_reads = 5,
read_len = 100,
gc = NULL,
type = "DNA",
file = NULL
)
Arguments
n_reads |
Number of reads (default: 5) |
read_len |
Length of each read (default: 100) |
gc |
Optional numeric target GC content |
type |
Sequence type, "DNA" or "RNA" (default: "DNA") |
file |
Optional file path to save the output |
Value
A dataframe of simulated reads
PCR Simulator
Description
Simulates a PCR reaction and predicts amplicon sizes.
Usage
simulate_pcr(template, fwd, rev_p)
Arguments
template |
A character string containing the DNA template |
fwd |
A character string of the forward primer |
rev_p |
A character string of the reverse primer |
Value
A numeric vector of amplicon sizes
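An in-silico PCR finds where the forward primer matches the template and where the reverse primer's reverse complement matches it downstream. Each amplicon runs from the start of the forward site to the end of the reverse site. The sketch below allows only exact matches on the given strand and ignores mismatches, primer dimers and maximum product length. These simplifications are assumptions, and `simulate_pcr` may behave differently. The helper name `pcr_sketch` is hypothetical.

```r
# Hypothetical helper: exact-match in-silico PCR sketch, not baseq's implementation
pcr_sketch <- function(template, fwd, rev_p) {
  tmpl <- toupper(template)
  # Reverse primer binds the opposite strand, so search for its reverse complement
  rc <- function(x) paste(rev(strsplit(chartr("ACGT", "TGCA", toupper(x)), "")[[1]]), collapse = "")
  f_pos <- as.integer(gregexpr(toupper(fwd), tmpl, fixed = TRUE)[[1]])
  r_pos <- as.integer(gregexpr(rc(rev_p), tmpl, fixed = TRUE)[[1]])
  sizes <- numeric(0)
  if (f_pos[1] == -1 || r_pos[1] == -1) return(sizes)
  for (f in f_pos) for (r in r_pos) {
    # Product spans the forward site start to the reverse site end
    if (r > f) sizes <- c(sizes, r + nchar(rev_p) - f)
  }
  sort(sizes)
}

tmpl <- paste0("CCC", "ATGCGT", "AAAAAAAA", "GATCTA", "GGG")
pcr_sketch(tmpl, fwd = "ATGCGT", rev_p = "TAGATC")
```

In the example, the forward site occupies positions 4 to 9 and the reverse site (GATCTA, the reverse complement of TAGATC) occupies positions 18 to 23, giving a 20 bp product.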
Simulate Sequence
Description
Generates a random DNA or RNA sequence.
Usage
simulate_sequence(len, gc = NULL, type = "DNA")
Arguments
len |
Integer length of the sequence |
gc |
Numeric target GC content (0 to 1) |
type |
"DNA" or "RNA" |
Value
A character string of the simulated sequence
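One simple way to generate a random sequence with a target GC content is to draw each base independently, with probability gc/2 for each of G and C and (1 - gc)/2 for each of A and T (or U). The sketch below does this with `sample()` and uses uniform probabilities when `gc` is `NULL`. This is an assumed sampling scheme, and the helper name `simulate_sequence_sketch` is hypothetical. The GC content of any single result varies around the target rather than matching it exactly.

```r
# Hypothetical helper: i.i.d. base sampling sketch, not baseq's implementation
simulate_sequence_sketch <- function(len, gc = NULL, type = "DNA") {
  weak <- if (type == "RNA") c("A", "U") else c("A", "T")
  alphabet <- c(weak, "G", "C")
  # Split the GC fraction evenly between G and C, the rest between A and T/U
  probs <- if (is.null(gc)) rep(0.25, 4) else c(1 - gc, 1 - gc, gc, gc) / 2
  paste(sample(alphabet, len, replace = TRUE, prob = probs), collapse = "")
}

set.seed(1)
simulate_sequence_sketch(20, gc = 0.6)
```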
FASTA Summary
Description
Generates a summary of the sequences in a multi-FASTA file.
Usage
summarize_fasta(file)
Arguments
file |
Path to the FASTA file |
Value
A summary dataframe
Examples
# summarize_fasta("path/to/my.fasta")
Generic Translate
Description
Generic function to translate DNA or RNA to protein.
Usage
translate(x, ...)
Arguments
x |
A baseq_dna or baseq_rna object |
... |
Additional arguments |
Value
A list of translated sequences
Universal Sequence Writer
Description
Writes a sequence object (dataframe or list) to a FASTA or FASTQ file.
Usage
write_seq(x, file)
Arguments
x |
A sequence object (dataframe or list) |
file |
Path to the output sequence file |
Value
Invisibly returns TRUE