Seq — a language for bioinformatics

## Introduction > **A strongly-typed and statically-compiled high-performance Pythonic language!** Seq is a programming language for computational genomics and bioinformatics. With a Python-compatible syntax and a host of domain-specific features and optimizations, Seq makes writing high-performance genomics software as easy as writing Python code, and achieves performance comparable to (and in many cases better than) C/C++. **Think of Seq as a strongly-typed and statically-compiled Python: all the bells and whistles of Python, boosted with a strong type system, without any performance overhead.** Seq is able to outperform Python code by up to 160x. Seq can further beat equivalent C/C++ code by up to 2x without any manual interventions, and also natively supports parallelism out of the box. Implementation details and benchmarks are discussed [in our paper](https://dl.acm.org/citation.cfm?id=3360551). Learn more by following the [tutorial](https://docs.seq-lang.org/tutorial) or from the [cookbook](https://docs.seq-lang.org/cookbook). ## Examples Seq is a Python-compatible language, and many Python programs should work with few if any modifications: ```python def fib(n): a, b = 0, 1 while a < n: print(a, end=' ') a, b = b, a+b print() fib(1000) ``` This prime counting example showcases Seq's [OpenMP](https://www.openmp.org/) support, enabled with the addition of one line. The `@par` annotation tells the compiler to parallelize the following for-loop, in this case using a dynamic schedule, chunk size of 100, and 16 threads. ```python from sys import argv def is_prime(n): factors = 0 for i in range(2, n): if n % i == 0: factors += 1 return factors == 0 limit = int(argv[1]) total = 0 @par(schedule='dynamic', chunk_size=100, num_threads=16) for i in range(2, limit): if is_prime(i): total += 1 print(total) ``` Here is an example showcasing some of Seq's bioinformatics features, which include native sequence and k-mer types. ```python from bio import * s = s'ACGTACGT' # sequence literal print(s[2:5]) # subsequence print(~s) # reverse complement kmer = Kmer[8](s) # convert to k-mer # iterate over length-3 subsequences # with step 2 for sub in s.split(3, step=2): print(sub[-1]) # last base # iterate over 2-mers with step 1 for kmer in sub.kmers(step=1, k=2): print(~kmer) # '~' also works on k-mers ``` ## Install ### Pre-built binaries Pre-built binaries for Linux and macOS on x86_64 are available alongside [each release](https://github.com/seq-lang/seq/releases). We also have a script for downloading and installing pre-built versions: ```bash /bin/bash -c "$(curl -fsSL https://seq-lang.org/install.sh)" ``` ### Build from source See [Building from Source](docs/sphinx/build.rst). ## Documentation Please check [docs.seq-lang.org](https://docs.seq-lang.org) for in-depth documentation. ## Citing Seq If you use Seq in your research, please cite: > Ariya Shajii, Ibrahim Numanagić, Riyadh Baghdadi, Bonnie Berger, and Saman Amarasinghe. 2019. Seq: a high-performance language for bioinformatics. *Proc. ACM Program. Lang.* 3, OOPSLA, Article 125 (October 2019), 29 pages. DOI: https://doi.org/10.1145/3360551 BibTeX: ``` @article{Shajii:2019:SHL:3366395.3360551, author = {Shajii, Ariya and Numanagi\'{c}, Ibrahim and Baghdadi, Riyadh and Berger, Bonnie and Amarasinghe, Saman}, title = {Seq: A High-performance Language for Bioinformatics}, journal = {Proc. ACM Program. Lang.}, issue_date = {October 2019}, volume = {3}, number = {OOPSLA}, month = oct, year = {2019}, issn = {2475-1421}, pages = {125:1--125:29}, articleno = {125}, numpages = {29}, url = {http://doi.acm.org/10.1145/3360551}, doi = {10.1145/3360551}, acmid = {3360551}, publisher = {ACM}, address = {New York, NY, USA}, keywords = {Python, bioinformatics, computational biology, domain-specific language, optimization, programming language}, } ```