Biological sequence data is stored and transmitted using a wide variety of file formats. This package provides convenient methods to read and write several of these file formats.
WebLogo can often guess the correct file type, either from the file extension or from the structure of the file:
>>> import weblogo.seq_io
>>> afile = open("test_weblogo/data/cap.fa")
>>> seqs = weblogo.seq_io.read(afile)
Alternatively, each sequence file type has a separate module named FILETYPE_io (e.g. fasta_io, clustal_io):
>>> import weblogo.seq_io.fasta_io
>>> afile = open("test_weblogo/data/cap.fa")
>>> seqs = weblogo.seq_io.fasta_io.read(afile)
Sequence data can also be written back to files:
>>> with open("out.fa", "w") as fout:
...     weblogo.seq_io.fasta_io.write(fout, seqs)
Module              Name             Extension  Read  Write  Features
---------------------------------------------------------------------
array_io            array, flatfile             yes   yes    none
clustal_io          clustalw         aln        yes   yes
fasta_io            fasta, Pearson   fa         yes   yes    none
genbank_io          genbank          gb         yes
intelligenetics_io  intelligenetics  ig         yes   yes
msf_io              msf              msf        yes
nbrf_io             nbrf, pir        pir        yes
nexus_io            nexus            nexus      yes
phylip_io           phylip           phy        yes
plain_io            plain, raw                  yes   yes    none
table_io            table            tbl        yes   yes    none
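The Extension column can be read as a lookup table from filename suffix to IO module. The sketch below is a stand-alone illustration of the extension-based step of format guessing described for weblogo.seq_io.read(). The EXTENSION_TO_MODULE dict is transcribed by hand from the table above, and guess_module is a hypothetical helper, not part of WebLogo; WebLogo's own map between extensions and file types is returned by weblogo.seq_io.format_extensions().

```python
import os
from typing import Optional

# Hypothetical lookup table copied from the Extension column above.
# This is NOT WebLogo's internal mapping. array_io and plain_io have no
# extension listed, so they cannot be selected this way.
EXTENSION_TO_MODULE = {
    "aln": "clustal_io",
    "fa": "fasta_io",
    "gb": "genbank_io",
    "ig": "intelligenetics_io",
    "msf": "msf_io",
    "pir": "nbrf_io",
    "nexus": "nexus_io",
    "phy": "phylip_io",
    "tbl": "table_io",
}


def guess_module(filename: str) -> Optional[str]:
    """Return the IO module name suggested by the file extension, or None."""
    # splitext keeps the leading dot (".fa"), so strip it; this sketch also
    # lower-cases the extension, which WebLogo itself may or may not do.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    return EXTENSION_TO_MODULE.get(ext)


print(guess_module("test_weblogo/data/cap.fa"))  # fasta_io
print(guess_module("alignment.ALN"))             # clustal_io
print(guess_module("reads.txt"))                 # None
```

A file with a missing or unrecognised extension (such as reads.txt here) cannot be resolved by this step alone, which is why read() falls back to trying several formats.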
Each IO module defines one or more of the following functions and variables:
weblogo.seq_io.read(fin: TextIO, alphabet: Optional[weblogo.seq.Alphabet] = None) → weblogo.seq.SeqList
Read a sequence file and attempt to guess its format. First, the filename extension (if available) is used to infer the format. If that fails, the function attempts to parse the file using several common formats.
Note: fin cannot be an unseekable stream such as sys.stdin.
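A plausible reason for the note about unseekable streams is the fallback step: if several formats are tried in turn, the reader has to rewind the stream after every failed attempt. The sketch below illustrates such a try-and-rewind loop, plus one common workaround for sys.stdin (buffering it into an io.StringIO). The toy parsers parse_fasta and parse_plain and the helpers read_any and make_seekable are hypothetical stand-ins, not WebLogo functions, and this is an assumption about the general approach rather than WebLogo's actual code.

```python
import io
from typing import Callable, List, Sequence, TextIO

# A parser takes an open text stream and returns a list of sequences,
# raising ValueError if the data is not in its format.
Parser = Callable[[TextIO], List[str]]


def parse_fasta(fin: TextIO) -> List[str]:
    """Toy FASTA parser (hypothetical): '>' header lines followed by residues."""
    seqs: List[str] = []
    for line in fin:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            seqs.append("")      # start a new record
        elif seqs:
            seqs[-1] += line     # append residues to the current record
        else:
            raise ValueError("FASTA data must start with a '>' header")
    if not seqs:
        raise ValueError("no FASTA records found")
    return seqs


def parse_plain(fin: TextIO) -> List[str]:
    """Toy plain-text parser (hypothetical): one sequence per non-blank line."""
    seqs = [line.strip() for line in fin if line.strip()]
    if not seqs or not all(s.isalpha() for s in seqs):
        raise ValueError("not plain sequence data")
    return seqs


def read_any(fin: TextIO, parsers: Sequence[Parser]) -> List[str]:
    """Try each parser in turn, rewinding the stream before every attempt."""
    if not fin.seekable():
        raise ValueError("cannot guess the format of an unseekable stream")
    for parse in parsers:
        fin.seek(0)  # a failed attempt may have consumed part of the stream
        try:
            return parse(fin)
        except ValueError:
            continue
    raise ValueError("could not recognise the sequence format")


def make_seekable(stream: TextIO) -> TextIO:
    """Return stream unchanged if seekable, else an in-memory copy of its contents."""
    if stream.seekable():
        return stream
    return io.StringIO(stream.read())


data = ">seq1\nACGT\nAC\n>seq2\nGGTT\n"
print(read_any(io.StringIO(data), [parse_fasta, parse_plain]))
# ['ACGTAC', 'GGTT']
print(read_any(io.StringIO("ACGT\nGGCC\n"), [parse_fasta, parse_plain]))
# ['ACGT', 'GGCC']
```

With WebLogo itself, the same buffering idea should let you read piped input, e.g. weblogo.seq_io.read(io.StringIO(sys.stdin.read())), assuming the input fits comfortably in memory. Because an io.StringIO has no filename, presumably only the structure-based guessing step can apply in that case.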
weblogo.seq_io.format_names() → dict
Return a map between format names and format modules.
weblogo.seq_io.format_extensions() → dict
Return a map between filename extensions and sequence file types.