v3.7.13.dev0+gaf571ab.d20220726

Alphabets and Sequences

Alphabetic sequences and associated tools and data.

Seq is a subclass of the Python str type, with additional annotation and an alphabet. Every character in the string must be contained in the alphabet. Various standard alphabets are provided.

Classes

Alphabet    -- A subset of non-null ascii characters
Seq         -- An alphabetic string
SeqList     -- A collection of Seq's

Alphabets

o generic_alphabet -- A generic alphabet. Any printable ASCII character.
o protein_alphabet -- IUPAC/IUB Amino Acid one letter codes.
o nucleic_alphabet -- IUPAC/IUB Nucleic Acid codes 'ACGTURYSWKMBDHVN-'
o dna_alphabet -- Same as nucleic_alphabet, with 'U' (Uracil) as an
    alternative for 'T' (Thymidine).
o rna_alphabet -- Same as nucleic_alphabet, with 'T' (Thymidine) as an
    alternative for 'U' (Uracil).
o reduced_nucleic_alphabet -- All ambiguous codes in 'nucleic_alphabet' are
    alternatives to 'N' (aNy).
o reduced_protein_alphabet -- All ambiguous ('BZJ') and non-canonical amino
    acid codes ('U', Selenocysteine, and 'O', Pyrrolysine) in
    'protein_alphabet' are alternatives to 'X'.
o unambiguous_dna_alphabet -- 'ACGT'
o unambiguous_rna_alphabet -- 'ACGU'
o unambiguous_protein_alphabet -- The twenty canonical amino acid one letter
    codes, in alphabetic order, 'ACDEFGHIKLMNPQRSTVWY'

Amino Acid Codes:

Code  Alt.  Meaning
-----------------
A           Alanine
B           Aspartic acid or Asparagine
C           Cysteine
D           Aspartate
E           Glutamate
F           Phenylalanine
G           Glycine
H           Histidine
I           Isoleucine
J           Leucine or Isoleucine
K           Lysine
L           Leucine
M           Methionine
N           Asparagine
O           Pyrrolysine
P           Proline
Q           Glutamine
R           Arginine
S           Serine
T           Threonine
U           Selenocysteine
V           Valine
W           Tryptophan
Y           Tyrosine
Z           Glutamate or Glutamine
X    ?      any
*           translation stop
-    .~     gap

Nucleotide Codes:

Code  Alt.  Meaning
------------------------------
A           Adenosine
C           Cytidine
G           Guanine
T           Thymidine
U           Uracil
R           G A (puRine)
Y           T C (pYrimidine)
K           G T (Ketone)
M           A C (aMino group)
S           G C (Strong interaction)
W           A T (Weak interaction)
B           G T C (not A) (B comes after A)
D           G A T (not C) (D comes after C)
H           A C T (not G) (H comes after G)
V           G C A (not T, not U) (V comes after U)
N   X?      A G C T (aNy)
-   .~      A gap
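The nucleotide table can be read as a mapping from each code to the set of bases it stands for, with the "Alt." column mapping alternative symbols onto a canonical code. A minimal Python sketch, assuming DNA (T rather than U) and treating the gap as covering no base; `IUPAC_DNA`, `ALTERNATIVES` and `matches` are hypothetical names, not part of weblogo.seq:

```python
# Hypothetical helpers, not part of weblogo.seq.
# Canonical IUPAC/IUB nucleotide codes -> the DNA bases each one covers.
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # puRine
    "Y": "CT",    # pYrimidine
    "K": "GT",    # Ketone
    "M": "AC",    # aMino group
    "S": "CG",    # Strong interaction
    "W": "AT",    # Weak interaction
    "B": "CGT",   # not A
    "D": "AGT",   # not C
    "H": "ACT",   # not G
    "V": "ACG",   # not T (not U)
    "N": "ACGT",  # aNy
    "-": "",      # gap: covers no base
}

# Alternative symbols from the "Alt." column, mapped to their canonical code.
ALTERNATIVES = {"X": "N", "?": "N", ".": "-", "~": "-"}


def matches(code: str, base: str) -> bool:
    """True if the (possibly ambiguous) code covers the concrete base.
    Raises KeyError for symbols that are not in the table."""
    code = code.upper()
    code = ALTERNATIVES.get(code, code)
    return base.upper() in IUPAC_DNA[code]


print(matches("R", "a"), matches("Y", "A"))  # True False
```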

Refs:
  • http://www.chem.qmw.ac.uk/iupac/AminoAcid/A2021.html
  • http://www.chem.qmw.ac.uk/iubmb/misc/naseq.html

Authors:
  • GEC 2004, 2005

class weblogo.seq.Alphabet

An ordered subset of printable ascii characters.

Status:
Beta
Authors:
  • GEC 2005

alphabetic(string: str) → bool

True if all characters of the string are in this alphabet.

chr(n: int) → str

The n’th character in the alphabet (zero indexed), or the null character ('\0') if there is no such character.

chrs(sequence_of_ints: Sequence[int]) → weblogo.seq.Seq

Convert a sequence of ordinals into an alphabetic string.

letters() → str

Letters of the alphabet as a string.

normalize(string: str) → weblogo.seq.Seq

Normalize an alphabetic string by converting all alternative symbols to the canonical equivalent in ‘letters’.
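Normalization can be sketched as a character-translation table that maps each alternative symbol onto its canonical letter. This assumes the alternatives are supplied as (alternative, canonical) pairs, which is an assumption about how they might be recorded; `make_normalizer` is a hypothetical helper that works on plain `str` and returns `str`, not a Seq:

```python
from typing import Callable, Iterable, Tuple


def make_normalizer(
    letters: str, alternatives: Iterable[Tuple[str, str]]
) -> Callable[[str], str]:
    """Build a function that rewrites alternative symbols to canonical letters."""
    table = {}
    for alt, canonical in alternatives:
        # Canonical targets must be real letters of the alphabet.
        if canonical not in letters:
            raise ValueError(f"{canonical!r} is not a letter of the alphabet")
        table[ord(alt)] = canonical
    # Characters without an entry in the table pass through unchanged.
    return lambda s: s.translate(table)


# 'U' as an alternative for 'T', plus the gap alternatives from the tables.
normalize_dna = make_normalizer("ACGT-", [("U", "T"), (".", "-"), ("~", "-")])
print(normalize_dna("ACGU.~T"))  # ACGT--T
```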

ord(c: str) → int

The ordinal position of the character c in this alphabet, or 255 if no such character.

ords(string: Union[Seq, str]) → array.array

Convert an alphabetic string into a byte array of ordinals.
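The ordinal conventions described above (zero-indexed positions, 255 for a character not in the alphabet) can be sketched with a 256-entry lookup table and the standard `array` module. `SimpleAlphabet` is a hypothetical stand-in, not weblogo's Alphabet class: it ignores alternative symbols, and it assumes `chr` returns '\0' for ordinals with no letter:

```python
from array import array
from typing import Sequence

UNKNOWN = 255  # ordinal reported for characters outside the alphabet


class SimpleAlphabet:
    """Hypothetical minimal alphabet: ordinal lookups through a byte table."""

    def __init__(self, letters: str) -> None:
        if len(letters) >= UNKNOWN:
            raise ValueError("too many letters for byte ordinals")
        self.letters = letters
        # Index by character code; unknown characters keep the value 255.
        self._ord_table = [UNKNOWN] * 256
        for i, c in enumerate(letters):
            self._ord_table[ord(c)] = i

    def ord(self, c: str) -> int:
        return self._ord_table[ord(c)] if ord(c) < 256 else UNKNOWN

    def chr(self, n: int) -> str:
        return self.letters[n] if 0 <= n < len(self.letters) else "\0"

    def ords(self, string: str) -> array:
        # 'B' = unsigned byte, so every ordinal fits in the range 0..255.
        return array("B", (self.ord(c) for c in string))

    def chrs(self, ordinals: Sequence[int]) -> str:
        return "".join(self.chr(n) for n in ordinals)

    def alphabetic(self, string: str) -> bool:
        return all(self.ord(c) != UNKNOWN for c in string)


dna = SimpleAlphabet("ACGT")
print(list(dna.ords("GATTACA")))          # [2, 0, 3, 3, 0, 1, 0]
print(dna.ord("N"), dna.chrs([2, 0, 3]))  # 255 GAT
```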

static which(seqs: Union[Seq, SeqList], alphabets: List[Alphabet] = None) → weblogo.seq.Alphabet

Returns the most appropriate unambiguous protein, RNA or DNA alphabet for a Seq or SeqList. If a list of alphabets is supplied, then the best alphabet is selected from that list.

The heuristic is to count the occurrences of letters for each alphabet and downweight longer alphabets by the log of the alphabet length. Ties go to the first alphabet in the list.
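The scoring rule can be sketched as follows. This is a reconstruction from the description above, not weblogo's own code: `which_alphabet` is a hypothetical name, alphabets are plain strings of letters, and every alphabet must have at least two letters because log(1) is zero:

```python
import math
from typing import Iterable, Sequence


def which_alphabet(seqs: Iterable[str], alphabets: Sequence[str]) -> str:
    """Return the alphabet (a string of letters) that best explains `seqs`.

    Score = characters found in the alphabet / log(alphabet length).
    Ties go to the alphabet listed first.
    """
    seqs = list(seqs)
    best, best_score = alphabets[0], -1.0
    for letters in alphabets:
        if len(letters) < 2:
            raise ValueError("each alphabet needs at least two letters")
        hits = sum(1 for s in seqs for c in s if c in letters)
        score = hits / math.log(len(letters))
        if score > best_score:  # strict '>' keeps the earlier alphabet on a tie
            best, best_score = letters, score
    return best


DNA, RNA, PROTEIN = "ACGT", "ACGU", "ACDEFGHIKLMNPQRSTVWY"
print(which_alphabet(["ACGTTGCA"], [DNA, RNA, PROTEIN]))    # ACGT
print(which_alphabet(["MKWVTFISLL"], [DNA, RNA, PROTEIN]))  # ACDEFGHIKLMNPQRSTVWY
```

Dividing by the log of the length means a 20-letter protein alphabet has to explain noticeably more characters than a 4-letter nucleic alphabet before it wins, which is why a short DNA string is not classified as protein even though A, C, G and T are all amino acid codes.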

class weblogo.seq.Seq

An alphabetic string. A subclass of “str” consisting solely of letters from the same alphabet.

Attributes:
  • alphabet -- A string or Alphabet of allowed characters.
  • name -- A short string used to identify the sequence.
  • description -- A string describing the sequence.

Authors:
  • GEC 2005

back_translate() → weblogo.seq.Seq

Translate a protein sequence back into coding DNA, using the standard genetic code. See weblogo.transform.GeneticCode for details and more options.

complement() → weblogo.seq.Seq

Return the complementary nucleic acid sequence.

join(str_list: List[Seq]) → weblogo.seq.Seq

Concatenate any number of strings.

The string whose method is called is inserted in between each given string. The result is returned as a new string.

Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'

lower() → weblogo.seq.Seq

Return a lower case copy of the sequence.

mask(letters: str = 'abcdefghijklmnopqrstuvwxyz', mask: str = 'X') → weblogo.seq.Seq

Replace every occurrence of any character in ‘letters’ with the ‘mask’ character. By default, all lower case letters are replaced with ‘X’.
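Masking can be sketched with the built-in `str.maketrans` and `str.translate`. `mask_letters` is a hypothetical helper on plain strings, with defaults copied from the signature above; the documented method returns a Seq instead:

```python
import string


def mask_letters(seq: str, letters: str = string.ascii_lowercase, mask: str = "X") -> str:
    """Replace every character that appears in `letters` with `mask`."""
    if len(mask) != 1:
        raise ValueError("mask must be a single character")
    # maketrans needs equal-length strings: map each letter to the mask.
    return seq.translate(str.maketrans(letters, mask * len(letters)))


print(mask_letters("ACGTacgtNN"))  # ACGTXXXXNN
```

A typical use is hiding soft-masked (lower case) low-complexity regions before further analysis.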

ords() → array.array

Convert the sequence to an array of integers in the range [0, len(alphabet)).

remove(delchars: str) → weblogo.seq.Seq

Return a new alphabetic sequence with all characters in ‘delchars’ removed.

reverse() → weblogo.seq.Seq

Return the reversed sequence.

Note that this method returns a new object, in contrast to the in-place reverse() method of list objects.

reverse_complement() → weblogo.seq.Seq

Return the reverse complement of a nucleic acid sequence (i.e., the other strand of a DNA sequence).
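A reverse complement that also covers the ambiguity codes from the nucleotide table can be sketched with a translation table. The pairings (R/Y, K/M, B/V, D/H, with S, W and N their own complements) follow from the base sets in that table. The helper names are hypothetical, and the sketch handles upper-case DNA only; lower-case characters pass through unchanged:

```python
# Complement of each upper-case DNA code; an ambiguity code complements to the
# code for the complementary base set (e.g. R = A/G  ->  Y = T/C).
_COMPLEMENT = str.maketrans("ACGTRYKMSWBVDHN-", "TGCAYRMKSWVBHDN-")


def complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """The opposite strand, read in the same 5' to 3' direction."""
    return complement(seq)[::-1]


print(reverse_complement("GATTACA"))  # TGTAATC
```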

tally(alphabet: weblogo.seq.Alphabet = None) → List[int]

Counts the occurrences of alphabetic characters.

Arguments:
  • alphabet -- an optional alternative alphabet

Returns:
A list of character counts in alphabetic order.
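Tallying can be sketched with `collections.Counter`. This assumes "alphabetic order" means the order of the alphabet's own letters (one count per letter) and that characters outside the alphabet are ignored; `tally_letters` is a hypothetical helper, not the documented method:

```python
from collections import Counter
from typing import List


def tally_letters(seq: str, letters: str) -> List[int]:
    """Count each letter of `letters` in `seq`, in the alphabet's own order.
    Characters that are not in `letters` are ignored."""
    counts = Counter(seq)
    return [counts[c] for c in letters]  # Counter returns 0 for absent keys


print(tally_letters("GATTACA", "ACGT"))  # [3, 1, 1, 2]
```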
tostring() → str

Converts Seq to a raw string.

translate() → weblogo.seq.Seq

Translate a nucleotide sequence into a polypeptide using the standard genetic code, with full support for IUPAC ambiguity codes in both the DNA/RNA and the amino acid alphabets. See weblogo.transform.GeneticCode for details and more options.
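The core codon lookup can be sketched with the standard genetic code (NCBI translation table 1), built from its usual 64-letter string. Unlike the documented method, this simplified sketch does not handle IUPAC ambiguity codes: it accepts unambiguous DNA or RNA only and raises KeyError on anything else. `STANDARD_CODE` and `translate_dna` are hypothetical names:

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated with each
# position cycling through T, C, A, G (first position slowest); '*' = stop.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate_dna(seq: str) -> str:
    """Translate codon by codon; a trailing partial codon is dropped."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    return "".join(STANDARD_CODE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


print(translate_dna("ATGGCCTAA"))  # MA*
```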

upper() → weblogo.seq.Seq

Return an upper case copy of the sequence.

word_count(k: int, alphabet: weblogo.seq.Alphabet = None) → List[T]

Return a count of all subwords of length k in the sequence, as a list of (word, count) pairs.

>>> from weblogo.seq import *
>>> Seq("abcabc").word_count(3)
[('abc', 2), ('bca', 1), ('cab', 1)]

words(k: int, alphabet: weblogo.seq.Alphabet = None) → Generator[str, None, None]

Return an iterator over all subwords of length k in the sequence. If an optional alphabet is provided, only words from that alphabet are returned.

>>> list(Seq("abcabc").words(3))
['abc', 'bca', 'cab', 'abc']
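Both methods can be sketched on plain strings as a sliding window plus a counter. This assumes that "only words from that alphabet" means windows containing any character outside the alphabet are skipped, and that the counts are sorted by word as in the doctest; `words` and `word_count` here are hypothetical free functions, not the Seq methods:

```python
from collections import Counter
from typing import Iterator, List, Optional, Tuple


def words(seq: str, k: int, alphabet: Optional[str] = None) -> Iterator[str]:
    """Yield every length-k window; with an alphabet, skip windows that
    contain characters outside it."""
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if alphabet is None or all(c in alphabet for c in word):
            yield word


def word_count(seq: str, k: int, alphabet: Optional[str] = None) -> List[Tuple[str, int]]:
    """Tally the words, returned as (word, count) pairs sorted by word."""
    return sorted(Counter(words(seq, k, alphabet)).items())


print(word_count("abcabc", 3))            # [('abc', 2), ('bca', 1), ('cab', 1)]
print(list(words("ACGNACG", 2, "ACGT")))  # ['AC', 'CG', 'AC', 'CG']
```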
weblogo.seq.rna(string: str) → weblogo.seq.Seq

Create an alphabetic sequence representing a stretch of RNA.

weblogo.seq.dna(string: str) → weblogo.seq.Seq

Create an alphabetic sequence representing a stretch of DNA.

weblogo.seq.protein(string: str) → weblogo.seq.Seq

Create an alphabetic sequence representing a stretch of polypeptide.

class weblogo.seq.SeqList(alist: List[weblogo.seq.Seq] = [], alphabet: weblogo.seq.Alphabet = None, name: str = None, description: str = None)

A list of sequences.

isaligned() → bool

True if all sequences have the same length and alphabet.

ords(alphabet: weblogo.seq.Alphabet = None) → List[array.array]

Convert the sequence list into a 2D array of ordinals (one array per sequence).

profile(alphabet: weblogo.seq.Alphabet = None)

Counts the occurrences of characters in each column.

Returns: Motif(counts, alphabet)
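Column-wise counting can be sketched for aligned sequences as one count list per column. The documented method returns a Motif object; this hypothetical `column_counts` returns plain nested lists instead, orders each list by the alphabet's letters, ignores characters outside the alphabet, and rejects sequences of unequal length:

```python
from typing import List, Sequence


def column_counts(seqs: Sequence[str], letters: str) -> List[List[int]]:
    """For each alignment column, count each letter of `letters` (in order)."""
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("sequences are not aligned (lengths differ)")
    index = {c: i for i, c in enumerate(letters)}
    ncols = len(seqs[0]) if seqs else 0
    counts = [[0] * len(letters) for _ in range(ncols)]
    for s in seqs:
        for col, c in enumerate(s):
            if c in index:  # skip gaps and other non-alphabet characters
                counts[col][index[c]] += 1
    return counts


print(column_counts(["ACG", "ATG", "GCG"], "ACGT"))
# [[2, 0, 1, 0], [0, 2, 0, 1], [0, 0, 3, 0]]
```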

tally(alphabet: weblogo.seq.Alphabet = None) → List[int]

Counts the occurrences of alphabetic characters.

Arguments:
  • alphabet -- an optional alternative alphabet

Returns:
A list of character counts in alphabetic order.