Alphabetic sequences and associated tools and data.
Seq is a subclass of the Python str type that carries additional annotation and an alphabet. Every character in the string must be contained in the alphabet. Various standard alphabets are provided.
Classes
Alphabet -- A subset of non-null ASCII characters
Seq -- An alphabetic string
SeqList -- A collection of Seq objects
Alphabets
o generic_alphabet -- A generic alphabet. Any printable ASCII character.
o protein_alphabet -- IUPAC/IUB Amino Acid one letter codes.
o nucleic_alphabet -- IUPAC/IUB Nucleic Acid codes 'ACGTURYSWKMBDHVN-'
o dna_alphabet -- Same as nucleic_alphabet, with 'U' (Uracil) as an alternative for 'T' (Thymidine).
o rna_alphabet -- Same as nucleic_alphabet, with 'T' (Thymidine) as an alternative for 'U' (Uracil).
o reduced_nucleic_alphabet -- All ambiguous codes in 'nucleic_alphabet' are alternatives to 'N' (aNy).
o reduced_protein_alphabet -- All ambiguous ('BZJ') and non-canonical amino acid codes ('U', Selenocysteine, and 'O', Pyrrolysine) in 'protein_alphabet' are alternatives to 'X'.
o unambiguous_dna_alphabet -- 'ACGT'
o unambiguous_rna_alphabet -- 'ACGU'
o unambiguous_protein_alphabet -- The twenty canonical amino acid one letter codes, in alphabetic order, 'ACDEFGHIKLMNPQRSTVWY'
Amino Acid Codes:
Code  Alt.  Meaning
-----------------
A           Alanine
B           Aspartic acid or Asparagine
C           Cysteine
D           Aspartate
E           Glutamate
F           Phenylalanine
G           Glycine
H           Histidine
I           Isoleucine
J           Leucine or Isoleucine
K           Lysine
L           Leucine
M           Methionine
N           Asparagine
O           Pyrrolysine
P           Proline
Q           Glutamine
R           Arginine
S           Serine
T           Threonine
U           Selenocysteine
V           Valine
W           Tryptophan
Y           Tyrosine
Z           Glutamate or Glutamine
X     ?     any
*           translation stop
-     .~    gap
Nucleotide Codes:
Code  Alt.  Meaning
------------------------------
A           Adenosine
C           Cytidine
G           Guanine
T           Thymidine
U           Uracil
R           G A (puRine)
Y           T C (pYrimidine)
K           G T (Ketone)
M           A C (aMino group)
S           G C (Strong interaction)
W           A T (Weak interaction)
B           G T C (not A) (B comes after A)
D           G A T (not C) (D comes after C)
H           A C T (not G) (H comes after G)
V           G C A (not T, not U) (V comes after U)
N     X?    A G C T (aNy)
-     .~    A gap
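As a rough illustration of how the ambiguity codes in the nucleotide table can be used, the sketch below transcribes the table into a dictionary and expands a pattern into the concrete sequences it stands for. The names IUPAC_NUCLEOTIDES, matches and expand are hypothetical helpers written for this example; they are not part of weblogo.seq.

```python
from typing import List

# Ambiguity codes transcribed from the Nucleotide Codes table.
# Illustrative data only; this dictionary is not a weblogo object.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def matches(code: str, base: str) -> bool:
    """True if the concrete base is one of those the code stands for."""
    return base.upper() in IUPAC_NUCLEOTIDES[code.upper()]


def expand(pattern: str) -> List[str]:
    """Enumerate every unambiguous sequence that a pattern can represent."""
    results = [""]
    for code in pattern.upper():
        results = [prefix + base
                   for prefix in results
                   for base in IUPAC_NUCLEOTIDES[code]]
    return results
```

For example, expand("AR") yields the two sequences "AG" and "AA", since 'R' stands for G or A.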
weblogo.seq.Alphabet
An ordered subset of printable ascii characters.
alphabetic(string: str) → bool
True if all characters of the string are in this alphabet.
chr(n: int) → str
The n’th character in the alphabet (zero indexed) or 0
chrs(sequence_of_ints: Sequence[int]) → weblogo.seq.Seq
Convert a sequence of ordinals into an alphabetic string.
letters() → str
Letters of the alphabet as a string.
normalize(string: str) → weblogo.seq.Seq
Normalize an alphabetic string by converting all alternative symbols to the canonical equivalent in ‘letters’.
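The idea behind normalization can be sketched with a plain translation table. This is a minimal stand-alone illustration, not the weblogo implementation: make_normalizer and DNA_ALTERNATIVES are hypothetical names, and the alternative pairs are the ones this document lists for DNA ('U' for 'T', 'X' and '?' for 'N', '.' and '~' for the gap).

```python
from typing import Callable, Iterable, Tuple


def make_normalizer(alternatives: Iterable[Tuple[str, str]]) -> Callable[[str], str]:
    """Build a function that maps each alternative symbol to its canonical letter."""
    table = str.maketrans({alt: canon for alt, canon in alternatives})

    def normalize(string: str) -> str:
        # Characters without an entry in the table pass through unchanged.
        return string.translate(table)

    return normalize


# (alternative, canonical) pairs taken from the tables in this document.
DNA_ALTERNATIVES = [("U", "T"), ("X", "N"), ("?", "N"), (".", "-"), ("~", "-")]
normalize_dna = make_normalizer(DNA_ALTERNATIVES)
```

With these pairs, normalize_dna("ACGU?X.~-") gives "ACGTNN---".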
ord(c: str) → int
The ordinal position of the character c in this alphabet, or 255 if no such character.
ords(string: Union[Seq, str]) → array.array
Convert an alphabetic string into a byte array of ordinals.
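To make the relationship between ord, ords and chrs concrete, here is a small stand-in class. TinyAlphabet is a hypothetical name invented for this sketch and does not reproduce the internals of weblogo.seq.Alphabet; it only mirrors the documented behaviour of returning 255 for characters outside the alphabet.

```python
from array import array
from typing import Iterable


class TinyAlphabet:
    """Illustrative stand-in for an ordered alphabet; not the weblogo class."""

    MISSING = 255  # value the documentation gives for characters not in the alphabet

    def __init__(self, letters: str) -> None:
        self.letters = letters
        self._ord = {c: i for i, c in enumerate(letters)}

    def ord(self, c: str) -> int:
        """Zero-based position of c, or MISSING if c is not a letter."""
        return self._ord.get(c, self.MISSING)

    def ords(self, string: str) -> array:
        """Byte array ('B' typecode) holding one ordinal per character."""
        return array("B", (self.ord(c) for c in string))

    def chrs(self, ordinals: Iterable[int]) -> str:
        """Inverse of ords for ordinals that are valid positions."""
        return "".join(self.letters[n] for n in ordinals)
```

Round-tripping a string through ords and chrs returns the original string as long as every character belongs to the alphabet.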
which(seqs: Union[Seq, SeqList], alphabets: List[Alphabet] = None) → weblogo.seq.Alphabet
Returns the most appropriate unambiguous protein, RNA or DNA alphabet for a Seq or SeqList. If a list of alphabets is supplied, then the best alphabet is selected from that list.
The heuristic is to count the occurrences of letters for each alphabet and downweight longer alphabets by the log of the alphabet length. Ties go to the first alphabet in the list.
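One plausible reading of this heuristic, written in plain Python, is shown below. It assumes that "downweight" means dividing the letter count by the natural log of the alphabet length, and it represents alphabets as plain strings rather than Alphabet objects. The function name pick_alphabet is hypothetical.

```python
import math
from typing import Sequence


def pick_alphabet(text: str, alphabets: Sequence[str]) -> str:
    """Return the candidate alphabet with the highest length-adjusted letter count.

    Assumes every candidate has at least two letters, so log(len) is non-zero.
    """
    best, best_score = alphabets[0], float("-inf")
    for letters in alphabets:
        count = sum(1 for c in text if c in letters)
        score = count / math.log(len(letters))  # assumed form of the downweighting
        if score > best_score:  # strict comparison, so ties keep the earlier alphabet
            best, best_score = letters, score
    return best


# Candidates in the order the documentation names them last: DNA, RNA, protein.
CANDIDATES = ["ACGT", "ACGU", "ACDEFGHIKLMNPQRSTVWY"]
```

Under these assumptions a string such as "ACGACG" scores equally for the DNA and RNA candidates, and the DNA alphabet is returned because it comes first in the list.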
weblogo.seq.Seq
An alphabetic string. A subclass of “str” consisting solely of letters from the same alphabet.
alphabet -- A string or Alphabet of allowed characters.
name -- A short string used to identify the sequence.
description -- A string describing the sequence
back_translate() → weblogo.seq.Seq
Translate a protein sequence back into coding DNA, using the standard genetic code. See weblogo.transform.GeneticCode for details and more options.
complement() → weblogo.seq.Seq
Returns complementary nucleic acid sequence.
join(str_list: List[Seq]) → weblogo.seq.Seq
Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
lower() → weblogo.seq.Seq
Return a lower case copy of the sequence.
mask(letters: str = 'abcdefghijklmnopqrstuvwxyz', mask: str = 'X') → weblogo.seq.Seq
Replace all occurrences of letters with the mask character. The default is to replace all lower case letters with 'X'.
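The masking behaviour described here can be approximated on ordinary strings as follows. This is a sketch that works on str rather than Seq; mask_letters is a hypothetical name, and the parameter mask_char stands in for the method's mask argument.

```python
import string


def mask_letters(seq: str,
                 letters: str = string.ascii_lowercase,
                 mask_char: str = "X") -> str:
    """Replace every character of seq that appears in letters with mask_char."""
    table = str.maketrans(letters, mask_char * len(letters))
    return seq.translate(table)
```

With the defaults, mask_letters("ACGTacgtACGT") gives "ACGTXXXXACGT", which matches the documented default of replacing lower case letters with 'X'.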
ords() → array.array
Convert sequence to an array of integers in the range [0, len(alphabet) )
remove(delchars: str) → weblogo.seq.Seq
Return a new alphabetic sequence with all characters in ‘delchars’ removed.
reverse() → weblogo.seq.Seq
Return the reversed sequence.
Note that this method returns a new object, in contrast to the in-place reverse() method of list objects.
reverse_complement() → weblogo.seq.Seq
Returns reversed complementary nucleic acid sequence (i.e. the other strand of a DNA sequence.)
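For readers unfamiliar with the operation, the following stand-alone sketch shows complementing and reverse complementing on plain DNA strings, including the ambiguity codes from the nucleotide table. It is not the weblogo implementation; the table _COMPLEMENT and both function names are assumptions made for this example, and 'U' is deliberately left out.

```python
# Each code maps to the code for the complementary set of bases,
# e.g. R (G or A) pairs with Y (C or T), and B (not A) pairs with V (not T).
_COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN-", "TGCAYRMKSWVHDBN-")


def complement(seq: str) -> str:
    """Complement each base, keeping the original order."""
    return seq.translate(_COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Complement each base and reverse, giving the opposite strand."""
    return complement(seq)[::-1]
```

For instance, reverse_complement("GATTACA") gives "TGTAATC".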
tally(alphabet: weblogo.seq.Alphabet = None) → List[int]
Counts the occurrences of alphabetic characters.
Arguments: - alphabet -- an optional alternative alphabet
tostring() → str
Converts Seq to a raw string.
translate() → weblogo.seq.Seq
Translate a nucleotide sequence to a polypeptide using full IUPAC ambiguities in DNA/RNA and amino acid codes, using the standard genetic code. See weblogo.transform.GeneticCode for details and more options.
upper() → weblogo.seq.Seq
Return an upper case copy of the sequence.
word_count(k: int, alphabet: weblogo.seq.Alphabet = None) → List[T]
Return a count of all subwords in the sequence.
>>> from weblogo.seq import *
>>> Seq("abcabc").word_count(3)
[('abc', 2), ('bca', 1), ('cab', 1)]
words(k: int, alphabet: weblogo.seq.Alphabet = None) → Generator[str, None, None]
Return an iteration over all subwords of length k in the sequence. If an optional alphabet is provided, only words from that alphabet are returned.
>>> list(Seq("abcabc").words(3))
['abc', 'bca', 'cab', 'abc']
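The sliding-window behaviour behind words and word_count can be sketched without the library. The functions below operate on plain strings, take the optional alphabet as a string of allowed characters, and sort the counts by word, which is an assumption chosen to agree with the example output above. They are illustrations, not the weblogo source.

```python
from collections import Counter
from typing import Iterator, List, Optional, Tuple


def words(seq: str, k: int, alphabet: Optional[str] = None) -> Iterator[str]:
    """Yield every overlapping subword of length k, left to right."""
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        # When an alphabet is given, skip words containing other characters.
        if alphabet is None or all(c in alphabet for c in word):
            yield word


def word_count(seq: str, k: int, alphabet: Optional[str] = None) -> List[Tuple[str, int]]:
    """Count the subwords of length k and return (word, count) pairs sorted by word."""
    return sorted(Counter(words(seq, k, alphabet)).items())
```

Supplying an alphabet filters out windows that touch an excluded character: list(words("ACGNAC", 2, "ACGT")) gives ['AC', 'CG', 'AC'].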
weblogo.seq.rna(string: str) → weblogo.seq.Seq
Create an alphabetic sequence representing a stretch of RNA.
weblogo.seq.dna(string: str) → weblogo.seq.Seq
Create an alphabetic sequence representing a stretch of DNA.
weblogo.seq.protein(string: str) → weblogo.seq.Seq
Create an alphabetic sequence representing a stretch of polypeptide.
weblogo.seq.SeqList(alist: List[weblogo.seq.Seq] = [], alphabet: weblogo.seq.Alphabet = None, name: str = None, description: str = None)
A list of sequences.
isaligned() → bool
Are all sequences of the same length and alphabet?
ords(alphabet: weblogo.seq.Alphabet = None) → List[array.array]
Convert sequence list into a 2D array of ordinals.
profile(alphabet: weblogo.seq.Alphabet = None)
Counts the occurrences of characters in each column.
Returns: Motif(counts, alphabet)
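The per-column counting that a profile is built from can be illustrated as below. This sketch returns a plain list of lists rather than a Motif, takes the alphabet as a string, and ignores characters outside it; column_counts is a hypothetical helper, and the real method may treat such cases differently.

```python
from typing import List, Sequence


def column_counts(seqs: Sequence[str], alphabet: str) -> List[List[int]]:
    """counts[col][i] is how often alphabet[i] occurs in column col of the alignment."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must all have the same length")
    index = {c: i for i, c in enumerate(alphabet)}
    counts = [[0] * len(alphabet) for _ in range(length)]
    for s in seqs:
        for col, c in enumerate(s):
            if c in index:  # characters outside the alphabet are skipped in this sketch
                counts[col][index[c]] += 1
    return counts
```

For the three aligned sequences "ACGT", "ACGA" and "TCGT" with alphabet "ACGT", the first column counts are [2, 0, 0, 1]: two 'A's and one 'T'.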
tally(alphabet: weblogo.seq.Alphabet = None) → List[int]
Counts the occurrences of alphabetic characters.
Arguments: - alphabet -- an optional alternative alphabet
Returns: A list of character counts in alphabetic order.