Purpuratus donor splice site model. A. Analysis of the frequency of each base within the splice site reveals the S. purpuratus donor splice site consensus sequence. The nine-nt windows surrounding the donor splice sites of 292 annotated S. purpuratus gene models (2845 donor sequences) were extracted, and the frequency of each nt at each position within the window was calculated. The values shown in bold are the consensus nucleotides. Positions 1 and 2 are invariant because only canonical splice sites were used in this analysis. B. The Purpuratus splice site model incorporates non-adjacent dependences among the bases within the splice site. The model is implemented such that the splice site score of a given candidate sequence is computed using the matrix selected by applying the set of rules shown in the flowchart. For example, the sequence AAG GTAAGT would be scored using the matrix A-2G5G-1A4T6 (A-2→A-2G5→A-2G5G-1→A-2G5G-1A4→A-2G5G-1A4T6). For the 2845 S. purpuratus donor splice sites, non-adjacent dependences were calculated between the consensus nt at each of the seven variable positions and the non-consensus nucleotides at the other six positions (Table 1). The position with the most dependences was used to serially subdivide the sites until either a subdivision became too small to yield reliable data or no further significant dependences were observed. The position frequency matrices shown were calculated for each of the terminal subdivisions and are the matrices used in the Purpuratus splice site model.
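The two steps described in the caption, per-position base frequencies (panel A) and rule-based matrix selection (panel B), can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names are hypothetical, the order of the consensus tests is taken only from the worked example AAG GTAAGT, and the handling of a non-consensus base (stop at the first mismatch and label that branch with a made-up `not` prefix) is an assumption, since the flowchart's other branches cannot be recovered from the caption.

```python
from collections import Counter

# Labels for a nine-nt donor window: three exon bases, then six intron bases.
POSITIONS = [-3, -2, -1, 1, 2, 3, 4, 5, 6]

# Order of consensus tests, inferred only from the caption's worked example
# (A-2 -> G5 -> G-1 -> A4 -> T6). The real flowchart may branch differently.
RULES = [(-2, "A"), (5, "G"), (-1, "G"), (4, "A"), (6, "T")]


def clean(window):
    """Remove the exon/intron space, upper-case, and check the length."""
    seq = window.replace(" ", "").upper()
    if len(seq) != len(POSITIONS):
        raise ValueError("expected a nine-nt donor window")
    return seq


def position_frequencies(windows):
    """Frequency of each base at each window position (as in panel A)."""
    seqs = [clean(w) for w in windows]
    freqs = {}
    for pos, column in zip(POSITIONS, zip(*seqs)):
        counts = Counter(column)
        freqs[pos] = {base: counts[base] / len(seqs) for base in "ACGT"}
    return freqs


def select_matrix(window):
    """Return the list of subdivision names visited for one window."""
    base_at = dict(zip(POSITIONS, clean(window)))
    name, path = "", []
    for pos, consensus in RULES:
        if base_at[pos] != consensus:
            # Assumption: the first non-consensus base ends the descent.
            # The "not" label is hypothetical, not the paper's notation.
            path.append(f"{name}not{consensus}{pos}")
            break
        name += f"{consensus}{pos}"
        path.append(name)
    return path


if __name__ == "__main__":
    print(" -> ".join(select_matrix("AAG GTAAGT")))
```

For the caption's example, the last entry of the returned path is the name of the matrix that would be used for scoring. The scoring itself depends on the terminal position frequency matrices in the figure and is not reproduced here.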