THAP1 (OMIM*609520) is a DNA-binding transcriptional regulator that controls endothelial cell proliferation and G1/S cell-cycle progression (Roussigne et al., 2003). Two spliced mRNA variants that produce functional proteins have been reported (THAP1a: CCDS6136 and THAP1b: CCDS6137). The first, a 2.2 kb mRNA, contains 3 exons (THAP1a), whereas the second is an alternatively spliced 2 kb mRNA that lacks exon 2 (THAP1b). This second isoform encodes a truncated THAP1 protein lacking the C-terminal part of the THAP domain. Both isoforms are expressed in many tissues, suggesting that THAP1 has a widespread (although not ubiquitous) distribution in humans.
Deduced primary structure:
THAP1a is a 213-amino-acid protein characterized by an N-terminal THAP domain (amino acids 1 to 81) (Bessiere et al., 2008) with DNA-binding properties, followed by a proline-rich region (amino acids 90 to 110) and a nuclear localization signal (amino acids 146 to 162) (Figure 1). This organization follows SWISS-PROT entry Q9NVV9.
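The region boundaries quoted above can be held in a small lookup table. The following is only an illustrative sketch: the names THAP1A_REGIONS and regions_at are hypothetical and do not come from the database, and the coordinates are simply those given in the text (1-based, inclusive).

```python
# Region boundaries for THAP1a as quoted in the text (1-based, inclusive).
# The names below are hypothetical; they are not part of any database schema.
THAP1A_LENGTH = 213
THAP1A_REGIONS = {
    "THAP domain": (1, 81),
    "proline-rich region": (90, 110),
    "nuclear localization signal": (146, 162),
}


def regions_at(position):
    """Return the names of the annotated regions containing a residue position."""
    if not 1 <= position <= THAP1A_LENGTH:
        raise ValueError(f"position must be between 1 and {THAP1A_LENGTH}")
    return [
        name
        for name, (start, end) in THAP1A_REGIONS.items()
        if start <= position <= end
    ]
```

For example, regions_at(36) returns the THAP domain only, while a position such as 85 lies in none of the three annotated regions and yields an empty list.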

THAP domain
The THAP domain, which is about 80 amino acids long, exhibits several specific features (Figure 2). It is characterized by a C2CH signature (Cys-X2-4-Cys-X35-53-Cys-X2-His) associated with four invariant residues (Pro26, Trp36, Phe58 and Pro78 in the THAP1 sequence). Site-directed mutagenesis of each of these eight amino acids showed that every one of them is critical for the zinc-dependent, sequence-specific binding of the THAP domain to its DNA target (TXXXGGCA: THABS consensus sequence) (Clouaire et al., 2005) (Figure 2). An AVPTIF box (Ala76-Phe81 in the THAP1 sequence), also essential for DNA binding, is located at the C-terminus of the THAP domain.
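Both the C2CH signature and the THABS consensus are spacing patterns, so they translate naturally into regular expressions. The sketch below is a minimal illustration under stated assumptions: the function names are hypothetical, X is taken to mean any residue (or any of A, C, G, T for DNA), only the given DNA strand is scanned, and a pattern match says nothing about whether a sequence is a functional THAP domain or binding site.

```python
import re

# C2CH signature from the text: Cys-X(2-4)-Cys-X(35-53)-Cys-X(2)-His.
# "." stands for any residue (assumption: X is unrestricted).
C2CH_PATTERN = re.compile(r"C.{2,4}C.{35,53}C.{2}H")

# THABS consensus from the text: T-X-X-X-G-G-C-A.
# The lookahead lets overlapping occurrences be reported as well.
THABS_PATTERN = re.compile(r"(?=(T[ACGT]{3}GGCA))")


def find_c2ch(protein_seq):
    """Return (start, end) of the first C2CH-like match, 1-based inclusive, or None."""
    match = C2CH_PATTERN.search(protein_seq.upper())
    if match is None:
        return None
    return (match.start() + 1, match.end())


def find_thabs(dna_seq):
    """Return (1-based start, matched 8-mer) for each THABS-like site on the given strand."""
    return [
        (match.start() + 1, match.group(1))
        for match in THABS_PATTERN.finditer(dna_seq.upper())
    ]
```

With a made-up test sequence such as "AATACGGGCATT", find_thabs reports a single site starting at position 3. Scanning the reverse complement as well would be needed to cover both strands.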

Legend of Figure 2 and structural domains used in the database: