The Open Protein Structure Annotation Network
PDB Keyword


    Table of contents
    1. Protein Summary
    2. Ligand Summary

    Title Structure of Bacteroides thetaiotaomicron BT2081 at 2.05 A resolution: the first structural representative of a new protein family that may play a role in carbohydrate metabolism. Acta Crystallogr.,Sect.F 66 1287-1296 2010
    Site JCSG
    PDB Id 3hbz Target Id 392994
    Molecular Characteristics
    Source Bacteroides thetaiotaomicron vpi-5482
    Alias Ids TPS20279,NP_810994.1, 325007 Molecular Weight 37197.91 Da.
    Residues 341 Isoelectric Point 4.89
    Sequence reeapnaeadilscrlpgvvmttspiitnnsinifvgpgtdisslapeftltpgatidppsgtardfhs pqqytvtaadgfwkkkytvsvidtelatiynfedtlggqkyyifveregekvvmewasgnagyamtgvp ktaddyptfqfangktgkclslvtrstgffgsimgmpiaagnlfigsfdvgnamsnplkatkfglpfrh iptylagyykykagdqfteggkpvsgkrdicdiyaimyetsesvptldgtnaftspnlvsiariddake tdewtyfklpfhmlsgkyidkekltagkynvaivftsslegdhfngaigstllideveliyrsed

    Structure Determination
    Method XRAY Chains 1
    Resolution (Å) 2.05 Rfree 0.191
    Matthews' coefficient 3.69 Rfactor 0.159
    Waters 253 Solvent Content 66.70
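
    The solvent content listed above can be cross-checked against the Matthews coefficient with the commonly used approximation Vs = 1 - 1.23 / Vm (Vm in cubic Angstroms per Dalton). The sketch below is only a consistency check: the constant 1.23 assumes a typical protein partial specific volume of about 0.74 cm^3/g, and the function name is illustrative rather than taken from any crystallographic package.

```python
def solvent_fraction(matthews_coefficient: float) -> float:
    """Estimate the crystal solvent fraction from the Matthews coefficient.

    Uses the approximation Vs = 1 - 1.23 / Vm, with Vm in A^3/Da.
    The constant 1.23 is an assumption valid for typical proteins.
    """
    if matthews_coefficient <= 0:
        raise ValueError("Matthews coefficient must be positive")
    return 1.0 - 1.23 / matthews_coefficient


if __name__ == "__main__":
    vm = 3.69  # Matthews coefficient listed for 3hbz
    print(f"Estimated solvent content: {100 * solvent_fraction(vm):.1f}%")
```

    For Vm = 3.69 this estimate comes out at about 66.7%, in line with the tabulated solvent content.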

    Ligand Information


    Google Scholar output for 3hbz
    1. Structure of Bacteroides thetaiotaomicron BT2081 at 2.05 A resolution: the first structural representative of a new protein family that may play a role in carbohydrate
    AP Yeh, P Abdubek, T Astakhova - Section F: Structural , 2010 - scripts.iucr.org

    Protein Summary

    Gene BT_2081 encodes the NP_810994 protein, a 341 amino acid long polypeptide that is probably anchored in the outer membrane of Bacteroides thetaiotaomicron, a Gram-negative, anaerobic bacterium that is a dominant member of the human distal gut microbiome.  This protein is the first solved structure from a new protein family of over 160 members that has not yet been classified by Pfam. Members of this family are found mostly in Bacteroidetes, where they usually form large groups of paralogs; for instance, there are at least 8 paralogs in Bacteroides thetaiotaomicron. Single homologs are also found in other groups of bacteria, for instance in Actinobacteria (Streptomyces sviceus) or Gammaproteobacteria (Vibrio harveyi). Most of the homologs have the same length and domain structure as BT_2081, but some have a more complex architecture in which only one of the two BT_2081 domains is present in combination with additional domains. For instance, the Streptomyces sviceus endo-1,4-beta-xylanase contains two repeats of the glycosyl hydrolase family 43 domain and the N-terminal domain of BT_2081.

    A similar domain combination is seen among some bacterial sialidases that contain an N-terminal 6-bladed beta-propeller followed by a linker Ig-like domain and a C-terminal galactose-binding domain (see, for example, PDB 1w8o, 1euu, 2ber, 1w8n, 1wcq, 1eut). Interestingly, the mutual orientation of the Ig-like and galactose-binding domains is very different in BT_2081 compared with the sialidases.

    The structure of BT_2081 (3hbz), solved by Se-SAD to a resolution of 2.05 Å, shows the protein to be a monomer consisting of two domains.  The N-terminal region of the sequence encodes a prokaryotic lipoprotein signal peptide (or, more likely, a single transmembrane helix anchoring BT_2081 in the outer membrane) with predicted cleavage at position 19; residues 1-20 were removed from the expression construct.  The N-terminal domain, which comprises residues 21-114, adopts a beta-sandwich fold, while the C-terminal domain, which comprises residues 115-361, also has a sandwich fold at its core but carries additional secondary-structure elements at the periphery (Figures 1a and 1b).
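
    Because the construct starts at residue 21, residue numbers used in the text are offset from positions in the listed sequence. The sketch below, with illustrative helper names that are not part of any existing tool, derives the domain lengths from the stated boundaries and maps a residue number to its position in the construct sequence.

```python
# Domain boundaries as stated in the text (inclusive residue numbers).
DOMAINS = {"N-terminal": (21, 114), "C-terminal": (115, 361)}

# First residue present in the expression construct (residues 1-20 were removed).
CONSTRUCT_START = 21

# First 12 residues of the construct sequence listed on this page.
SEQUENCE_PREFIX = "reeapnaeadil"


def domain_length(start, end):
    """Number of residues in an inclusive residue range."""
    return end - start + 1


def sequence_index(residue_number, construct_start=CONSTRUCT_START):
    """Convert a residue number into a 0-based index into the construct sequence."""
    if residue_number < construct_start:
        raise ValueError(f"residue {residue_number} is not in the construct")
    return residue_number - construct_start


def domain_of(residue_number):
    """Return the name of the domain containing a residue, or None."""
    for name, (start, end) in DOMAINS.items():
        if start <= residue_number <= end:
            return name
    return None


if __name__ == "__main__":
    for name, (start, end) in DOMAINS.items():
        print(name, domain_length(start, end))
    # Glu-23 and Asp-30 are named later in the text; look them up in the prefix.
    print(SEQUENCE_PREFIX[sequence_index(23)], SEQUENCE_PREFIX[sequence_index(30)])
```

    With these boundaries the two domains span 94 and 247 residues, which together account for the 341 residues listed under Molecular Characteristics.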
    Figure 1. Structure of BT_2081 (3hbz) monomer (a) rainbow color coded from N-terminus (blue) to C-terminus (red) and (b) showing the two distinct domains (N-terminal domain 1 in blue and C-terminal domain 2 in orange).  Calcium ions are represented as green spheres.


    N-terminal Domain:

    A PSI-BLAST search with the full-length BT_2081 (NP_810994.1) sequence returned almost exclusively hypothetical proteins, with several notable exceptions.  As discussed above, endo-1,4-beta-xylanase (from Streptomyces sviceus) and beta-xylosidase (from Magnetospirillum magnetotacticum MS-1) contain regions homologous to the N-terminal domain of BT_2081 (NP_810994.1), with 38% and 37% sequence identity, respectively, over alignment lengths of 93 and 72 residues (out of 114 residues).  Other proteins, for instance three putative lipoproteins (from Capnocytophaga sputigena, Bacteroides fragilis, and Parabacteroides distasonis), contain regions homologous to the C-terminal domain of BT_2081 (NP_810994.1), with 46%, 29%, and 25% sequence identity, respectively, over alignment lengths of 241, 224, and 228 residues (out of 247 residues).
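
    The alignment lengths quoted above are easier to compare when expressed as a fraction of the reference length given in parentheses. The following sketch does that arithmetic; the tuple layout and function name are illustrative, and the numbers are simply those quoted in the paragraph.

```python
def coverage(aligned_residues: int, reference_length: int) -> float:
    """Fraction of the reference length covered by an alignment."""
    if reference_length <= 0:
        raise ValueError("reference length must be positive")
    return aligned_residues / reference_length


# (description, % identity, alignment length, reference length) as quoted in the text.
HITS = [
    ("endo-1,4-beta-xylanase (S. sviceus)", 38, 93, 114),
    ("beta-xylosidase (M. magnetotacticum MS-1)", 37, 72, 114),
    ("putative lipoprotein (C. sputigena)", 46, 241, 247),
    ("putative lipoprotein (B. fragilis)", 29, 224, 247),
    ("putative lipoprotein (P. distasonis)", 25, 228, 247),
]

if __name__ == "__main__":
    for name, identity, aligned, reference in HITS:
        print(f"{name}: {identity}% identity over {coverage(aligned, reference):.1%} of the reference")
```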

    The N-terminal domain adopts a divergent form of the Greek key beta-sandwich fold.  For instance, the alignment to 1hdmA has an RMSD of 3 Å over 67 residues.
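
    For readers unfamiliar with the RMSD figure quoted above, the sketch below shows how a root-mean-square deviation is computed over paired atoms once two structures have already been superimposed. It does not perform the superposition itself, and the coordinates in the usage lines are made-up placeholders, not taken from 3hbz or 1hdm.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two coordinate sets are already superimposed and paired
    one-to-one (for example, aligned C-alpha atoms).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Placeholder coordinates: every atom in the second set is shifted by 3 A along x.
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    shifted = [(x + 3.0, y, z) for x, y, z in reference]
    print(f"RMSD = {rmsd(reference, shifted):.2f} A")
```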

    N-terminal domain fold:  Immunoglobulin-like beta-sandwich


    C-terminal Domain:

    Statistically significant structural similarities were found for the C-terminal domain of BT_2081 (3hbz) by the Dali and FATCAT servers and are listed in Tables 1 and 2, respectively.  These hits mostly correspond to the carbohydrate-binding domains of carbohydrate-active enzymes such as glucanases and xylanases (e.g., 1gwk, 1wcu, 2zew, 1byh, 1v0a, 2zez, 1gmm, 1w0n, 1w9s), the ligand-binding domain of the ephrin receptor, which is a receptor tyrosine kinase (e.g., 3fl7, 3ckh), and reelin, a neuronal regulatory protein (e.g., 2ddu and 2e26).

    Pre-SCOP classifies the C-terminal domain inside the Galactose-binding domain-like superfamily.


    Table 1.  Dali Structural Similarities of BT_2081 (3hbz) C-terminal Domain.

    No:  Chain    Z    rmsd  lali  nres  %id  Description
     1:  2zew-B   9.5  3.2   129   147   18   S-LAYER ASSOCIATED MULTIDOMAIN ENDOGLUCANASE
     2:  2zez-C   9.1  3.4   128   143   19   S-LAYER ASSOCIATED MULTIDOMAIN ENDOGLUCANASE
     3:  2ddu-A   8.6  2.7   130   301    8   REELIN
     4:  2e26-A   8.6  3.3   130   705   12   REELIN
     5:  1byh     8.5  3.5   141   214    8   HYBRID (1,3-1,4)-BETA-D-GLUCAN 4-GLUCANOHYDROLASE H (A16-M)
     6:  1gbg     8.4  3.5   137   214    7   (1,3-1,4)-BETA-D-GLUCAN 4 GLUCANOHYDROLASE
     7:  3dgt-A   8.3  3.8   156   278   10   ENDO-1,3-BETA-GLUCANASE
     8:  3fl7-A   8.3  3.7   134   482    7   EPHRIN RECEPTOR
     9:  2e31-A   8.3  2.8   125   246    5   F-BOX ONLY PROTEIN 2
    10:  3ckh-A   8.2  3.7   137   170    7   EPHRIN TYPE-A RECEPTOR 4


    Table 2.  FATCAT Structural Similarities of 3hbz (C-terminal domain).

    The list of similar structures with probability < 5.00e-02
    no  structure  length  score   P-value   twist  opt-len  opt-rmsd  chain-rmsd  align-len  gap  seq-ide(%)
     1  1gwkA      141     136.65  1.73e-03  0      128      3.08      3.97        236        108   6.78
     2  1wcuA      149     127.99  2.95e-03  0      129      3.01      3.23        236        107   4.66
     3  2zewA      146     117.40  4.43e-03  0      136      3.01      2.90        240        104  10.42
     4  3ckhA      170     122.54  5.73e-03  0      129      3.07      4.58        253        124   5.14
     5  1byhA      214     125.34  6.26e-03  0      138      3.05      3.56        279        141   5.02
     6  1v0aA      170     119.26  6.65e-03  0      130      3.05      3.25        244        114   9.84
     7  2zezA      142     111.72  7.66e-03  0      134      3.16      2.75        240        106   9.17
     8  1gmmA      126     114.00  8.22e-03  0      115      3.05      2.78        237        122   4.64
     9  1w0nA      120     100.01  1.96e-02  0      114      3.10      3.43        234        120   5.13
    10  1w9sA      134      97.81  2.01e-02  0      122      3.04      3.98        245        123   3.67


    When the top 5 structurally similar proteins from FATCAT are superimposed onto the C-terminal domain of 3hbz (Figure 2), the structures are most similar in the central two-sheet beta barrel region, which shows significant similarity to the concanavalin A-like lectins/glucanases fold (SCOP fold b.29) and to the carbohydrate-binding modules of various glucanases (SCOP fold b.18).
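
    The choice of the five structures used for the superposition follows from ranking the FATCAT hits by P-value. The sketch below reproduces that selection from the structure codes and P-values in Table 2; the function name and the decision to strip the chain letter are illustrative assumptions.

```python
# (structure code, P-value) pairs copied from Table 2.
FATCAT_HITS = [
    ("1gwkA", 1.73e-03),
    ("1wcuA", 2.95e-03),
    ("2zewA", 4.43e-03),
    ("3ckhA", 5.73e-03),
    ("1byhA", 6.26e-03),
    ("1v0aA", 6.65e-03),
    ("2zezA", 7.66e-03),
    ("1gmmA", 8.22e-03),
    ("1w0nA", 1.96e-02),
    ("1w9sA", 2.01e-02),
]


def top_hits(hits, n=5, p_cutoff=5.0e-02):
    """Return the PDB ids (chain letter removed) of the n most significant hits."""
    significant = [hit for hit in hits if hit[1] < p_cutoff]
    ranked = sorted(significant, key=lambda hit: hit[1])
    return [structure[:4] for structure, _ in ranked[:n]]


if __name__ == "__main__":
    print(top_hits(FATCAT_HITS))
```

    Ranked this way, the first five entries are 1gwk, 1wcu, 2zew, 3ckh, and 1byh, the same structures shown in Figure 2.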

    The most notable difference between the structure of the 3hbz C-terminal domain and the other structures is that 3hbz contains short helices at the periphery of the protein whereas the other structures do not (Figure 2).

    Figure 2.  Superposition of the C-terminal domain of 3hbz (orange) with 1gwk (red), 1wcu (green), 2zew (cyan), 3ckh (purple), and 1byh (yellow).


    Conserved Calcium:

    A comparison of the C-terminal domain of 3hbz with structures 1ux7 (same as 1w0n, but with bound sugar), 2zex, and 2zey (same as 2zew and 2zez, but with bound sugars) reveals a conserved calcium-binding site (Figure 3).  The calcium in 3hbz is octahedrally coordinated by Wat-30, the carbonyl oxygens of Asn-120, Thr-174, and Lys-176, and the side-chain carboxylate oxygens of Glu-122 and Asp-351.  Of these residues, Glu-122 and Asp-351 are highly conserved among sequence homologs of 3hbz, and Asn-120 and Lys-176 are conserved to a lesser extent.  In the other structures, the calcium is thought to play a structural role, contributing greater stability to the protein fold; it may play the same role in NP_810994.1.
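
    A coordination sphere such as the one described above is typically identified by listing the atoms that lie within a distance cutoff of the metal ion. The sketch below illustrates the idea only: the coordinates are an idealized octahedron invented for the example and are not taken from 3hbz, the 2.8 Å cutoff is an assumed value, and the labels merely echo the ligands named in the text.

```python
import math

# Hypothetical ion position and ligand atoms (label, xyz), placed on an ideal
# octahedron 2.4 A from the ion. These are NOT coordinates from 3hbz.
ION = (0.0, 0.0, 0.0)
ATOMS = [
    ("Wat-30 O", (2.4, 0.0, 0.0)),
    ("Asn-120 carbonyl O", (-2.4, 0.0, 0.0)),
    ("Thr-174 carbonyl O", (0.0, 2.4, 0.0)),
    ("Lys-176 carbonyl O", (0.0, -2.4, 0.0)),
    ("Glu-122 carboxylate O", (0.0, 0.0, 2.4)),
    ("Asp-351 carboxylate O", (0.0, 0.0, -2.4)),
    ("distant atom", (5.0, 0.0, 0.0)),  # too far away to coordinate
]


def coordinating_atoms(ion_xyz, atoms, cutoff=2.8):
    """Return the labels of atoms within `cutoff` angstroms of the ion."""
    return [label for label, xyz in atoms if math.dist(ion_xyz, xyz) <= cutoff]


if __name__ == "__main__":
    ligands = coordinating_atoms(ION, ATOMS)
    print(f"{len(ligands)} coordinating atoms:")
    for label in ligands:
        print("  ", label)
```

    Six ligands at roughly equal distances, as in this idealized case, is what an octahedral description implies; a real analysis would read the coordinates from the deposited structure file.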

    Figure 3.  Superposition of 3hbz (yellow) with 1ux7 (magenta), 2zex (violet), and 2zey (cyan) shows a conserved calcium-binding site, which probably plays a structural role.  Calcium ions are shown as spheres.




    Further comparison of 3hbz with the above structures (i.e., 1ux7, 2zex, and 2zey) at their carbohydrate-binding pockets reveals that 3hbz does not contain the same residues at the corresponding sites.  Furthermore, the cleft in which the carbohydrates are bound in 2zex and 2zey is occluded in 3hbz by the additional helices and loops.


    Conserved Residues:

    Analysis by Consurf reveals two main areas of high residue conservation on the surface of 3hbz, one in the N-terminal domain (Figure 4a), and another in the C-terminal domain (Figure 4b).  Residues that are highly conserved in the N-terminal domain include Glu-23, Ala-24, Asn-26, Ala-27, Glu-28, Ala-29, Asp-30, Ile-31, Phe-69, Ile-77, Thr-96, and Ala-98 while those that are highly conserved in the C-terminal domain include Lys-237, Lys-239, Ala-240, Gly-241, Glu-296, Trp-300, Gly-345, and Thr-347.

    Figure 4.  Surface representation of conserved (red) and non-conserved (aqua) residues of 3hbz.  (a) view showing area of high conservation in the N-terminal domain and (b) view rotated ~180 degrees about the vertical axis from (a) showing area of high conservation in the C-terminal domain.






    Electrostatic Potential Surface:

    An electrostatic potential map of the protein shows a prominent negatively charged area on one side of the C-terminal domain (Figure 5).  This area does not overlap much with the highly conserved areas shown in Figure 4.  Although speculative, it is conceivable that this area may be a docking surface for other protein(s) or for ligand(s).

    Figure 5.  Electrostatic potential map of 3hbz, showing a prominent negatively charged patch.


    Interestingly, PEG molecules from the crystallization conditions encircle the N-zeta of several lysines, including Lys-220, Lys-304, and Lys-324 (Figure 6).
    Figure 6.  PEG molecule surrounding Lys-304.  Several instances of PEG molecules surrounding lysine residues occur in the structure.


    Two cacodylate ions and two calcium ions from the crystallization conditions were modeled into the 3hbz structure based on anomalous difference density calculated from data collected near the arsenic K-edge.



    Molecular basis for the selectivity and specificity of ligand recognition by the family 16 carbohydrate-binding modules from Thermoanaerobacterium polysaccharolyticum ManA.  Bae B, Ohene-Adjei S, Kocherginskaya S, Mackie RI, Spies MA, Cann IK, Nair SK.  J Biol Chem. 2008 May 2;283(18):12415-25. Epub 2007 Nov 19.

    Ab initio structure determination and functional characterization of CBM36; a new family of calcium-dependent carbohydrate binding modules.  Jamal-Talabani S, Boraston AB, Turkenburg JP, Tarbouriech N, Ducros VM, Davies GJ.  Structure. 2004 Jul;12(7):1177-87.

    The location of the ligand-binding site of carbohydrate-binding modules that have evolved from a common sequence is not conserved.  Czjzek M, Bolam DN, Mosbah A, Allouch J, Fontes CM, Ferreira LM, Bornet O, Zamboni V, Darbon H, Smith NL, Black GW, Henrissat B, Gilbert HJ.  J Biol Chem. 2001 Dec 21;276(51):48580-7. Epub 2001 Oct 22.

    Ligand Summary





    All content on this site is licensed under a Creative Commons Attribution 3.0 License