July 5, 2006. Open Access

Analysis of High Throughput Protein Expression in Escherichia coli

Abstract

The ability to efficiently produce hundreds of proteins in parallel is the most basic requirement of many aspects of proteomics. Overcoming the technical and financial barriers associated with high throughput protein production is essential for the development of an experimental platform to query and browse the protein content of a cell (e.g. protein and antibody arrays). Proteins are inherently different one from another in their physicochemical properties; therefore, no single protocol can be expected to successfully express most of the proteins. Instead of optimizing a protocol to express a specific protein, we used sequence analysis tools to estimate the probability of a specific protein to be expressed successfully using a given protocol, thereby avoiding a priori proteins with a low success probability. A set of 547 proteins, to be used for antibody production and selection, was expressed in Escherichia coli using a high throughput protein production pipeline. Protein properties derived from sequence alone were correlated to successful expression, and general guidelines are given to increase the efficiency of similar pipelines. A second set of 68 proteins was expressed to investigate the link between successful protein expression and inclusion body formation. More proteins were expressed in inclusion bodies; however, the formation of inclusion bodies was not a requirement for successful expression.

The completion of the human genome project and the biotechnical advances in the field of genomics have radically transformed biological and medical research. We now have the ability to monitor the mRNA expression of thousands of genes simultaneously in cells and tissues. However, it is the proteins encoded by these genes that carry out most biological functions. The proteome is much more daunting in size and complexity than the genome, and to understand how cells work we must study which proteins are present, how they interact with each other, and what they do. The difficulty of studying proteins is that they are each distinctively different from the other and are usually present in tissue in very low amounts. In the absence of a PCR equivalent, it has been suggested to call upon affinity ligands, such as monoclonal antibodies, for detection and identification of proteins (1: Humphery-Smith I. A human proteome project with a beginning and an end. Proteomics. 2004; 4: 2519-2521).
Regardless of the specific affinity ligand used, purified proteins must first be acquired in large quantities for generation and/or selection of specific affinity ligands. Thus, there is a need to define expression and purification conditions that are amenable to hundreds or even thousands of proteins in parallel. However, because proteins differ significantly in their physicochemical properties, the success rate of high throughput protein production is often too low, increasing the financial and technical constraints on such projects. Several groups have previously attempted high throughput expression of proteins or protein fragments. High throughput is defined as the ability to automate protein production, often using a 96-well format. Braun et al. (2: Braun P, Hu Y, Shen B, Halleck A, Koundinya M, Harlow E, LaBaer J. Proteome-scale purification of human proteins from bacteria. Proc. Natl. Acad. Sci. U. S. A. 2002; 99: 2654-2659) expressed 336 randomly selected human cDNAs in Escherichia coli and purified successfully 60% under denaturing conditions using His6 constructs and 50% under non-denaturing conditions using GST constructs. Luan et al. (3: Luan CH, Qiu S, Finley JB, Carson M, Gray RJ, Huang W, Johnson D, Tsao J, Reboul J, Vaglio P, Hill DE, Vidal M, Delucas LJ, Luo M. High-throughput expression of C. elegans proteins. Genome Res. 2004; 14: 2102-2110) expressed 10,176 Caenorhabditis elegans proteins using a robotic pipeline and observed an overall expression of 50% (15% in soluble form). Agaton et al. (4: Agaton C, Galli J, Höidén Guthenberg I, Janzon L, Hansson M, Asplund A, Brundell E, Lindberg S, Ruthberg I, Wester K, Wurtz D, Höög C, Lundeberg J, Ståhl S, Pontén F, Uhlén M. Affinity proteomics for systematic protein profiling of chromosome 21 gene products in human tissues. Mol. Cell. Proteomics. 
2003; 2: 405-414Abstract Full Text Full Text PDF PubMed Scopus (105) Google Scholar) reported a success rate of 76% for the expression of 142 human proteins in E. coli. Other groups reported success rates in the range of 60–80% (5Christendat D. Yee A. Dharamsi A. Kluger Y. Gerstein M. Arrowsmith C.H. Edwards A.M. Structural proteomics: prospects for high throughput sample preparation.Prog. Biophys. Mol. Biol. 2000; 73: 339-345Crossref PubMed Scopus (68) Google Scholar, 6Pizza M. Scarlato V. Masignani V. Giuliani M.M. Aricò B. Comanducci M. Jennings G.T. Baldi L. Bartolini E. Capecchi B. Galeotti C.L. Luzzi E. Manetti E. M. S. L. S. M. E. P. M. E. B. E. of by 2000; PubMed Scopus Google Scholar, E. A. Arrowsmith C.H. Edwards A.M. High-throughput production of PubMed Scopus Google Scholar). The of a protein can often by with a protein of A to selection for of Res. PubMed Scopus Google Scholar, of the of a a of Natl. Acad. Sci. U. S. A. PubMed Scopus Google Scholar). Structural proteomics to protein on a not high throughput expression of proteins that the proteins be in a that is and for or to produce proteins on a large for in success rates of P. Kluger Y. D. D. Yee A. Edwards A.M. Arrowsmith C.H. G.T. Gerstein M. an and for in Res. PubMed Scopus Google Scholar, D. P. D. G.T. Gerstein M. 2: a for proteomics a Res. 2003; PubMed Scopus Google Scholar). low success rate that attempted to link the sequence of a protein to to be soluble upon in E. coli P. Kluger Y. D. D. Yee A. Edwards A.M. Arrowsmith C.H. G.T. Gerstein M. an and for in Res. PubMed Scopus Google Scholar, D. P. D. G.T. Gerstein M. 2: a for proteomics a Res. 2003; PubMed Scopus Google Scholar, S. the between the of proteins and to be soluble on in Escherichia Sci. 14: PubMed Scopus Google Scholar, K. M. M. of of proteins on their PubMed Scopus Google Scholar). the other protein production for affinity not the protein to be Agaton et al. (4Agaton C. Galli J. Höidén Guthenberg I. Janzon L. Hansson M. 
Asplund A. Brundell E. Lindberg S. Ruthberg I. Wester K. Wurtz D. Höög C. Lundeberg J. Ståhl S. Pontén F. Uhlén M. Affinity proteomics for systematic protein profiling of chromosome 21 gene products in human tissues.Mol. Cell. Proteomics. 2003; 2: 405-414Abstract Full Text Full Text PDF PubMed Scopus (105) Google Scholar) reported a success rate of for proteins that were expressed in E. coli and purified under denaturing In protein production for affinity is significantly than production for with the financial constraints of high throughput protein production, it be to a priori proteins that are to expression in a pipeline for affinity ligand of protein upon has of successful expression has been of protein expression is to be more because expression can in of different from to the purified of such as mRNA are not to the protein sequence or to the physicochemical properties of the on the other is more to be on the of the In study we present on the expression of 547 proteins, as for affinity ligand and investigate the link between their and protein and successful expression. we investigate the between and expression on a set of 68 human proteins. We randomly selected human 547 for high throughput expression and 68 for inclusion body from genes in in and their sequence from were with the human genome using and the were and set the first was to the of a and from the the was selection and PCR for protein production pipeline have been previously Y. I. content of as a of PCR Res. 2003; PubMed Google Scholar). The used, a His6 a a and a The is the of the of protein A of B. B. L. A. E. C. Uhlén M. A on protein PubMed Scopus Google Scholar). The A. of the and for detection and purification of 2000; PubMed Google Scholar) was using gene and and have been previously by Y. Y. M. B. P. W. using as PubMed Scopus Google Scholar). expression the E. 
coli was cells of the and to expression of genes by or Protein purification for the high throughput protein pipeline was under denaturing The were in of with and the of the were for The were and the were each in of one of The content of each was for with in the were for The in the protein purification protocol were using the of of the were to a 96-well of used under the Protein or analysis of that was with of the of was for The was with and of in each was for and the was of was to each was for and the was in a 96-well Protein purification for inclusion body analysis was for the soluble and of the The from of was in of Protein one of for and for in a and The was and on a of were with of and and with of The of the the was by in of and for The was on of and with and of in Proteins were with of proteins were on with A sequence analysis was in for study and is as of the of B. J. tools for 2000; Google Scholar). was to and C. and are the of in Escherichia coli Res. PubMed Scopus Google and a protein was to et al. K. between of a protein and a for in of a protein from 4: PubMed Scopus Google Scholar). and content and were using of soluble proteins with 2000; Scholar) from the P. I. A. The 2000; Full Text Full Text PDF PubMed Scopus Google Scholar). and protein were to et al. M. E. P. of protein PubMed Scopus Google Scholar). Protein was using J. I. a to a given protein sequence is PubMed Scopus Google and from the the and the of in were sequence complexity was using S. of in sequence PubMed Google Scholar) and A complexity for biological and 2000; PubMed Scopus Google Scholar). Protein was using J. B. of the and of for the of Mol. Biol. PubMed Scopus Google Scholar) from the and the of and were from the The is to have a low in the of a however, we are not in the by in the overall and of the protein on the of to be present in one or of mRNA was using M. for and Res. 
2003; PubMed Scopus Google and the most was selected Protein low complexity was using a for low complexity proteins and protein PubMed Scopus Google Scholar). content was as previously Y. I. content of as a of PCR Res. 2003; PubMed Google Scholar) The was to and J. A for the of a Mol. Biol. PubMed Scopus Google Scholar). The an for the protein, which in many be on a protein be have a large we and the protein using the and with a of The under the and the were using the as in for The was and the was The single or were and and The and were by the by the of in the In the to was the of the of and the of was to and The of and Res. PubMed Scopus Google Scholar). A set of expressed E. coli proteins were selected from selected proteins were on a protein and were present in large with as using the of set of proteins is upon The was using a and the for each gene was were using a that was using a of The a and the was for the of and the single were used for the A of the was by to of a protein with an content different from the E. coli is on the that a mRNA for a that is because of the than of protein C. J. of protein PubMed Scopus Google Scholar). In other the even an not is to be the The content of E. coli was using the set of expressed E. coli proteins The content of each sequence and the from the protein content were each that was used more than the of to specific was and the used to the was a specific protein and the E. coli protein the most was from to and other were The probability of a was by to the increase in of in the the were to the the that were for were Protein was using or The and using probability 2002; PubMed Scopus Google a of tools that the to were for each protein sequence the E. coli and human to a sequence of The E. coli proteome was using E. coli genome The human proteome was from the Protein J. A. Y. E. The Protein an for proteomics 2004; 4: PubMed Scopus Google Scholar). proteins with more than and to another in the were using L. C. 
from large protein sequence 14: PubMed Scopus Google Scholar). and were using the and are that are often used in we used to proteins expression groups on and protein sequence in the of and the expression were previously to similar sequence to successful expression for analysis P. Kluger Y. D. D. Yee A. Edwards A.M. Arrowsmith C.H. G.T. Gerstein M. an and for in Res. PubMed Scopus Google Scholar, D. P. D. G.T. Gerstein M. 2: a for proteomics a Res. 2003; PubMed Scopus Google Scholar). were using the of the A and for for Scholar). The were using a complexity of the of a of randomly selected proteins were selected from the in to the The was set of 547 human each a different were using high throughput the The protein were a of the The of the protein was to The of the protein on the and on the was in with a of The the were to be by the expected on The proteins were one of no with with with and with size was on a Gray were and were In of the proteins a was on the and overall in of the proteins the expected size was In a protein was on the the was or to the of the of the and of to that were significantly or in each of proteins with E. coli and human were protein is to the is to which The set of human proteins expressed that were in and with the human protein The are and that were with the human protein were in and have a to be in or protein and are not and not A more was groups with the E. coli protein was a of in and are and and to in or protein that were were in and are not and not and in protein the is in with the E. coli protein to with another a that in the E. coli of is usually associated with to express proteins because they In set of human proteins, no expression were observed for proteins in and in each expression to the E. coli protein, the human protein, and to the protein in a proteins with high expression and expected was the most for pipeline. 
other groups were with it by using an sample with for sequence analysis by groups and of and protein of the expression in a and than the other to a of in protein mRNA to be similar for was not significantly different the protein groups for which a than the under a content was not significantly different the was the with a significantly with However, there was no in the of the protein and the significantly different from was using the groups an and was significantly different from with a The analysis of the under the that significantly more with and that groups and significantly with and a of to in the was and the and of and were significantly different from and a protein to the and the The content as by was significantly different from for I. more and and and are in with the in and The content of and from the was not significantly different the groups for a content in with Protein sequence complexity was significantly for for protein complexity and the low complexity sequence in the The was than that for Protein was using the significantly than the groups and a significantly than were used to that were or in each with The most were the low complexity of in the high complexity of in groups and and most of the in groups were in the a were In general the analysis was in with the sequence for in were in and the that is by a were used to sequence that were the most for In each one of the protein groups is with The and the were the most for of proteins or The of the proteins, in and in for other groups were not in the sequence used, and the was than of I. that in more than one for for and protein and for The that were different between groups were previously associated with inclusion body formation S. the between the of proteins and to be soluble on in Escherichia Sci. 14: PubMed Scopus Google Scholar, F. M. protein and in Escherichia 2004; PubMed Scopus Google Scholar, S. of protein tools to increase protein 4: PubMed Scopus Google Scholar). 
68 human were expressed using the for the protein purification Instead of the protein under denaturing proteins were purified from the soluble and The purified proteins were on and one of groups as The of the proteins from the were as and the of the proteins from the soluble were as and of the proteins that were purified from the were observed to be of expected size with in the soluble the of proteins with no on was for the soluble no protein were observed on for proteins, and a of expected was observed for proteins were present in the soluble and were present in the and were present in the soluble Proteins that were expressed in the soluble were with proteins that were expressed in the The that were significantly different were content and Proteins that were expressed in the soluble a content and a of expressed proteins from the soluble and Protein purification was from the soluble and of the Proteins were on and one of no or for these proteins was The was to and The for 68 proteins was to the of was different expression groups and groups A protein was the protein was of the expected size in the soluble or A protein was no was observed on in the soluble and for was significantly than the observed for groups which was significantly than that for groups The field of genomics has been by the ability to high throughput are now to most and than it can be The field of many and high throughput The work the technical in protein expression. We expressed protein in E. coli for the were expressed as of a protein with a large protein on the and a on the the selected was on a of the were observed in expression in E. coli. The success rate reported of is similar to a pipeline was (2Braun P. Hu Y. Shen B. Halleck A. Koundinya M. Harlow E. LaBaer J. Proteome-scale purification of human proteins from bacteria.Proc. Natl. Acad. Sci. U. S. A. 2002; 99: 2654-2659Crossref PubMed Scopus (218) Google Scholar, C.H. Qiu S. Finley J.B. Carson M. Gray R.J. Huang W. Johnson D. Tsao J. 
Reboul J. Vaglio P. Hill D.E. Vidal M. Delucas L.J. Luo M. High-throughput expression of C. elegans proteins.Genome Res. 2004; 14: 2102-2110Crossref PubMed Scopus (96) Google Scholar, C. Galli J. Höidén Guthenberg I. Janzon L. Hansson M. Asplund A. Brundell E. Lindberg S. Ruthberg I. Wester K. Wurtz D. Höög C. Lundeberg J. Ståhl S. Pontén F. Uhlén M. Affinity proteomics for systematic protein profiling of chromosome 21 gene products in human tissues.Mol. Cell. Proteomics. 2003; 2: 405-414Abstract Full Text Full Text PDF PubMed Scopus (105) Google Scholar, D. Yee A. Dharamsi A. Kluger Y. Gerstein M. Arrowsmith C.H. Edwards A.M. Structural proteomics: prospects for high throughput sample preparation.Prog. Biophys. Mol. Biol. 2000; 73: 339-345Crossref PubMed Scopus (68) Google Scholar, 6Pizza M. Scarlato V. Masignani V. Giuliani M.M. Aricò B. Comanducci M. Jennings G.T. Baldi L. Bartolini E. Capecchi B. Galeotti C.L. Luzzi E. Manetti E. M. S. L. S. M. E. P. M. E. B. E. of by 2000; PubMed Scopus Google Scholar, E. A. Arrowsmith C.H. Edwards A.M. High-throughput production of PubMed Scopus Google Scholar). protein that were observed on were to or than the that is very and We have that is for Y. Y. M. B. P. W. using as PubMed Scopus Google Scholar). we attempted to on sequence proteins that are for specific pipeline We observed between the expression The that is most different from is which the genes products to produce on the The proteins by were the most and the low complexity and the protein and have been previously in the formation of inclusion bodies S. the between the of proteins and to be soluble on in Escherichia Sci. 14: PubMed Scopus Google Scholar, S. of protein tools to increase protein 4: PubMed Scopus Google Scholar). However, the low and low protein in to inclusion body formation. bodies have been to be the of an of F. M. protein and in Escherichia 2004; PubMed Scopus Google and and are to the formation of such S. 
of protein tools to increase protein 4: PubMed Scopus Google Scholar). there is a between the high observed in which in an and the large which it is that the of high and with low and low protein a protein that is not to inclusion bodies in the a protein is to the because large have a to to other proteins in an is by the rate observed for I. is that cells that expressed the Proteins of a significantly protein complexity and a protein complexity is to and a in and protein and have in Escherichia coli. for protein PubMed Scopus Google Scholar, and that and by avoiding PubMed Scopus Google Scholar, of and protein from protein in E. coli PubMed Scopus Google Scholar). and and that and by avoiding PubMed Scopus Google Scholar) that the of to the and in a increase in The complexity and for proteins in that proteins in a on the is another of and the The from by the single and the of the protein, the that these proteins were most expressed detection to to the The groups of proteins with expected and were one from the other using sequence the high of proteins in each the to proteins in one of the groups were and the difficulty of these low quantities of purified protein can be by protein production, or low the was to be it is that the His6 was for is that was for because no was to it in the other set of 68 proteins was analysis in a were in groups with the E. coli however, in the was even than in The that soluble proteins were present in low the proteins to amounts. in is most to in of the protein or in to Proteins that a on of size and were as proteins that were or or the the proteins were to from than proteins from I. The most of proteins with size is their with the of proteins of expected size and the proteins have to proteins were However, that were of in and properties it is to that the for production of proteins with size was or of the proteins and not Protein expression can in many different from the of the the protein to the specific protein and with the E. 
coli proteins. a pipeline we were not to expression or protein the was of the that were derived from sequence analysis were significantly different between it is more that in expression were to the properties of the protein, the and mRNA constructs were for A or protein sequence is often using a by a to each or E. C. A. S. A. The Google such as the content in these be a they are to hundreds of because there is no to a the we used the under the and a or the the and under a on the of the was more than using the of the and the between protein the as by and J. A for the of a Mol. Biol. PubMed Scopus Google for suggested that the of was significantly different from and that protein groups were However, using the of the and in each protein and the of the of and the significantly content in groups and with and the of in I. that the under the were to be more used in the with that we using these sequence analysis or protein body formation was on 68 proteins. more protein be from proteins that inclusion bodies with soluble proteins. The of the proteins were present in the soluble and however, were present in one of the The expression in the soluble was to the and of the which is the of the the of proteins with expected size in the many proteins be expressed in a soluble inclusion body formation as such is not essential for expression. was significantly for soluble proteins. were previously to increase the of inclusion body formation S. the between the of proteins and to be soluble on in Escherichia Sci. 14: PubMed Scopus Google Scholar, S. of protein tools to increase protein 4: PubMed Scopus Google Scholar). The biological of the of content observed in soluble proteins is not The ability to successful expression was which have been used previously for similar P. Kluger Y. D. D. Yee A. Edwards A.M. Arrowsmith C.H. G.T. Gerstein M. an and for in Res. PubMed Scopus Google Scholar, D. P. D. G.T. Gerstein M. 2: a for proteomics a Res. 
2003; PubMed Scopus Google were not groups and in the were used in in many different were used for in each the difficulty in a set of that expression and that can different and to a different of we are to produce an for successful expression with a and However, for specific pipeline efficiency is to be by avoiding proteins with a or a of to different from an or high or protein complexity high or and low Other be to proteins for which specific need to be groups the and to produce proteins in a high throughput with success rates of The success rate is often as is with no the for which protein expression for such a large of proteins. We to groups using high throughput protein expression to investigate the link between and protein and successful expression, thereby to and protein expression that are essential for research.
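
The abstract describes deriving protein properties from sequence alone and using them to set aside, a priori, proteins with a low probability of successful expression. The following is a minimal sketch of that idea, not the authors' pipeline. It computes three simple sequence-derived features (length, mean hydropathy, and fraction of charged residues) and applies a placeholder filter. The hydropathy values are the standard Kyte-Doolittle scale, chosen here as an assumption; the function names, the choice of features, and the cut-off values and their direction are hypothetical and would need to be fitted to real expression outcomes.

```python
from collections import Counter

# Kyte-Doolittle hydropathy values per residue (standard published scale).
# Using this particular scale is an assumption made for illustration.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def sequence_features(seq):
    """Return simple features computed from the amino acid sequence alone.

    Non-standard characters (for example X or gaps) are ignored.
    """
    residues = [aa for aa in seq.upper() if aa in KD]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    counts = Counter(residues)
    n = len(residues)
    return {
        "length": n,
        # Mean hydropathy over all residues (often called GRAVY).
        "gravy": sum(KD[aa] for aa in residues) / n,
        # Fraction of residues that are D, E, K, or R.
        "charged_fraction": sum(counts[aa] for aa in "DEKR") / n,
    }


def is_low_probability(seq, max_gravy=0.5, max_length=1000):
    """Flag a sequence for exclusion before expression is attempted.

    The thresholds are hypothetical placeholders, not values from the paper.
    """
    f = sequence_features(seq)
    return f["gravy"] > max_gravy or f["length"] > max_length
```

In practice such a filter would be calibrated on a set of proteins with known expression outcomes in the specific pipeline, since the article stresses that no single protocol suits most proteins.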
