
TOBFAC began as a database of tobacco transcription factors; at the time, it was possibly the largest collection of transcription factor sequences from a single plant species (over 2,500 genes). We have now expanded TOBFAC with the goal of making it the best database for tobacco genomic research. To do this, we have incorporated a large amount of new data that can be searched and assembled. For the first time, it is possible to search:

1) 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering from the Tobacco Genome Initiative (TGI).
2) The DFCI Tobacco Gene Index (Release 4.0, July 5, 2008), which contains 163,524 tobacco EST sequences and 2,288 expressed transcripts (ETs).
3) The complete TOBFAC database of tobacco transcription factors.

It is also possible to search multiple libraries in a single search. We have incorporated tools for downloading all of the sequences from the BLAST results, as well as a contig tool to assemble any or all of the resulting sequences.

With these refinements, TOBFAC brings together at least 1,327,716 individual sequences from tobacco genomic DNA or cDNA, and it now provides the genomic database that the tobacco community has needed but lacked.

We are also improving the TOBFAC sequences by extending the original contigs using a contig extension tool designed by Ryan Thompson. This has allowed us to refine the predicted genes. These will be updated on a gene family basis as the improved data become available.
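The idea of extending a contig can be illustrated with a toy Python sketch. This is not the contig extension tool mentioned above, whose algorithm is not described here; the function name `extend_right`, the exact-match rule and the `min_overlap` parameter are all assumptions made for illustration. The sketch appends a read to the right-hand end of a contig when the end of the contig exactly matches the start of the read.

```python
def extend_right(contig, read, min_overlap=20):
    """Extend `contig` with `read` if a suffix of the contig equals a prefix of the read.

    Toy illustration only: exact matching, right-hand extension, no quality scores.
    Returns the contig unchanged when no overlap of at least `min_overlap` bases exists.
    """
    longest_possible = min(len(contig), len(read))
    # Try the longest overlap first so the most specific match wins.
    for k in range(longest_possible, min_overlap - 1, -1):
        if contig.endswith(read[:k]):
            return contig + read[k:]
    return contig


# Hypothetical sequences; a small min_overlap is used only to keep the example short.
print(extend_right("ACGTACGGTCA", "GGTCATTTG", min_overlap=4))
```

A real assembler would also tolerate mismatches, consider both strands and extend in both directions.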

Publications

Paul J. Rushton, Marta T. Bokowiec, Shengcheng Han, Hongbo Zhang, Jennifer F. Brannock, Xianfeng Chen, Thomas W. Laudeman, and Michael P. Timko
Tobacco Transcription Factors: Novel Insights into Transcriptional Regulation in the Solanaceae
Plant Physiol. Published on March 12, 2008; doi: 10.1104/pp.107.114041

Rushton PJ, Bokowiec MT, Laudeman TW, Brannock JF, Chen X, Timko MP.
TOBFAC: the database of tobacco transcription factors
BMC Bioinformatics. 2008 Jan 25;9:53.

Introduction

Regulation of gene expression at the level of transcription is a major control point in many biological processes, and plant genomes devote approximately 7% of their coding sequence to transcription factors. Global analysis of transcription factors has only been performed for three seed plants: Arabidopsis (http://datf.cbi.pku.edu.cn/index.php), poplar (http://dptf.cbi.pku.edu.cn/) and rice (http://drtf.cbi.pku.edu.cn/). TOBFAC, the database of tobacco transcription factors, contains a detailed analysis of at least 2,513 tobacco (Nicotiana tabacum L.) transcription factors, using a dataset of 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering from the Tobacco Genome Initiative (TGI). These GSRs are estimated to represent at least 90% of tobacco open reading frames.

TOBFAC contains all of the transcription factor sequences from the TGI, together with EST data. These sequences can be queried by BLAST searches and downloaded for further analysis. TOBFAC also contains phylogenetic trees for some of the largest families of transcription factors and these are also downloadable. We aim to regularly update the information so that TOBFAC will continue to represent one of the most wide-ranging databases of transcription factors in any plant species and be a major resource for the study of gene expression in tobacco and the Solanaceae.
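For further analysis of downloaded sequences, a FASTA file can be read with a few lines of Python. The following is a minimal sketch using only the standard library; the record names and sequences in `example` are hypothetical placeholders, not real TOBFAC entries, and the function name `parse_fasta` is our own.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping record ID to sequence.

    The record ID is the first whitespace-delimited word after '>'.
    Sequence lines belonging to one record are concatenated.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Fall back to an empty ID if the header line is just '>'.
            current = (line[1:].split() or [""])[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical FASTA content standing in for a downloaded file.
example = """>seq1 hypothetical description
ATGGCT
TGA
>seq2
ATGC
"""

lengths = {name: len(seq) for name, seq in parse_fasta(example).items()}
print(lengths)
```

For a file on disk, the same function can be applied to the text returned by `open(path).read()`, where `path` is wherever the download was saved.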

Available families (minimum genes)
ABI (76) Alfin (9) AP2 (35) ARF (12) ARID (8)
AS2 (75) AUX-IAA (35) BBR-BPC (3) BES (19) bHLH (190)
bZIP (75) C2C2-GATA (28) C2H2 (161) C3H (69) CAMTA (6)
CCAAT-Dr1 (3) CCAAT-HAP2 (12) CCAAT-HAP3 (15) CCAAT-HAP5 (6) CONSTANS (40)
CPP (3) Dof (46) E2F (6) EIL (6) ERF (239)
FHA (12) GARP-ARR-B (11) GARP-G2 (64) GeBP (15) GIF (4)
GRAS (45) GRF (23) HMG (9) Homeodomain (129) HRT (2)
HSF (34) JUMONJI (18) LFY (2) LIM (22) LUG (4)
MADS (119) MBF (5) MYB-related (56) NAC (152) Nin (11)
NZZ (2) PcG (20) PHD (59) PLATZ (15) R2R3-MYB (194)
S1Fa (1) SAP (4) SBP (27) SRS (12) TAZ (8)
TCP (43) Trihelix (40) TULP (13) ULT (4) VOZ (2)
Whirly (2) WRKY (93) YABBY (10) ZF-HD (38) ZIM (13)
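
The family list above follows a regular "name (count)" pattern, so it can be turned into a lookup table programmatically. The sketch below is one way to do this in Python; the function name `parse_families` is our own, and `sample` contains only the first two rows of the list to keep the example short.

```python
import re

# One entry looks like "bHLH (190)": a name without spaces, then a count in parentheses.
ENTRY = re.compile(r"(\S+) \((\d+)\)")


def parse_families(text):
    """Return a dict mapping family name to its minimum gene count."""
    return {name: int(count) for name, count in ENTRY.findall(text)}


# First two rows of the family list, copied from above.
sample = (
    "ABI (76) Alfin (9) AP2 (35) ARF (12) ARID (8)\n"
    "AS2 (75) AUX-IAA (35) BBR-BPC (3) BES (19) bHLH (190)"
)

families = parse_families(sample)
largest = max(families, key=families.get)
print(largest, families[largest])
```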

Tobacco (Nicotiana tabacum L.) has been one of the most studied plant species, partly because of its economic importance and partly because it is a convenient plant system for research. Tobacco is a model plant for the Solanaceae and is an amphiploid species (2n=48) with a relatively large genome of approximately 4.5 Gbp. This large genome size makes sequencing the tobacco genome difficult. To alleviate some of the difficulties created by the large amounts of repetitive DNA in large genomes, a number of techniques have been developed to isolate the low-copy or hypomethylated regions of the genome for sequencing. One of these techniques is methylation filtration (MF), which preferentially clones the hypomethylated fraction of the genome, effectively reducing the size of the genome to be sequenced. The Tobacco Genome Initiative (TGI) (http://www.tobaccogenome.org/) was established to sequence and annotate more than 90% of the open reading frames in the genome of cultivated tobacco using methylation filtration technology.


We used a dataset of 1,159,022 gene-space sequence reads (GSRs) obtained by methylation filtering from the Tobacco Genome Initiative (TGI) to obtain sequence information from at least 90% of tobacco transcription factors. A consensus amino acid sequence (normally the DNA-binding domain) from each of 64 currently known transcription factor families was used to isolate sequences that belong to each class of transcription factor. These were assembled into contigs and individually analysed by BLAST searches to verify the identity of the gene sequence.

Tobacco contains a minimum of 2,513 transcription factors, a total that is higher than in either Arabidopsis or rice. Our analysis shows that Arabidopsis, poplar and tobacco all contain this core set of 64 transcription factor families and that rice shares 63 of them. This suggests that the evolution of higher plants was not associated with the wholesale gain or loss of transcription factor families but rather with the lineage-specific expansion of transcription factor subfamilies.

Highlights of our work include the discovery of a novel subfamily of NAC transcription factors that we have called TNACs. The TNAC genes make up about 25% of all NAC genes in tobacco but are completely absent from all currently sequenced plant genomes. TNACs are, however, present in tomato, pepper and potato, and this novel subfamily therefore appears to be restricted to the Solanaceae. In addition, we have subjected the tobacco ERF, WRKY, NAC, homeodomain, bZIP, bHLH, R2R3-MYB and MADS box genes to detailed phylogenetic analysis that facilitates predictions of function based on phylogenetic position.
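
Comparing an amino acid query with nucleotide reads requires translating each read in all six reading frames. The Python sketch below illustrates that step only; it is a toy stand-in, not the search procedure used to build TOBFAC, which is not specified here beyond the use of consensus sequences and BLAST. As assumptions for the example, the short WRKYGQK heptapeptide that is characteristic of WRKY domains stands in for a full consensus sequence, matching is exact, and the read itself is synthetic.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return translations keyed '+1', '+2', '+3', '-1', '-2', '-3'."""
    frames = {}
    for sign, strand in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{sign}{offset + 1}"] = translate(strand[offset:])
    return frames


def frames_with_motif(dna, motif):
    """List the reading frames whose translation contains `motif` exactly."""
    return [name for name, protein in six_frames(dna).items() if motif in protein]


# Synthetic read: two filler bases, codons for W-R-K-Y-G-Q-K, one filler base.
read = "AC" + "TGGCGTAAATATGGTCAAAAG" + "T"
print(frames_with_motif(read, "WRKYGQK"))  # prints ['+3']
```

A translated BLAST search performs this kind of comparison with scored, inexact alignments rather than exact substring matching.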

The table below lists over- and under-represented TF families compared to the three sequenced higher plant genomes.

TF Family     Arabidopsis  Poplar  Rice (indica)  Rice (japonica)  Tobacco

ERF/AP2               146     212            174              182      274
C2H2                  134      81             94              113      161
HD                     87     106             84              103      129
TCP                    23      34             22               24       43
ZF-HD                  16      25             14               15       38
GRF                     9       9             12               18       23
BES                     8      12              7                6       19
SAP                     1       1              0                0        4

PcG                    34      45             34               34       23
ZIM                    18      22             19               29       13
ARF                    23      37             24               41       12
CCAAT HAP5             13      19             14               18        6
CPP                     8      13             11               16        3
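
The comparison in the table can be made explicit with a short Python sketch. The counts are copied from the table; the classification rule (tobacco above the maximum or below the minimum of the other four columns) and the ratio to the mean of the other columns are our own illustrative choices, not necessarily the criteria used to build the table.

```python
# Column order: Arabidopsis, Poplar, Rice (indica), Rice (japonica), Tobacco.
COUNTS = {
    "ERF/AP2": (146, 212, 174, 182, 274),
    "C2H2": (134, 81, 94, 113, 161),
    "HD": (87, 106, 84, 103, 129),
    "TCP": (23, 34, 22, 24, 43),
    "ZF-HD": (16, 25, 14, 15, 38),
    "GRF": (9, 9, 12, 18, 23),
    "BES": (8, 12, 7, 6, 19),
    "SAP": (1, 1, 0, 0, 4),
    "PcG": (34, 45, 34, 34, 23),
    "ZIM": (18, 22, 19, 29, 13),
    "ARF": (23, 37, 24, 41, 12),
    "CCAAT HAP5": (13, 19, 14, 18, 6),
    "CPP": (8, 13, 11, 16, 3),
}


def classify(row):
    """Illustrative rule: compare tobacco with the range of the other genomes."""
    *others, tobacco = row
    if tobacco > max(others):
        return "over"
    if tobacco < min(others):
        return "under"
    return "neither"


def ratio_to_mean(row):
    """Tobacco count divided by the mean count of the other four columns."""
    *others, tobacco = row
    return tobacco / (sum(others) / len(others))


for family, row in COUNTS.items():
    print(f"{family:<12}{classify(row):<8}{ratio_to_mean(row):.2f}")
```

These are raw gene counts; the sketch does not normalise for genome size or ploidy.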

Authors of this site:

Paul J Rushton
Marta T. Bokowiec
Xianfeng (Jeff) Chen
Thomas (Tom) W Laudeman
Jennifer F. Brannock
Michael P. Timko

Contact:

pr8y@virginia.edu