Is there an agreed-upon definition as to how many nucleobases constitute a gene? If not, why not?

A gene is a region of DNA that is transcribed. Typically, a gene should have a transcription start site (TSS) dictated by a promoter and a transcription stop site marked by termination signals (such as terminators and the poly-A signal). Some small RNAs (~18 nt) are produced from the TSS of ordinary genes, but they are probably products of failed elongation. These are not really considered genes, as they are heterogeneous in size and are not marked by any boundary.

Technically, there may be a minimum cutoff on gene length: the length of DNA needed for RNA polymerase to sit on, plus the termination signals. As indicated in the comments, the smallest gene may be a tRNA. However, the smallest annotated gene in the GENCODE annotations is TRDD1 (just 7 nt long!). This is not based on gene prediction; it is manually annotated by the HAVANA team.

I just did a rough calculation from the GENCODE human genome annotation file (version 23).

The average transcript length seems to be around 1.5 kb.
The average gene length seems to be around 29 kb.

Genes are longer than (or equal to) their corresponding transcripts because the latter are shortened by splicing.

I made a histogram plot of these lengths for convenience:

[Figure: Transcript length distribution]

Remi and user19099 have mentioned that the longest gene in humans is titin. It seems to be the longest gene in many other diverse animals as well. See "What's the longest transcript known?" for more details.

Methodology (so that limitations can be identified)

To calculate the gene length distribution: I parsed the GTF file for "genes" (third field, i.e. feature) and subtracted the fourth field (start) from the fifth (stop).
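The gene-length calculation described in the methodology can be sketched roughly as follows. This is a minimal illustration, not the script that was actually used. The function name `gene_lengths`, its `inclusive` switch, and the toy records are all hypothetical. GTF coordinates are 1-based and include both endpoints, so stop minus start is one less than the number of bases spanned. The sketch reproduces the plain subtraction by default and adds one only when asked.

```python
import io
import statistics


def gene_lengths(gtf_lines, feature="gene", inclusive=False):
    """Yield one length per GTF record whose third field equals `feature`.

    Hypothetical helper: by default the length is stop - start (the plain
    subtraction from the methodology); with inclusive=True it is
    stop - start + 1, the number of bases actually spanned.
    """
    for line in gtf_lines:
        # Skip blank lines and header/comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Fields: seqname, source, feature, start, end, ...
        if len(fields) < 5 or fields[2] != feature:
            continue
        start, stop = int(fields[3]), int(fields[4])
        yield stop - start + (1 if inclusive else 0)


# Made-up records in GTF layout, for illustration only (not real annotation).
EXAMPLE_GTF = (
    "##description: toy example\n"
    "chrA\ttoy\tgene\t101\t200\t.\t+\t.\tgene_id \"g1\";\n"
    "chrA\ttoy\ttranscript\t101\t200\t.\t+\t.\tgene_id \"g1\";\n"
    "chrA\ttoy\tgene\t501\t508\t.\t-\t.\tgene_id \"g2\";\n"
)

lengths = list(gene_lengths(io.StringIO(EXAMPLE_GTF)))
print(lengths)                   # one entry per "gene" record
print(statistics.mean(lengths))  # average gene length for the toy data
```

For a real annotation, the same function can be given an open file handle instead of the toy string. Averages computed this way depend on which convention (plain difference or inclusive length) is chosen.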