
Sarrita A. answered 05/03/20
University of Cambridge Graduate with PhD in Biochemistry
I am not sure your approach is strictly necessary: you can determine the amount of homology between genes at the nucleotide level, so you do not need a protein intermediate step.
You can get the names of the genes by entering each Gene ID, and from there you can ascertain the family of proteins under study.
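As a rough illustration of comparing genes at the nucleotide level, here is a minimal Python sketch that computes percent identity between two sequences. It assumes the two sequences have already been aligned (for example by nucleotide BLAST) and are the same length, with "-" marking gaps. The function name `percent_identity` is hypothetical, and tools differ in how they treat gap columns, so treat the number as indicative only.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: the inputs are already aligned and of equal length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Toy aligned sequences (made up for illustration, not real gene data).
print(percent_identity("ATGC-A", "ATGGTA"))
```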
Though if you wish to use your method and you do not know how to write a program in R or Python, you can:
- Get the FASTA sequence by searching the Entrez Gene ID (see below).
- Use BLASTx to go from nucleotide to protein; just paste in the sequence you got from Entrez Gene.
- Then you can run protein BLAST or just compare the output you get from BLASTx.
- To get the FASTA sequence go to: https://www.ncbi.nlm.nih.gov/gene/
- Type in the Entrez Gene ID (in this case 395552) and select Search: https://www.ncbi.nlm.nih.gov/gene/?term=395552
- You will find that the gene is from chicken (Gallus gallus).
- Scroll down and click for the FASTA sequence, which is just above the gene track info.
- https://www.ncbi.nlm.nih.gov/nuccore/NC_006106.5?report=fasta&from=4940024&to=4941241
- Then copy the sequence and go to BLASTx
- https://blast.ncbi.nlm.nih.gov/Blast.cgi#alnHdr_CCC15112
- Click on the Alignments tab and click GenPept.
- https://www.ncbi.nlm.nih.gov/protein/CCC15112.1?report=genbank&log$=protalign&blast_rank=1&RID=AY7106AD016
- Scroll to the bottom and you will get the protein sequence:
mkvfslvmvt lllaavwtes sgksfrssys sccyknmfiq keintslirr yretppncsr raiivelkkg kkfcvdpaeg wfqqylqgkk lsntst
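If you do want to script part of this, the sketch below shows two small helpers in Python. The first builds an NCBI E-utilities EFetch URL for the same genomic region as the nuccore link above. The second turns the space-separated GenPept sequence into a FASTA record you can paste into protein BLAST. The function names `efetch_fasta_url` and `genpept_to_fasta` are hypothetical, and the code only builds the URL, it does not download anything.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities EFetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, start: int, stop: int) -> str:
    """Build an EFetch URL for a FASTA sub-sequence of a nuccore record."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
    }
    return f"{EFETCH}?{urlencode(params)}"


def genpept_to_fasta(header: str, genpept_seq: str, width: int = 60) -> str:
    """Strip spaces and digits from a GenPept sequence and wrap it as FASTA."""
    residues = "".join(ch for ch in genpept_seq if ch.isalpha()).upper()
    lines = [residues[i:i + width] for i in range(0, len(residues), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


# Region and protein taken from the steps above.
url = efetch_fasta_url("NC_006106.5", 4940024, 4941241)
protein = (
    "mkvfslvmvt lllaavwtes sgksfrssys sccyknmfiq keintslirr "
    "yretppncsr raiivelkkg kkfcvdpaeg wfqqylqgkk lsntst"
)
fasta = genpept_to_fasta("CCC15112.1", protein)
print(url)
print(fasta)
```

Opening the printed URL in a browser should return the same nucleotide FASTA as the nuccore page, and the printed protein record can be pasted straight into protein BLAST.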
N.B. You can also search for homologs of the gene: