Hello there, I'm Brian. I'll try my best to guide you on this question without outright answering it for you so that you can learn a bit of the thought process and how to come up with the solution on your own.
The question gives us some important pieces of information:
- Genes always begin with the start codon: ATG
- Genes end with one of the following 3 stop codons: TAG, TAA or TGA.
- The substrings ATG, TAG, TAA, and TGA will only occur at the start and end of genes.
We essentially have a substring problem, and we know where each substring starts and ends: it starts right after an ATG codon and ends right before a TAG, TAA, or TGA codon.
Let's take a look at the example:
get_genes("TCATGTGCCCAATTCTGACCTACGATGGCCCAATAGCG") would return the list:
["TGCCCAATTC", "GCCCAA"].
In the example, each returned item is the sequence between a start codon and a stop codon, with the codons themselves left out.
This means all we have to do is loop through the DNA string, looking for the start codon ATG. Upon finding ATG, we skip past it and use a temporary string to collect the characters that follow, while also checking for a stop codon (TAG, TAA, or TGA). Because every codon is three characters long, we compare the three-character slice at each position. When we find a stop codon, we add the temporary string to a result list and go back to looking for the next ATG. We continue this process until we reach the end of the DNA sequence.
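If you want something to compare your own attempt against, here is a minimal sketch of that loop. It assumes the assignment is in Python (the example call looks like it, but the question does not say so), and the names START, STOPS, and current are my own choices for illustration. It also assumes that a gene with no stop codon before the end of the string should be left out of the result, which the question does not specify.

```python
START = "ATG"                   # start codon, from the question
STOPS = ("TAG", "TAA", "TGA")   # stop codons, from the question


def get_genes(dna):
    """Return the sequences found between each start codon and the next stop codon."""
    genes = []
    current = None  # None means we are not inside a gene right now
    i = 0
    while i < len(dna):
        codon = dna[i:i + 3]  # three-character slice at this position
        if current is None:
            if codon == START:
                current = ""  # begin collecting a new gene
                i += 3        # skip past the start codon
            else:
                i += 1
        elif codon in STOPS:
            genes.append(current)  # gene is complete, codons excluded
            current = None
            i += 3                 # skip past the stop codon
        else:
            current += dna[i]      # still inside the gene
            i += 1
    # Assumption: a gene that never reaches a stop codon is discarded.
    return genes


print(get_genes("TCATGTGCCCAATTCTGACCTACGATGGCCCAATAGCG"))
# prints ['TGCCCAATTC', 'GCCCAA']
```

This is only one way to do it. Searching with str.find for the next start and stop positions and slicing between them would work just as well.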
I hope that answers your question! Good luck on your assignment.