Israel S.

asked • 04/23/22

Longest Repeat Problem: Find the longest repeat in a string.

Although the suffix tree decreases memory requirements from O(|Text|2) to O(|Text|), on average it still requires about 20 times as much memory as Text (when the length of Text is on the order of the human genome). For a 3 GB human genome, 60 GB of RAM is a huge improvement over the 1 TB that we needed to work with a trie constructed for all reads, but it still presents a memory challenge for most machines. This reveals a dark secret of big-O notation, which is that it ignores constant factors. For long strings such as the human genome, we will need to pay attention to this constant factor, since the expression O(|Text|) applies to both an algorithm with 2 · |Text| memory and an algorithm with 1000 · |Text| memory.

Yet before seeing how we can further reduce the memory needed for multiple pattern matching, we ask you to solve three problems showing how suffix trees can be applied to other pattern matching challenges. The first such problem is the Longest Repeat Problem.

Longest Repeat Problem: Find the longest repeat in a string.

  1. Input: A string Text.
  2. Output: A longest substring of Text that appears in Text more than once.

Code Challenge: Solve the Longest Repeat Problem. (Multiple solutions may exist, in which case you may return any one.)

Sample Input:

ATATCGTTTTATCGTT

Sample Output:

TATCGTT

Heres another dataset to try

TAATGTGCTCATTGAGTTACTTCCACCGCTATGCATGTTCTCACCTGATTCTGGTTGTTCAGTGGCCGGGAATCAGCGATGAGTGACTCTATTCAATGCGGCTAACCTGCCGGTGAAAAAACCGTATCACGAGCCGACTCAGCACGCCCCTTCCAGCCAGTGCTTGCCACGTTTGTATAGTACGCGCACCAGAATCAAATGGACGCTCGTATCGACAACGCCCTCGACCTCTGCGGAACCTGAGGGATACAATGCGACTCTAATGTACACTTCGTCTGGGGGTCATGCATCGTGGCCAAGATCTAGAGTCATCAGTCCCGTAATTTGCCCTACGTGTGAACATCGGTGGTAAGGATATGATGCCTCATCCAAAGCAAGAGCTGAACCAGCACTAAGCGTAGTTGCGCGGGTGTTGAACCTCTCAGCTCATTGGACGCGCAGCAAAGAATGCGTTATCTGTCCCAATCAATAGCAAATTGCGCGATGGGTGGCCCCGACAATTCTGGTTGTTCAGTGGCCGGGAATCAGCGATGAGTGACTCTATTCAATGCGGGAGTTCGTCCGTACTGACTCCCAATTCCATCAATTCTGGTTGTTCAGTGGCCGGGAATCAGCGATGAGTGACTCTATTCAATGCATGGTCGACGACCACCTGGCTGTGTTAAACATCCGATTGTGCGCTTTTTGATTTTCAACATACGCCGGACGGCTAATAAAAACGATGCACGTGCATAATGAGGATATGGGTGCTGTTTTTATGTGCCCGCATTGGCTTGGACCGATCAATATTATGTTGCGATAGGTACCCTGACTCTAAAAGGTCCCATTGGTTTCATTACCGCACGTCGCCACCGTACGTGTCTGAAACCGGTTCGAAGATGTCTGGAGCCAAGTCCATAGGCGTCGTGTATGAGTGGTCATACCGAGCTGCCCATTTGATCCTCTCAAAGGCCGAGAACAGGCACGAGGTGACGCCCTTTTACCTCGTAGACCCAAACACCCAAGTGTCCCCAGGAGCCCTCGCGTACTTAAGCACATCTCCCGCCGCCATGGTACTCTGCGAGTTGGATCATACATACATCACTAGGTAGATGGAAGCATGAGCCAAACCTTACCCCAAATACGCCATAGTAGTGAACGCACTCATGAGTAGAGT



1 Expert Answer

By:

Liam M. answered • 07/24/23

Tutor
0 (0)

Software Engineer Specializing in Python

Still looking for help? Get the right answer, fast.

Ask a question for free

Get a free answer to a quick problem.
Most questions answered within 4 hours.

OR

Find an Online Tutor Now

Choose an expert and meet online. No packages or subscriptions, pay only for the time you need.