How do you merge SNP data with a reference genome?

Question

# My Data

I have a 23andMe file listing SNPs in the form:

`rsid       chromosome      position        genotype`
`rsXXXXX         1           PPPPPP          CT`
`rsXXXXX         1           PPPPPP          GG`

Fields are TAB-separated and each line corresponds to a single SNP. For each SNP, four fields of data are supplied:

1. An identifier (an rsid or an internal id).
2. Its location on the reference genome:
   - The chromosome it is located on.
   - The position within the chromosome it is located on.
3. The genotype call, oriented with respect to the plus strand of the human reference sequence.

The reference genome is the human assembly build 37 (also known as Annotation Release 104).

# My Question

How do I merge the SNPs into the reference genome?

For example, take the first line in my SNP file:

`rsXXXXX        1           PPPPPP          CT`

### Part 1

I can see that I need to replace the nucleotide at position PPPPPP on chromosome 1 of the reference genome with a nucleotide from the genotype field, but which nucleotide am I supposed to use? C or T? And why?

### Part 2

Where am I supposed to start counting from on the reference genome? Looking at chromosome 1 of the human assembly build 37, the first ~10,000 characters (excluding the description on the first line) are `N`. Is the first N number 1? E.g., if PPPPPP were 100,000, would I replace the 100,000th character in the reference genome with the correct nucleotide from **Part 1** of this question? Or should I start counting from the first non-N character in the FASTA file?
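For illustration, the file layout described above could be read with a few lines of Python. This is only a sketch: the `SnpCall` type and the `parse_snp_lines` function are hypothetical names, the rsids and positions in the example are made up, and skipping `#` comment lines and the header row is an assumption about how the file starts.

```python
from typing import Iterable, Iterator, NamedTuple


class SnpCall(NamedTuple):
    """One row of the SNP file (hypothetical helper type)."""
    rsid: str
    chromosome: str
    position: int
    genotype: str


def parse_snp_lines(lines: Iterable[str]) -> Iterator[SnpCall]:
    """Yield one SnpCall per data line of a TAB-separated SNP file."""
    for line in lines:
        line = line.rstrip("\n")
        # Assumption: blank lines, '#' comments and the column header are skipped.
        if not line or line.startswith("#") or line.startswith("rsid"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        yield SnpCall(rsid, chromosome, int(position), genotype)


# Made-up example rows in the layout shown in the question.
example = [
    "rsid\tchromosome\tposition\tgenotype\n",
    "rs0000001\t1\t12\tCT\n",
    "rs0000002\t1\t15\tGG\n",
]
calls = list(parse_snp_lines(example))
print(calls)
```

The chromosome is kept as a string because values such as `X`, `Y` and `MT` are not numeric.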

Aref E. · Accepted Answer

I am not sure I follow your question. If you need to merge a SNP (or any other sequence) into a reference genome, you can do it with standard Linux text-processing commands, or with Perl and regular expressions. Is that your question? Sorry I cannot be of more help.

N means that the base at that position has not been assigned to A, T, C, or G. In experiments, this happens when the sequencer cannot call the nucleotide as one of the four bases. Placing a SNP inside a run of Ns will not help you in any way. Most aligners skip long runs of N, which is why, for technical reasons, such runs can sit at the beginning of the reference genome.
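As a rough sketch of the text-substitution approach, here is one way it could look in Python rather than Perl. Several things here are assumptions and not part of the answer above: positions are treated as 1-based and counted over every character of the chromosome sequence, leading Ns included (the usual convention for genomic coordinates); a heterozygous call such as `CT` is written as its IUPAC ambiguity code (`Y`), because a single sequence cannot hold both alleles; and no-calls or indel markers are skipped. The function names and the toy reference are hypothetical, and the sequence is expected with the FASTA header and line breaks already removed.

```python
from typing import Iterable, Optional, Tuple

# IUPAC ambiguity codes for the six unordered pairs of distinct bases.
IUPAC_PAIRS = {"AC": "M", "AG": "R", "AT": "W", "CG": "S", "CT": "Y", "GT": "K"}


def consensus_base(genotype: str) -> Optional[str]:
    """Return one character for a genotype call, or None if it is unusable."""
    alleles = set(genotype)
    # Skip empty calls, no-calls such as "--", and indel markers such as "I"/"D".
    if not alleles or not alleles <= set("ACGT"):
        return None
    if len(alleles) == 1:
        return genotype[0]  # homozygous (or single-copy) call
    return IUPAC_PAIRS.get("".join(sorted(alleles)))  # heterozygous call


def apply_snps(sequence: str, snps: Iterable[Tuple[int, str]]) -> str:
    """Write each usable genotype into the sequence at its 1-based position."""
    bases = bytearray(sequence, "ascii")  # mutable and compact for long sequences
    for position, genotype in snps:
        base = consensus_base(genotype)
        if base is None:
            continue
        index = position - 1  # 1-based coordinate to 0-based index
        if not 0 <= index < len(bases):
            raise IndexError(f"position {position} is outside the sequence")
        bases[index] = ord(base)
    return bases.decode("ascii")


# Toy reference: five leading Ns count as positions 1 to 5.
reference = "NNNNNACGTACGTACG"
merged = apply_snps(reference, [(7, "CT"), (12, "AA"), (14, "--")])
print(merged)  # NNNNNAYGTACATACG
```

Using an ambiguity code is a design choice, not the only option: if a plain A/C/G/T sequence is required, the two alleles of a heterozygous call would have to go into two separate copies of the chromosome.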
