Your initial premise determines which statistical tests and procedures you may use. That's why it is important to state it clearly, to select samples correctly, and to perform only appropriate tests and ...
You think that you have two distinct populations -- males and females -- so you take a sample from each. If the characteristic you are studying really does differ between the groups, the samples should show that. If instead the samples show that it is not very likely that they differ, then you might not have two populations at all -- that is, there is no significant difference between males and females.
The size of each sample determines how confident you can be that an observed difference reflects which population a sample came from, rather than chance. Each population has a frequency distribution for the characteristic being observed, and the likelihood that a sample's frequency distribution looks like that of its source population increases as the sample size increases. For example, if a "fair" die is equally likely to roll each of 1-6, the frequencies after a few rolls might be uneven, but with more and more rolls you expect (with increasing certainty) the frequency distribution to approach uniform.
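The die example can be tried out directly. The following is a minimal sketch, assuming a simulated fair die; the function names `roll_frequencies` and `max_deviation` are made up for this illustration.

```python
import random
from collections import Counter

def roll_frequencies(n_rolls, seed=0):
    """Relative frequency of each face (1-6) over n_rolls of a simulated fair die."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

def max_deviation(freqs):
    """Largest gap between an observed frequency and the ideal 1/6."""
    return max(abs(f - 1 / 6) for f in freqs.values())

# More rolls should generally bring the frequencies closer to uniform.
for n in (12, 120, 12000):
    print(n, "rolls -> largest gap from 1/6:", round(max_deviation(roll_frequencies(n)), 3))
```

With only a handful of rolls the gap can be large; with thousands of rolls it is typically very small, which is the point of the paragraph above.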
So, if two populations (males and females) indeed differ in the characteristic being studied (significant multitasking differences), larger and larger samples are more and more likely to show that difference. This is what a "confidence interval" captures: the larger the samples, the narrower the range of plausible values for the true difference.
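To see how sample size affects a confidence interval, here is a rough sketch. It assumes two made-up, normally distributed populations (means 52 and 50, spread 10) and uses the common normal approximation for a 95% interval; `diff_ci` is a hypothetical helper name, and none of the numbers come from your data.

```python
import random
import statistics

def diff_ci(sample_a, sample_b, z=1.96):
    """Approximate 95% confidence interval for mean(a) - mean(b) (normal approximation)."""
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    # Standard error of the difference between two independent sample means.
    se = (statistics.variance(sample_a) / len(sample_a)
          + statistics.variance(sample_b) / len(sample_b)) ** 0.5
    return diff - z * se, diff + z * se

rng = random.Random(1)
for n in (10, 100, 1000):
    group_a = [rng.gauss(52, 10) for _ in range(n)]  # hypothetical scores, group A
    group_b = [rng.gauss(50, 10) for _ in range(n)]  # hypothetical scores, group B
    low, high = diff_ci(group_a, group_b)
    print(n, "per group -> interval:", round(low, 2), "to", round(high, 2))
```

If the interval includes 0, the samples do not rule out "no difference". As the samples grow, the interval narrows, so a real difference becomes easier to see.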
Now, it is critically important that you have carefully defined "multitasking" if you are going to use that term. Your definition must say specifically that differences in multitasking produce different scores with this game, so let's just look for a difference in game scores (don't assume anything in definitions! -- you are only determining how likely it is that males and females get different scores on this game).
So, statisticians look at frequency distributions first. That gives them a sense of the distribution of the data. If there are "outliers" (obvious, unusual data points), there are complicated rules for deleting them. Some are easy: a typo, a missing value, a value that doesn't make sense (e.g., outside usual range), etc. The low score you mentioned is not one of these and should not be eliminated just because you don't like it -- especially if you say that "most subjects get a low score" (then it is obviously not an error). With multiple observations of one variable, you now have a frequency distribution for one person (one observation) and need to either reduce that to one value (e.g., take an average) or continue to consider that all observations have multiple values (like looking at multiple scores for a student in a class in order to determine whether they improved; an average per student doesn't show improvement at all).
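The choice between averaging and keeping every observation can be illustrated with a small sketch. The scores below are made up, and `per_subject_mean` and `per_subject_change` are hypothetical helper names.

```python
from statistics import mean

# Hypothetical game scores: three attempts per subject (made-up numbers).
attempts = {
    "subject_1": [40, 50, 60],  # improving
    "subject_2": [60, 50, 40],  # declining
}

def per_subject_mean(scores_by_subject):
    """Reduce each subject's attempts to one value: the average score."""
    return {s: mean(v) for s, v in scores_by_subject.items()}

def per_subject_change(scores_by_subject):
    """Keep the order of attempts: last score minus first score."""
    return {s: v[-1] - v[0] for s, v in scores_by_subject.items()}

print(per_subject_mean(attempts))    # both subjects have the same average
print(per_subject_change(attempts))  # but they moved in opposite directions
```

The two subjects look identical once averaged, yet one improved and the other declined, so decide which question you are asking before you reduce the data.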
If the two frequency distributions look different, that's a good clue that there are differences. The numerical statistical tests then tell you how likely it is that a difference this large would arise by chance alone if the two populations were really the same. Here, your Research Teacher has an important role -- evaluating your research based on conducting the appropriate statistical tests. If you are expected to know this already, it makes it tough. If your Research Teacher can outline: (1) test 1, (2) test 2, (3) test 3, (4) conclusion, it makes it much easier. And, the subsequent tests usually depend on the outcomes of the previous tests (e.g., if the two frequency distributions look exactly the same, give up, there's likely no difference).
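One simple way to check whether a difference could be just random is a permutation test. This is only a sketch of one possible test, not necessarily the one your Research Teacher will ask for; the scores are made up and `permutation_p_value` is a hypothetical name.

```python
import random
from statistics import mean

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Estimate how often random relabeling gives a mean difference at least as large as observed."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend group labels do not matter
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add 1 to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical game scores (made-up numbers, not real data).
males = [48, 52, 55, 47, 50, 53, 49, 51]
females = [54, 56, 50, 58, 53, 57, 52, 55]
print("estimated p-value:", permutation_p_value(males, females))
```

A small p-value says that shuffling the labels rarely produces a gap as large as the one observed, which is evidence that the difference is not just chance. It says nothing about why the groups differ.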
The description of each statistical test should state (1) the population, (2) the size of the sample required to achieve the required confidence, (3) why the noted difference could not have arisen some other way (i.e., from some variable not observed, like age of participants or time-of-day).
Since this is for a Research class (graded) and you lack expertise, you should get the best help that you can afford (not just other students or on-line volunteer tutors) in order to make it an excellent learning experience. Let your Research Teacher know that you are quite serious about the research, about learning the necessary statistics, and about succeeding in your coursework -- teachers usually like students who do more than the required minimum (and thus give better grades). This means no longer saying, "I've never ..." and "I don't ..." but, rather, "I'm learning ..." and "I think ..."
Sofia V.
12/31/15