Yes, IQ is measured on an interval scale, and some tests also break specific skills into subscales that are themselves scored as interval data. An IQ score is a number on a fixed measurement scale, so the difference between two scores is meaningful. If IQ were ordinal, its designations would not be precise enough to be meaningful: we could only know whether someone scored higher or lower than someone else, with no numerical measure of how much. For example, imagine an ordinal IQ scale made up of categories of progressively higher intelligence quotients. Everyone in the same category would be treated as having the same IQ, even though each subject might have different strengths and weaknesses, and the ordinal scale would fail to measure those differences. A ratio scale does not fit IQ either, because a ratio scale needs a true zero point and no IQ score means "no intelligence," so a score of 120 cannot be read as twice the intelligence of a score of 60. That said, one might reasonably say that all IQ scales are generally unreliable.
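To make the interval versus ordinal point concrete, here is a minimal sketch. The band cut points (85 and 115) and the labels are hypothetical, chosen only for illustration and not taken from any particular test.

```python
import bisect

# Hypothetical cut points and labels, for illustration only.
BAND_EDGES = [85, 115]
BAND_LABELS = ["low", "middle", "high"]


def to_band(score):
    """Collapse a numeric score into an ordinal category."""
    return BAND_LABELS[bisect.bisect_right(BAND_EDGES, score)]


score_a, score_b = 100, 114

# Interval view: the size of the difference is available.
print("difference:", score_b - score_a)

# Ordinal view: both scores land in the same category,
# so the difference between them is no longer visible.
print("bands:", to_band(score_a), to_band(score_b))
```

Once the scores are reduced to categories, only the ordering of the categories survives, which is the loss of information described above.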
True experimental designs tend to be strong, but they are not always practical or ethical when dealing with human subjects. Education programs typically work with classes that are already established and cannot be broken up into random test and control groups, so a common compromise is to randomly assign students to the test and control groups within their pre-existing classes. Such designs are stronger with larger numbers of subjects. With too few subjects, random assignment cannot be relied on to balance the groups, and your results become unreliable. I would suggest that including both pre- and post-tests is the best, and perhaps the only, way to hope for valid results, although that may depend on your particular question.
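Random assignment within existing classes can be sketched roughly as follows. The function name, the roster format, and the even split are my own assumptions for illustration; your situation may call for a different split.

```python
import random
from collections import defaultdict


def assign_within_classes(roster, seed=0):
    """Randomly split each class into 'test' and 'control' halves.

    roster: list of (student_id, class_id) pairs (hypothetical format).
    Returns a dict mapping student_id to 'test' or 'control'.
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced

    # Group students by the class they already belong to.
    by_class = defaultdict(list)
    for student_id, class_id in roster:
        by_class[class_id].append(student_id)

    assignment = {}
    for members in by_class.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        # First half goes to the test group; the rest (including the
        # extra student in an odd-sized class) goes to control.
        for student_id in shuffled[:half]:
            assignment[student_id] = "test"
        for student_id in shuffled[half:]:
            assignment[student_id] = "control"
    return assignment
```

Because the shuffle happens separately inside each class, every class contributes students to both groups, so a difference between classes (teacher, time of day) is less likely to be mistaken for a program effect.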
A lot hinges on the number of subjects available. With a larger pool of subjects, you might have a "new" program group, an "old" program group, and some kind of control group (perhaps one with no special treatment at all). For most purposes, it may be enough to compare results from the "new" program with those of a control group. After all, with new educational programs, we usually want to know whether the proposed program works better than not using it at all.
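If you do collect pre- and post-test scores, one simple way to compare the "new" program with a control group is to compare average gains. This is only a sketch under my own assumptions: the function names are hypothetical, and a difference in mean gains is a description of the data, not a significance test.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average improvement from pre-test to post-test for one group.

    The two lists must be in the same student order.
    """
    if len(pre_scores) != len(post_scores):
        raise ValueError("each student needs both a pre- and a post-test score")
    return mean(post - pre for pre, post in zip(pre_scores, post_scores))


def gain_difference(new_pre, new_post, control_pre, control_post):
    """Mean gain of the new-program group minus mean gain of the control group."""
    return mean_gain(new_pre, new_post) - mean_gain(control_pre, control_post)
```

A positive result means the new-program group improved more on average. Whether that difference is larger than chance would still need a proper statistical test and enough subjects.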
This is where you can run into ethical issues. When experimenting with humans, especially children, we must somehow justify the risk that a group of students might be hindered or harmed academically. However the new program performs, someone could claim that one of the groups was denied some benefit. How about a two- or three-phase test in which each group gets all treatments at different phases of the project? This way you have more data, random assignment to test groups might become irrelevant, and every subject gets the same opportunity to benefit from the same treatments. If some of these ideas help, I am grateful. Good luck!
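The phased idea above, in which every group receives every treatment in a different order, can be sketched as a simple rotation. The function name and the group and treatment labels are hypothetical, and this assumes you have as many groups as treatments.

```python
def crossover_schedule(groups, treatments):
    """Build a rotation in which each group gets each treatment exactly once.

    Returns a dict mapping each group to its list of treatments by phase.
    Assumes the number of groups equals the number of treatments.
    """
    if len(groups) != len(treatments):
        raise ValueError("this sketch needs one group per treatment")
    n = len(treatments)
    return {
        group: [treatments[(offset + phase) % n] for phase in range(n)]
        for offset, group in enumerate(groups)
    }


schedule = crossover_schedule(
    ["group 1", "group 2", "group 3"],
    ["new program", "old program", "no program"],
)
for group, phases in schedule.items():
    print(group, "->", phases)
```

Each group starts at a different point in the rotation, so in any one phase all treatments are in use and by the end every group has received all of them. One caveat with this kind of design is that what students learn in an earlier phase can carry over into a later one, so the order of treatments is worth recording.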