In fact, cluster sampling, as you have defined it, is a form of random sampling, and this can be shown as follows:
Suppose there are a total of R clusters, labeled C1,C2,...,CR, and suppose that we intend to select exactly r of them randomly, then add each member of the selected clusters to our sample.
Take any individual member of the population, and let S denote the event that this member is ultimately selected for our sample. Of particular interest to us is P(S).
For each i = 1,2,...,R, define Ai to be the event that the member is in cluster Ci, and define Bi to be the event that Ci is one of the r clusters that is selected.
We have
P(S) = P(S and A1) + P(S and A2) + ... + P(S and AR)
= P(A1)•P(S|A1) + P(A2)•P(S|A2) + ... + P(AR)•P(S|AR)
= P(A1)•P(B1) + P(A2)•P(B2) + ... + P(AR)•P(BR).
Now, each P(Bi) is equal to r/R, because the r clusters are selected randomly from the pool of R clusters. Factoring out r/R from each of the terms above gives
P(S) = r/R • [ P(A1) + P(A2) + ... + P(AR) ].
The sum of the P(Ai) is 1 because the Ci partition the population, so that exactly one of the Ai must occur. This leaves just
P(S) = r/R.
Notice that this probability is constant. It does not depend on which member of the population was being considered. In other words, every member of the population has the same probability r/R of being selected for the sample. By definition, a sampling method which gives each member of the population the same probability of being selected for the sample is a random sample.