quizivex wrote: It's important to note that the "difficulty of the test" and "percentiles" are completely unrelated! You can't assume that because the "percentile vs. scaled score" plots for recent years line up with 2008, the raw scores needed for each scaled score on upcoming tests will be similar to the 0877 practice test.
The "difficulty of the test" is measured by what raw score you must get to earn a given scaled score. The scaled scores are intended to measure a given level of performance independent of the difficulty of the particular version of the test, and independent of the performance of the other test takers.
On the other hand, the "percentile vs. scaled score" curve is determined entirely by the pool of recent students taking the test. In the 0877 practice test booklet, they state: "The table on page 91 contains, for each scaled score, the percentage of examinees tested between July 2007 and June 2010 who received lower scores." So the percentiles are averages over a 3-year period and will not change substantially from one year to the next.
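ETS's definition of the percentile (the percentage of examinees who received lower scores) is easy to state in code. A minimal sketch, using a made-up pool of scaled scores standing in for a three-year window of examinees (the numbers are illustrative, not real ETS data):

```python
# Hypothetical pool of scaled scores from a three-year window of examinees.
pool = [520, 610, 700, 700, 780, 850, 920, 920, 990]

def percentile_below(scaled_score, pool):
    """ETS-style percentile: percent of examinees with strictly lower scores."""
    lower = sum(1 for s in pool if s < scaled_score)
    return 100.0 * lower / len(pool)

# 6 of the 9 hypothetical examinees scored below 920.
print(percentile_below(920, pool))
```

Because the pool spans three overlapping years of test takers, the mapping from scaled score to percentile drifts only slowly from one booklet to the next, which is exactly why the recent "percentile vs. scaled score" curves line up.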
HappyQuark's second graph shows that scaled scores have been increasing over the years. This is because students have gotten better at the test over time. The reason the "percentile vs. scaled score" curves from 2011 and 2012 line up closest with the 2008 test is that 2008 is by far the most recent of the years for which practice tests were released.
The "difficulty of the test" is a function of one individual test only. There are several versions of the test given on every test date, and students in the same exam room have disagreed over how difficult the test was for that reason. The 0877 test is a sample size of one.
Incidentally, to measure the difficulty of tests over the years, scaled scores should be compared with raw scores. A plot of "raw score vs. percentile" is virtually meaningless because percentiles change over time.
Overall, we cannot use the 2008 test to estimate which raw score we should be gunning for on the upcoming 2013 exams. It would be dangerous if PGRE.com users go into the test expecting it to be as "easy" as the 08 test and find that it isn't. One of the things that makes the subject GRE experience difficult is that students don't know in advance how hard their version of the test is. So if they're having an easy/hard time, they won't know whether it's because they're doing well/badly or because the test itself is easy/hard... you can mess yourself up by counting on needing to get a certain number of questions right to reach a goal.
This is a fair point. I forgot to include the Scaled Score vs Raw Score plot which, as you pointed out, is relevant.
From this plot, it's pretty clear that the relationship between the scaled score and the raw score is nice and linear. Where the ambiguity comes in is the cutoff at 990, as this will affect the slope. On the 0877 test the cited cutoff is a raw score of 82; on the 0177 it was 85; on the 9677, 67; on the 9277, 76; and on the 8677, 84.
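To see how much the moving 990 cutoff matters, here's a sketch that compares the cited cutoffs directly. The five cutoffs are the numbers quoted above; the fixed raw score of 75 is just a probe value:

```python
# 990 cutoffs (raw score needed for a scaled 990) cited for each practice test.
cutoffs = {"0877": 82, "0177": 85, "9677": 67, "9277": 76, "8677": 84}

raw = 75  # a fixed raw score, to see how it fares from test to test
for test, cutoff in sorted(cutoffs.items()):
    status = "at or above" if raw >= cutoff else "below"
    print(f"{test}: raw {raw} is {status} the 990 cutoff ({cutoff})")
```

The same raw score of 75 clears the 990 cutoff on the 9677 test but falls seven or more questions short of it on the 0877 and 0177 tests, which is the spread in slopes being argued about here.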
Where things apparently get interesting, as you pointed out, is the comment by ETS where they say:
"The percent scoring below the scaled score is based on the performance of 14,395 examinees who took the Physics Test between July 1, 2007 and June 30, 2010".
If I'm understanding your post, you're saying that there are multiple versions of the test (which I agree with and am aware of), but that these versions differ significantly in difficulty, so the values ETS cites are only an average over the versions. I've been interpreting the sentence differently: the percentile is an average over test takers across 3 years, but the relative difficulty of the different versions, and hence the conversion between scaled score and raw score, is roughly the same.
The thing that doesn't make sense to me about your interpretation, if correct, is that it makes the average values they cite completely useless. In fact, the table they give at the end of each practice exam would be completely meaningless. When I took the practice exams, my understanding was that after calculating my raw score on a practice test, I could look at the score conversion table in the back to see what my scaled score and percentile rank would have been, and, so far as I can tell, this is how everyone else has interpreted it. So, for example, if I got a raw score of 75 on the 0877 exam, my understanding was that I could interpret that as a scaled score of 920 and the 89th percentile (plus or minus some small variation). If you are correct, then in reality a raw score of 75 on the 0877 test may really have corresponded to anything from, say, a scaled score/percentile of 750/64% up to as high as 990/95% (as it would have been on the 9677 test).
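The common interpretation of the conversion table amounts to a simple lookup. A sketch of that reading, where only the 75 → 920 / 89th-percentile row comes from the discussion above (the neighboring rows are made-up placeholders, not real ETS values):

```python
# Fragment of a 0877-style conversion table: raw score -> (scaled, percentile).
# Only the 75 -> (920, 89) row is taken from the discussion; the other rows
# are illustrative placeholders, not actual ETS table entries.
conversion_0877 = {
    74: (910, 87),
    75: (920, 89),
    76: (930, 91),
}

raw = 75
scaled, pct = conversion_0877[raw]
print(f"raw {raw} -> scaled {scaled}, {pct}th percentile")
```

Under quizivex's reading, a single table like this would only be valid for the one test version it was printed with, so the lookup tells you nothing reliable about a different version taken on test day.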
quizivex wrote:There are several versions of the test given on every test date, and students in the same exam room have disagreed over how difficult the test was for that reason. The 0877 test is a sample size of one.
I don't know if forum comments are a good indicator of the difficulty of the tests. Since, presumably, some people are smarter and/or better prepared than others taking the test, it seems reasonable to me that you'd get varied descriptions of how difficult a test is even if the tests were identical. Regardless, this is unfortunately a relevant unknown. Based on the graphs and ETS's wording about the data, it seems more reasonable to me that the GRE problems differ between tests but are of roughly similar difficulty, with the 990 cutoff corresponding to a raw score somewhere around 75-85. If these assumptions are correct, then I think this data can be of some use. If not, then we may just be *** out of luck.