About VCE Data Explorer

A project to make VCE statistics accessible, transparent, and actually useful for students.

Why this website exists

This project started as a personal tool to help me make decisions about my own studies, and it eventually grew into the platform you see today. The VCE Data Explorer solves a few problems I'd faced while looking for data online.

VCAA reports are difficult to use. The honour roll is the best example of this. With the VCE Data Explorer, all the data you need is sorted and accessible in one place. Further, without this site the only way to find the actual number of high achievers is to dig through the VCAA reports, because websites such as quppa.net simply don't count students who don't consent to their names being released.
Accurate scaling data is hard to come by. The VTAC scaling report rounds scaled scores to whole numbers even though VTAC uses two decimal places internally, and all the ATAR calculators online do the same.
Existing scaling calculators lack transparency. I wanted to know exactly how estimates were reached, and existing calculators gave neither this information nor any indication of how recent their data was.

How is the scaling data calculated?

The Mathematical Model

VTAC doesn't publish a "formula" for scaling or exact scaled scores. Instead, they provide rounded "anchor points" in their annual reports that show how raw scores of 20, 25, 30, 35, 40, 45, and 50 are scaled for each subject.

Two methods are used to estimate the scaled scores that fall between or outside these known points:

1. PCHIP Interpolation

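For the lower end of the scores (<20), a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) is used. PCHIP is shape-preserving: it does not overshoot the data and stays monotonic wherever the data is monotonic.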

2. Weighted Regression

For the high end, crowdsourced data is incorporated and a weighted cubic polynomial is fit that respects the official anchors while shifting slightly to match real-world student reports.
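As a rough sketch of what "weighted" means here, assuming the fit is done with NumPy's polyfit (as in the simplified Python logic on this page), the cubic p is chosen to minimise a weighted sum of squared errors over every official anchor and crowdsourced point (x_i, y_i) with weight w_i:

```latex
\min_{a_0, a_1, a_2, a_3} \; \sum_{i} \bigl( w_i \, ( y_i - p(x_i) ) \bigr)^2,
\qquad
p(x) = a_3 x^3 + a_2 x^2 + a_1 x + a_0
```

Because the weight multiplies the residual before squaring, a point with weight 1000 counts 400 times as much as a point with weight 50, and a point with weight 0 is ignored entirely.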

Simplified Python Logic

import numpy as np
from scipy.interpolate import PchipInterpolator

# 1. Official VTAC Anchor Points (Standard 20-50 range)
vcaa_x = [20, 25, 30, 35, 40, 45, 50]
vcaa_y = [21.5, 28.2, 34.4, 40.8, 45.6, 48.9, 50.8]

# Crowdsourced (raw, scaled) reports. Placeholder values for illustration only.
crowdsourced_x = [33, 41]
crowdsourced_y = [38.3, 46.4]

# 2. Lower End (Raw < 20)
# With only the 20-50 anchors shown here, raw scores below 20 are
# extrapolated from the first PCHIP segment.
pchip = PchipInterpolator(vcaa_x, vcaa_y, extrapolate=True)

# 3. Higher End (Raw >= 20)
# Official data is combined with crowdsourced points
all_x = vcaa_x + crowdsourced_x
all_y = vcaa_y + crowdsourced_y

# 4. Weighted Regression
# A graduated weighting system is used based on proximity
# Only if verified data is very close does it override the official point
anchor_weights = []
for anchor in vcaa_x:
    # Distance to the nearest crowdsourced point (needs at least one point)
    dist = min(abs(anchor - tx) for tx in crowdsourced_x)

    if dist == 0: weight = 0       # Exact match (Use crowdsourced data)
    elif dist == 1: weight = 50    # Very close (Heavy reduction)
    elif dist == 2: weight = 200   # Close (Moderate reduction)
    elif dist == 3: weight = 500   # Nearby (Slight reduction)
    else: weight = 1000            # Far away (Normal weight)

    anchor_weights.append(weight)

# 5. Resulting Curve
# Crowdsourced points always get the full weight of 1000
final_weights = anchor_weights + [1000] * len(crowdsourced_x)
# np.polyfit returns the cubic's coefficients, highest power first
model = np.polyfit(all_x, all_y, deg=3, w=final_weights)
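
The two pieces above can be combined into a single lookup. The following is a minimal, self-contained sketch rather than the site's actual code: the crowdsourced points are hypothetical placeholders, and the names `anchor_weight` and `estimate_scaled` are made up for illustration. It assumes that raw scores below 20 go to the PCHIP curve and everything else goes to the weighted cubic.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

# Anchor values copied from the simplified logic above
vcaa_x = [20, 25, 30, 35, 40, 45, 50]
vcaa_y = [21.5, 28.2, 34.4, 40.8, 45.6, 48.9, 50.8]

# Hypothetical crowdsourced reports (raw, scaled); not real data
crowdsourced_x = [33, 41]
crowdsourced_y = [38.3, 46.4]


def anchor_weight(dist):
    """Weight of an official anchor, given its distance to the nearest report."""
    return {0: 0, 1: 50, 2: 200, 3: 500}.get(dist, 1000)


# Lower end: PCHIP through the anchors, extrapolated below the first one
pchip = PchipInterpolator(vcaa_x, vcaa_y, extrapolate=True)

# Higher end: weighted cubic through anchors plus crowdsourced points
weights = [
    anchor_weight(min(abs(anchor - cx) for cx in crowdsourced_x))
    for anchor in vcaa_x
]
weights += [1000] * len(crowdsourced_x)
coeffs = np.polyfit(
    vcaa_x + crowdsourced_x, vcaa_y + crowdsourced_y, deg=3, w=weights
)


def estimate_scaled(raw):
    """Estimated scaled score for a raw study score."""
    if raw < 20:
        return float(pchip(raw))
    return float(np.polyval(coeffs, raw))


for raw in (15, 28, 41):
    print(raw, round(estimate_scaled(raw), 2))
```

With these placeholder reports at raw 33 and 41, the anchors at 30, 35, and 40 are down-weighted to 500, 200, and 50 respectively, while the remaining anchors keep the full weight of 1000.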

For the ATAR calculator, exact points are used in place of calculated values whenever they are available. These exact points are the crowdsourced data points and a scaled score of 50.00 in subjects where a raw 50 scales to 50.
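
One way to picture this "exact first, calculated second" rule is the small sketch below. It is an illustration under assumptions, not the calculator's real code: `lookup_scaled`, `exact_points`, and `toy_estimate` are hypothetical names, and the numbers are placeholders.

```python
def lookup_scaled(raw, exact_points, estimate):
    """Return (scaled score, source), preferring a known exact point."""
    if raw in exact_points:
        return exact_points[raw], "exact"
    return round(estimate(raw), 2), "calculated"


# Hypothetical exact points for one subject: a crowdsourced report at
# raw 41 and a raw 50 that scales to exactly 50.00
exact_points = {41: 46.4, 50: 50.00}


def toy_estimate(raw):
    """Hypothetical stand-in for the fitted scaling curve."""
    return 0.9 * raw + 5.0


print(lookup_scaled(50, exact_points, toy_estimate))  # taken from exact_points
print(lookup_scaled(30, exact_points, toy_estimate))  # falls back to the curve
```

Returning the source alongside the value lets the page show whether a number was reported or estimated.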

⚠️ Accuracy Disclaimer

Please note the scaling graphs are statistical approximations. They may be slightly inaccurate for specific scores due to the nature of curve fitting and the lack of official information. Always treat the "Calculated" line as an estimate, not a guarantee.

The Distribution Graph

The Study Score Distribution graphs are reconstructed visualizations. Because VCAA only releases the numbers of students scoring above 40, a normal distribution (a Gaussian bell curve) with the subject-specific mean and standard deviation is used to estimate the rest of the cohort.
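
A reconstruction along these lines could look like the sketch below. It is an assumption-laden illustration, not the site's implementation: the function name `estimated_counts` is made up, the mean of 30, standard deviation of 7, and cohort of 10,000 are hypothetical inputs, and treating each study score as a one-point-wide bin is a choice made for this example.

```python
from statistics import NormalDist


def estimated_counts(mean, sd, cohort_size, low=0, high=50):
    """Estimate how many students receive each whole-number study score."""
    dist = NormalDist(mean, sd)
    counts = {}
    for score in range(low, high + 1):
        # Share of the bell curve falling in the unit-wide bin around this score
        share = dist.cdf(score + 0.5) - dist.cdf(score - 0.5)
        counts[score] = share * cohort_size
    return counts


# Hypothetical subject: mean 30, standard deviation 7, 10,000 students
counts = estimated_counts(mean=30, sd=7, cohort_size=10_000)
above_40 = sum(n for score, n in counts.items() if score > 40)
print(f"Estimated students above 40: {above_40:.0f}")
```

Comparing the estimated count above 40 with the figure VCAA actually publishes gives a quick sense of how well the bell curve fits a given subject.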

Note: These graphs are for illustrative purposes only. They help you see how your score relates to the rest of the state, but they shouldn't be used to determine your exact rank or score.