Problem setter's handbook
Judges


The concept of the judge is fundamental to what makes Sphere Engine Problems so powerful. In previous chapters, we discussed both test case judges and master judges to show that they are responsible for the user submission correctness verification. This chapter covers the topic of built-in general purpose judges along with representative examples.

Test case judges

We will recall that the test case judge is the program responsible for evaluating a solution to a programming problem in the context of a single test case. Typically, the judge checks the compliance of the output data with the data generated by the user's program.

Test case judge returns a piece of information about the results of the solution execution:

  • verdict (see verdicts)
  • score (optional)
  • execution time
  • memory consumption

As we have already discussed, in most cases it is enough to utilize one of the built-in test case judges that we discuss in this section.

Ignores extra white spaces

The output data of the user's program must match the reference output data with the accuracy of the extra white spaces (spaces, tabulations, newline characters, and carriage return characters). The judge's operation is equivalent to the exact comparison of standardized data, where the standardization process involves the removal of the initial and final white spaces and the transformation of the remaining continuous chains of white spaces into a single space. This test case judge doesn't set a score (the score is always equal to 0).

Example 1

The prime number problem (i.e. for a number n determine whether n is a prime number and then return 1 as a result. Otherwise the result is 0).

Example 2

The problem of prime factorization of the number (i.e., for a number n find the prime factors p1, p2, ..., pk that n = p1 • p2 • ... • pk). It is known that the prime factorization of the number is unique up to the order of prime factors. Therefore, if we require an output specification to write a sorted list of factors, there is only one correct answer to the problem.

Ignoring floating point errors up to 0.01

The judge treats white spaces (spaces, tabulations, newline characters, and carriage return characters) identically to the Ignores extra white spaces test case judge. In addition, all numbers occurring in the output data of the user's program must match the corresponding figures in the reference output data to two decimal places (i.e., 0.01). This means that if x and y are the numbers being compared, we find them matching if |x - y| < 0.01. This test case judge doesn't set a score (the score is always equal to 0).

Example

The problem of the triangle area, i.e. for given integer side lengths a, b, c calculate the area of the triangle. In general, the result is not an integer, but a high level of precision is not required (i.e., the precision up to 0.01 is enough).

Ignoring floating point errors up to 0.000001

The judge works analogously to the previous one. However, it compares numbers with greater precision, so it is useful for problems that require greater accuracy. The required precision is up to 0.000001.

Example

The problem of the value of the sine function. The range of the sine function is [-1,1], therefore precision is important.

Score is source length

Solution correctness is verified in the same way as in the case of the Ignores extra white spaces test case judge. The only difference is the score, which in this case is equal to the length of the user's program source code.

This test case judge works exceptionally well for problems with an optimization side-goal, which is the implementation of the correct solution with as few characters as possible.

Example

The problem of determining first n numbers in the decimal expansion of the number π for given n. The challenge is to solve this problem with the shortest possible source code.

Academy

It works exactly the same as the Ignoring extra white spaces judge, but in addition it gives access to the difference between model output file and the user's output.

Exact judge

The output data of the user's program must be identical to the reference output data.

Master judges

Master judge is a program responsible for summarizing the results from individual test cases. On the basis of the partial results, the following components of the result are determined:

  • the final verdict (see verdicts)
  • final score
  • overall execution time
  • overall memory consumption

The master judge can arbitrarily set each part of that information based on the information received from test case judges (see diagram of submission flow).

Important

The final verdict and the score can be freely combined based on verdicts and scores from test case judges.

As we have already discussed, in most cases it is enough to utilize one of the built-in master judges that we are going to examine.

Generic master judge

The master judge accepts the solution if all test cases have been solved. The final parameters are determined as follows:

  • final score – the average value from all test cases
  • overall execution time - total execution time of all test cases
  • overall memory consumption - maximum memory consumption from all test cases
Example 1 (correct solution)
Test case Verdict Score Time Memory
1 accepted 2 1s 1000kB
2 accepted 4 4s 1500kB
Final result accepted 3 5s 1500kB

When all test cases obtain the positive verdict (i.e. accepted), then the final verdict is also accepted. The final score is an average of scores, the overall execution time is a sum and the overall memory consumption is a maximum value over all test cases.

Example 2 (incorrect solution)
Test case Verdict Score Time Memory
1 accepted 2 1s 1000kB
2 wrong answer 0 4s 1500kB
3 time limit exceeded 0 0s 1000kB
Final result wrong answer 0 1s 1000kB

When any test case fails, the final verdict is inherited from the first failed test case. In this example, the problem has three test cases, the second and third of which fail. The final verdict is inherited from the second test case.

Tip

The Generic master judge is a proper choice when the problem setter requires that the solution fulfills all their requirements i.e. it is absolutely correct and sufficiently efficient.

Score is % of correctly solved sets

The judge accepts the solution if at least one test case has been correctly solved. The final parameters are determined as follows:

  • final score - the percentage of correctly solved test cases
  • overall execution time - total execution time of all accepted test cases
  • overall memory consumption - maximum memory consumption from all accepted test cases
Example 1
Test case Verdict Score Time Memory
1 accepted 0 1s 1000kB
2 wrong answer 0 4s 1500kB
3 wrong answer 0 1s 1500kB
4 accepted 0 2s 2000kB
5 accepted 0 1s 1000kB
Final result accepted 60 4s 2000kB

When at least one test case obtains the positive verdict (i.e. accepted), then the final verdict is also positive. The final score is a percentage of correctly solved test cases (in this case 60% because 3 of 5 test cases are solved correctly), the overall execution time is a sum and the overall memory consumption is a maximum value over all test cases.

Example 2

Let us consider the power function problem. For integer numbers a and b calculate the value of ab.

  • The first test case can deliver only input instances for which the result is in the standard numeric type scope (e.g. the result is not greater than 232).
  • The second test case requires the solution to manage a large number as the result.

These two test cases give information on solution correctness but without considering efficiency aspects.

  • The third test case is designed to test how efficient the solution is. Therefore, it is designed to reject slow (inefficient) algorithms. Only fast (more efficient) algorithms should be able to pass successfully.

The least advanced (but in some way correct) solutions will pass the first test case only and achieve a result of 33%. More advanced solutions (properly handling big numbers) are able to pass the first and the second test and achieve the result of 66%. To achieve the best result of 100% the solution needs to implement both big numbers and fast power algorithm to pass all three test cases.

Next chapter