Problem setter's handbook
The concept of the judge is fundamental to what makes Sphere Engine Problems so powerful. In previous chapters, we discussed both test case judges and master judges to show that they are responsible for the user submission correctness verification. This chapter covers the topic of built-in general purpose judges along with representative examples.
Test case judges
We will recall that the test case judge is the program responsible for evaluating a solution to a programming problem in the context of a single test case. Typically, the judge checks the compliance of the output data with the data generated by the user's program.
Test case judge returns a piece of information about the results of the solution execution:
- verdict (see verdicts)
- score (optional)
- execution time
- memory consumption
As we have already discussed, in most cases it is enough to utilize one of the built-in test case judges that we discuss in this section.
Ignores extra white spaces
The output data of the user's program must match the reference output data with the accuracy of the extra white spaces (spaces, tabulations, newline characters, and carriage return characters). The judge's operation is equivalent to the exact comparison of standardized data, where the standardization process involves the removal of the initial and final white spaces and the transformation of the remaining continuous chains of white spaces into a single space. This test case judge doesn't set a score (the score is always equal to 0).
prime numberproblem (i.e. for a number
nis a prime number and then return
1as a result. Otherwise the result is
The problem of prime factorization of the number (i.e., for a number
nfind the prime factors
p1, p2, ..., pkthat
n = p1 • p2 • ... • pk). It is known that the prime factorization of the number is unique up to the order of prime factors. Therefore, if we require an output specification to write a sorted list of factors, there is only one correct answer to the problem.
Ignoring floating point errors up to 0.01
The judge treats white spaces (spaces, tabulations, newline characters, and
carriage return characters) identically to the Ignores extra white spaces
test case judge. In addition, all numbers occurring in the output data of the
user's program must match the corresponding figures in the reference output data
to two decimal places (i.e.,
0.01). This means that if
y are the
numbers being compared, we find them matching if
|x - y| < 0.01. This test
case judge doesn't set a score (the score is always equal to 0).
The problem of the triangle area, i.e. for given integer side lengths
a, b, ccalculate the area of the triangle. In general, the result is not an integer, but a high level of precision is not required (i.e., the precision up to 0.01 is enough).
Ignoring floating point errors up to 0.000001
The judge works analogously to the previous one. However, it compares numbers with greater precision, so it is useful
for problems that require greater accuracy. The required precision is up to
The problem of the value of the
sinefunction. The range of the
[-1,1], therefore precision is important.
Score is source length
Solution correctness is verified in the same way as in the case of the Ignores extra white spaces test case judge. The only difference is the score, which in this case is equal to the length of the user's program source code.
This test case judge works exceptionally well for problems with an optimization side-goal, which is the implementation of the correct solution with as few characters as possible.
The problem of determining first
nnumbers in the decimal expansion of the number
n. The challenge is to solve this problem with the shortest possible source code.
It works exactly the same as the Ignoring extra white spaces judge, but in addition it gives access to the difference between model output file and the user's output.
The output data of the user's program must be identical to the reference output data.
Master judge is a program responsible for summarizing the results from individual test cases. On the basis of the partial results, the following components of the result are determined:
- the final verdict (see verdicts)
- final score
- overall execution time
- overall memory consumption
The master judge can arbitrarily set each part of that information based on the information received from test case judges (see diagram of submission flow).
The final verdict and the score can be freely combined based on verdicts and scores from test case judges.
As we have already discussed, in most cases it is enough to utilize one of the built-in master judges that we are going to examine.
Generic master judge
The master judge accepts the solution if all test cases have been solved. The final parameters are determined as follows:
- final score – the average value from all test cases
- overall execution time - total execution time of all test cases
- overall memory consumption - maximum memory consumption from all test cases
Test case Verdict Score Time Memory 1 accepted 2 1s 1000kB 2 accepted 4 4s 1500kB Final result accepted 3 5s 1500kB
When all test cases obtain the positive verdict (i.e.
accepted), then the final verdict is also accepted. The final score is an average of scores, the overall execution time is a sum and the overall memory consumption is a maximum value over all test cases.
Test case Verdict Score Time Memory 1 accepted 2 1s 1000kB 2 wrong answer 0 4s 1500kB 3 time limit exceeded 0 0s 1000kB Final result wrong answer 0 1s 1000kB
When any test case fails, the final verdict is inherited from the first failed test case. In this example, the problem has three test cases, the second and third of which fail. The final verdict is inherited from the second test case.
The Generic master judge is a proper choice when the problem setter requires that the solution fulfills all their requirements i.e. it is absolutely correct and sufficiently efficient.
Score is % of correctly solved sets
The judge accepts the solution if at least one test case has been correctly solved. The final parameters are determined as follows:
- final score - the percentage of correctly solved test cases
- overall execution time - total execution time of all accepted test cases
- overall memory consumption - maximum memory consumption from all accepted test cases
Test case Verdict Score Time Memory 1 accepted 0 1s 1000kB 2 wrong answer 0 4s 1500kB 3 wrong answer 0 1s 1500kB 4 accepted 0 2s 2000kB 5 accepted 0 1s 1000kB Final result accepted 60 4s 2000kB
When at least one test case obtains the positive verdict (i.e.
accepted), then the final verdict is also positive. The final score is a percentage of correctly solved test cases (in this case
60%because 3 of 5 test cases are solved correctly), the overall execution time is a sum and the overall memory consumption is a maximum value over all test cases.
Let us consider the
power functionproblem. For integer numbers
bcalculate the value of
- The first test case can deliver only input instances for which the result is in the standard numeric type scope (e.g. the result is not greater than
- The second test case requires the solution to manage a large number as the result.
These two test cases give information on solution correctness but without considering efficiency aspects.
- The third test case is designed to test how efficient the solution is. Therefore, it is designed to reject slow (inefficient) algorithms. Only fast (more efficient) algorithms should be able to pass successfully.
The least advanced (but in some way correct) solutions will pass the first test case only and achieve a result of
33%. More advanced solutions (properly handling big numbers) are able to pass the first and the second test and achieve the result of
66%. To achieve the best result of
100%the solution needs to implement both big numbers and fast power algorithm to pass all three test cases.