Problem setter's handbook
Custom judges


As we discussed in the previous chapter, the system verifies the correctness of the submissions using components called judges. There are two types of judge programs:

  • test case judge - responsible for assessing the submission in the context of a single test case
  • master judge - responsible for summarizing the results from individual test cases

Problem setters can choose a judge among multiple built-in judge programs that are supported on Sphere Engine. In many cases the supported judges can be used instead of writing a new one. If the programming problem requires a non-standard method of evaluation, then the problem setter can implement their own judge. This is possible in each programming language offered by the Sphere Engine system.

Implementation of a test case judge

The task of a test case judge is to evaluate the submission in the context of a single test case.

The assessment of the submission consists of determining whether it is correct (by establishing the so-called verdict) and - optionally - in giving a score. The judge program makes a decision based on the test case data and the result of executing the user's program.

Dependencies are depicted in the diagram below:

Test case judge interface

The test case judge has access to several data streams. Each stream is accessible through a so-called file descriptor which is an integer.

Streams for reading (input streams)
Name File descriptor
Test case input data 0
Test case output data 4
User's program output data 3
User's program source code 5
Streams for writing (output streams)
Name File descriptor
Score 1
Auxiliary/debug stream 6
User's program input data
(available only for interactive problems)
8

Information about the programming language identifier in which the user's program is implemented is available in the environment variable of the namecompiler (necessarily lowercase). The value is consistent with the list of available programming languages.

Validation of the submission

Submission correctness in the context of a test case is determined by the so-called verdict. Verdict information is determined by using the end signal of the program. The value of the end signal is interpreted as follows:

  • 0 – the submission correctly solves the test case (accepted)
  • 1 – the submission is incorrect (wrong answer)

For example, in C, the end signal is the value returned by the main function of the program (i.e. the function named main).

Note

The test case judge is only able to specify two of many different verdicts (i.e. accepted or wrong answer). Other verdicts (e.g., compilation error, runtime error, time limit exceeded) are set automatically by the Sphere Engine system.

Score

In the case of optimization problems or problems assessed partially, an important factor in addition to solution correctness is its quality. In the case of problems of this type, the author of the judge program has the ability to determine the score. The score is presented in the form of a number (potentially a floating point number).

The score should be written to the appropriate stream (descriptor number 1). No other information should be written to the stream with the score.

In many cases, there is no need to determine the score. Very often verification of the submission is limited to determining whether it is correct or incorrect.

Examples

The sample source codes of test case judges are available on GitHub:

Example 1 - Impossible model output file
Problem description

For given function f find its root i.e. the argument x0 that f(x0) = 0.

In general, there are many solutions to the problem, for example for polynomial x2 + x - 2 the numbers 1 and -2 are both correct answers.

You can see that it is difficult to prepare model output file in the test case. There is a possible infinite number of solutions for certain functions. Therefore, it is impossible to keep all of them in the output file. It forces us to use a different approach.

Test case judge description

The test case judge should verify the condition from the problem task i.e. for the user's answer from the output file (descriptor number 3) it should check if that answer is a root of the given function. Test case judge uses its access to the input file (descriptor number 0) to read the function defined in the input data.

Example 2 - Ambiguous model output file
Problem description

For a given graph G with n vertices 1, 2, ..., n determine if it has a Hamiltonian cycle (i.e. closed loop through a graph that visits each node exactly once). If the Hamiltonian cycle exists print it as a sequence of vertices.

It is easy to see that 1-2-3-1 is the same cycle as 2-3-1-2. We could add the requirement to start with the smallest vertex number. Unfortunately, it is possible that there exist many different Hamiltonian cycles which are not cyclic shifts. We could again use the previous approach (from Example 1) to verify if the user's answer is a really Hamiltonian cycle.

Alternatively, we can build a model output file with all possible Hamiltonian cycles.

Test case judge description

For the user's output file (descriptor number 3), the test case judge verifies if the sequence of vertices is present in the list that is part of the model output file.

Important

It can be problematic to keep all answers due to a possibly huge number of good solutions.

Implementation of the master judge

The master judge is responsible for summarizing individual test case results. Based on partial results it determines the final verdict consisting of the following parameters:

  • the final verdict of the submission (e.g. accepted or wrong answer)
  • final score
  • overall execution time
  • overall memory consumption

The final verdict is determined based on the data from the test cases and (if requirements or restrictions are introduced) also based on the user's program source code.

Dependencies are depicted in the diagram below:

Master judge interface

The master judge (like the test case judge) has access to several data streams. Each stream is accessible through a so-called file descriptor, which is an integer.

Streams for reading (input streams)
Name File descriptor
Results of test cases 0
User's program source code 5
Streams for writing (output streams)
Name File descriptor
Final results 1
Auxiliary/debug stream 6

Information about the programming language identifier in which the user's program is implemented is available in the environment variable of the namecompiler (necessarily lowercase). The value is consistent with the list of available programming languages.

The format of results from individual test cases

The partial results obtained during the processing of consecutive test cases are recorded in the appropriate stream (descriptor number 0). The data of each test case is written in a separate line of the stream and takes the following form:

N SCORE SIGNAL TIME MEMORY STATUS

For example:

0 AC 10 0 0.57 8096

The individual parameters have the following meaning and format:

Name Meaning Format
N test case number integer
the numbering starts with the number 0
STATUS verdict the following abbreviations are used:
  • AC​ - accepted
  • WA​ - wrong answer
  • TLE​ - time limit exceeded
  • RE​ - run time error
  • CE​ - compilation error
  • IE​ - internal error
SCORE score a number in the X.xx format (e.g. 2.00, 1345.00, 12)
SIGNAL the end signal returned by the user's program an integer
TIME CPU time a number of seconds in the X.xx format (e.g. 2.00, 1345.00, 12)
MEMORY maximum memory consumption a number of kilobytes (integer)

Obtaining a detailed report from processing a submission requires that the partial results be redirected to the auxiliary stream (descriptor number 6). The data must retain the following format:

test N - STATUS (score=SCORE, sig=SIGNAL, time=TIME, mem=MEMORY)

For example:

test 0 - AC (score=10, sig=0, time=0.57, mem=8096)

The final result and its format

The author of the master judge is free to determine the final result parameters. For example, the final submission execution time can be determined as the sum of the execution times of individual test cases. Alternatively, the final execution time can be determined as an average value

The final result should be written to the appropriate stream (descriptor number 1). The data format is as follows:

STATUS SCORE SIGNAL TIME MEMORY

For example:

AC 10 0 0.57 8096

Individual parameters have meaning and a format similar to the parameters from the previous table. Nevertheless, all values ​​are calculated by the master judge in the manner specified by its author:

Name Meaning Format
STATUS final verdict the following abbreviations are used:
  • AC​ - accepted
  • WA​ - wrong answer
  • TLE​ - time limit exceeded
  • RE​ - run time error
  • CE​ - compilation error
  • IE​ - internal error
SCORE final score a number in the X.xx format (e.g. 2.00, 1345.00, 12)
SIGNAL the end signal an integer
TIME execution time a number of seconds in the X.xx format (e.g. 2.00, 1345.00, 12)
MEMORY memory consumption a number of kilobytes (integer)

Examples

Weighted % of correctly solved sets

Here we present the generalization of Score is % of correctly solved sets master judge. It is a small disadvantage that each test case is worth the same, and to increase the influence of some submission's aspect we are forced to produce many test cases.

For example, when our test cases verify three aspects A, B, and C of the problem and we would like to apply weights of 20%, 30%, and 50% respectively, we are able to achieve this by creating 10 test cases. Two of these are responsible for an aspect A; three are responsible for an aspect B; and five are responsible for an aspect C. However it is inconvenient and we can consider the following idea:

Master judge description

Master judge has information about the number of test cases and weights, which it should assign to each test case. The final score is the weighted sum of accepted test cases.

For instance, for three test cases a, b, c and weights 20%, 30%, 50% the submission gets one of the possible results depending on passed test cases:

  • no test case passed - 0%
  • a - 20%
  • b - 30%
  • a, b - 50%
  • c - 50%
  • a, c - 70%
  • b, c - 80%
  • a, b, c - 100%
Forbidden structures in the source code

The problem setter may require that the solution cannot use some programming structures. For example, they may want to allow usage of language C++ but with no access to the STL library, in order to force users to implement efficient data structures manually. Another example is to restrict source code to not use loop structures, to support only solutions based on recursion.

Master judge description

Master judge uses access the user's source code to detect usages of forbidden keywords (for example loop types: while, for, goto). When a forbidden keyword is detected the final verdict is set to the wrong answer. In other cases, the master judge performs classical verification (for example the same as Generic master judge or Score is % of correctly solved sets).

Memory consumption limitations

We cannot directly support memory limit due to the reasons explained in memory complexity section of the Appendix. To make bounding the amount of available memory possible we can implement a master judge specifically for this purpose.

Master judge description

Master judge gathers the information of consumed memory from test case judges. We can verify the memory limit with respect to the user's solution programming language to adjust the master judge for all programming languages we allow to use.

Important

It's very important to adjust the memory limit to the programming language due to different memory needs of programs in different languages.

Previous chapter
Next chapter