Problem setter’s handbook - Appendix


Good test cases design

It is common misunderstanding which leads to bad test cases design. The number of test cases assigned to the problem is limited. The individual test case is not intended to test only one problem instance.

Tip

We recommend you to redesign your input / output specification to handle with multiple problem instances in one test case.

Consider the following elementary problem as an example:

Task example:
For given integer numbers a and b calculate the sum a + b.

We could design the input / output specification to calculate the sum only for two numbers:

Poor input / output design example:

Input
In the only line of the input there will be two integer numbers a and b separated by a single space character.

Output
Program should write a single number which is the value of a + b.

It is correct but it is highly not recommended. First of all using even all possible slots for test cases cover a small part of the possible problem instances. Secondly, the execution of each test case is time consuming (about 2s additional time for each test case).

We recommend to redesign the input / output specification in following manner:

Proper input / output design example:

Input
In the first line there will be the number t which is the number of instances for the problem. In each of the next t lines there will be the pair of two numbers a and b for which you should calculate the value of a + b.

Output
For each a,b pair print the calculated sum. Separate answers with new line character.

As you can see it is possible to pack a large number of problem instance into single test case.

Note

Multiple test cases should rather be used to test different aspect of the problem.

Interactive problems

The standard schema of the submission processing assumes that at first the execution of the user's program generates the user's output and after that the test case judge in a certain way compares it with the model output. In the interactive problem these phases are executed simultaneously.

The execution of the user's program starts and its standard output is directed to the test case judge as its standard input. In the same time the test case judge program starts and its standard output is directed to the user's program as its standard input. This allows a direct communication between the user's program and the test case judge as you can see in the picture below.

Important

Interactive problem requires dedicated test case judge.

When the conversation is open, the test case judge can interactively generate input for the user's program depending on its output and in the same time it is possible to verify the correctness of the results.

Interactive problem example:

Write a program that will win the chess game with the opponent who make random legal moves.

Input
The first line of the input contains the number s which can be 0 or 1. If s = 0, then your opponent plays white pieces and if s = 1 then you plays white pieces. In the next lines there will be moves of your opponent write in the standard chess algebraic notation (for example e4 e5 means that the piece on e4 moves to e5). You can assume that all opponent moves are legal. When the match is over the final line of the input will be "win" or "lose" and the program will be accepted only if the user's solution win the match.

Output
If s = 1 you start with the first move (i.e. you write the line with your move in the standard chess algebraic notation) and after that you write your next move after you get your opponent's move and so on and so forth.
If s = 0 you wait for your opponent's move and then again you play move by move.

By the description of the interactive problem example you can see that we need to implement the test case judge which is able to play random chess game:

  • it remembers the state of the chess board,
  • it can make legal moves,
  • it can verify that the game is over and who is the winner,
  • it accepts only solutions which won the game.

Attention

Due to common issues with the standard output writes we highly recommend you to clear the output buffer after printing each line. It can be done by using the fflush(stdout) command in the C/C++ languages.

Time complexity

Test cases along with time limits give a possibility of verification time complexity of algorithms. Consider the most basic case when the author knows two different algorithms for a problem, say A and B and let us assume that the algorithm A is noticeably faster than the algorithm B.

Note

It is not always easy and obvious how to prepare test cases to distinguish between two algorithms.

Assuming that we have input data which is processed in the time tA for the algorithm A which is much faster than execution time tB for the algorithm B we can simply set the time limit somewhere between those values.

Important

The presented approach highly depends on the machine thus you need to adjust your time limit to the computing cluster rather then your local machine.

With the timeout tA ≤ t0 ≤ tB we can assume that A-like algorithms will pass the test case and B-like algorithms will fail it due to exceeding the time limit.

Caution

Presented method is not a real time complexity testing, slower algorithm can beat the faster one when it is well technically optimized for the test cases and the machine.

It is also not a universal method - changing the machine can allow slower algorithms to pass test cases designed for faster algorithms only.

The sorting problem is one of the most demonstrative example when there are many different solutions. All natural solutions need approximately n2 operations to sort the sequence of length n. However, the more sophisticated algorithms guarantee approximately n log(n) operations which is significantly better result.

In the problem example section you can see properly prepared test cases which distinguish solutions for The initial sum problem.

Note

Obviously for problems with many (not only 2) solutions of different speeds you can construct a hierarchy of test cases to reflect the gradation of solutions in scores.

Memory complexity

Similarly to time complexity testing one can test memory complexity of algorithms. Consider the simplest situation when the author knows two different algorithms for a problem, say A and B. Let us assume that algorithm A consumes small and constant amount of memory and algorithm B memory needs are dependent on the problem input data (possibly big amounts).

You can distinguish between solutions A and B by constructing adjusted test cases. If we denote that designed test case makes algorithm A to use mA megabytes of memory and algorithm B to use mB megabytes of memory and these values are separated you can set the memory limit m0 megabytes somewhere between mA and mB.

Important

We do not directly support memory limit option due to complications with solutions written in virtual machine interpreted languages (for example Java languages family).

Due to the note above you need to approach individually to limit the memory that program can use. As we said there is no single parameter which sets memory limit. To obtain desired functionality you can construct custom master judge and limit the memory inside separately for each programming language you allow to use for solutions.

Example:
The prime number problem can be solved in constant memory by looking for divisors or alternatively with Sieve of Eratosthenes algorithm which consumes the amount of memory which depends on the input number.

Statuses

There are two levels when the status is assigned to the submission:

  • test case - the status is produced by the test case judge,
  • master judge - the status is a combination of statuses from test cases.

The master judge is a high order level component and it can arbitrary assign any status to the submission. We are going to focus on the test case judge statuses.

We separate statuses into two groups: semantic and systemic. The semantic statuses are strictly related to the correctness of the answer to the problem. On the other hand, the systemic statuses are syntactic related and the judge gets it from the system.

Semantic statuses

  • Accepted (AC) the submission is a correct solution to the problem.
  • Wrong answer (WA) the submission is an incorrect solution.

System statuses

  • Time limit exceeded (TLE) the submission execution took too long.
  • Runtime error (RE) the error occurred during program execution.
    • NZEC (Non-Zero Exit Code) main function returned error signal (for example main function in C/C++ should return 0).
    • SIGSEGV the program accessed unallocated memory (segmentation fault).
    • SIGABRT the program received abort signal, usually programmer controls it (for example when C/C++ assert function yields false).
    • SIGFPE the floating point error, usually occurs when dividing by 0.
  • Compilation error (CE) the error occurred during compilation or syntax validation in interpreter.
  • Internal error (IE) the error occurred on the service side. One of the possible reasons can be poorly designed test case judge or master judge.

Note

The Internal error covers wide area of errors (including server errors) thus in the near future we will introduce another type of error for judge and master judge errors.

To illustrate errors consider again the following example:

Example

For a positive integer n calculate the value of the sum of all positive integers that are not greater than n i.e. 1 + 2 + 3 + ... + n. For example when n = 5 then the correct answer is 15.

Input
In the first line there will be the number 1 ≤ t ≤ 10000000 which is the number of instances for your problem. In each of the next t lines there will be one number n for which you should calculate the described initial sum.

Output
For each n print the calculated initial sum. Separate answers with new line character.

The first error which can occur is the compilation error, for example submitting the following source code would produce the compilation error status:

long long initsum(long long n)
{
    return n*(n+1)/2;
}

int main()
{
    int t // missing semicolon
    long long n;
    scanf("%d", &t);
    while (t > 0)
    {
        scanf("%lld", &n);
        printf("%lld\n", initsum(n));
        t--;
    }
    return 0;
}

To obtain runtime error we can refer to unallocated memory:

long long initsum(long long n)
{
    return n*(n+1)/2;
}

int main()
{
    int t;
    long long n;
    scanf("%d", &t);
    while (t > 0)
    {
        scanf("%lld", n); // referring to unallocated memory 
        printf("%lld\n", initsum(n));
        t--;
    }
    return 0;
}

We will exceed time limit with worse algorithm (if test cases are rich enough):

// suboptimal algorithm
long long initsum(long long n)
{
    int i;
    long long sum = 0;
    for (i=1; i <= n; i++)
    {
        sum += i;
    }
    return sum;
}

int main()
{
    int t;
    long long n;
    scanf("%d", &t);
    while (t > 0)
    {
        scanf("%lld", &n);
        printf("%lld\n", initsum(n));
        t--;
    }
    return 0;
}

Bad output formatting causes wrong answer status:

long long initsum(long long n)
{
    return n*(n+1)/2;
}

int main()
{
    int t;
    long long n;
    scanf("%d", &t);
    while (t > 0)
    {
        scanf("%lld", &n);
        printf("%lld", initsum(n)); // missing new line character
        t--;
    }
    return 0;
}

At the end we present correct and optimal solution which passes all test cases and obtains accepted status:

long long initsum(long long n)
{
    return n*(n+1)/2;
}

int main()
{
    int t;
    long long n;
    scanf("%d", &t);
    while (t > 0)
    {
        scanf("%lld", &n);
        printf("%lld\n", initsum(n));
        t--;
    }
    return 0;
}