Task Details

Submitted solutions will be unpacked and reproduced using ReproUnzip on an evaluation server (Azure Standard F32s_v2) with the following characteristics:

  • Processor: 32 vCPUs @ 2.7 GHz
  • Main Memory: 64 GB
  • Storage: 64 GB
  • Operating System: Ubuntu 20.04.5 LTS

In particular, before running ReproUnzip, the dummy-data.bin dataset (i.e., the original input you worked on) and dummy-queries.bin (i.e., the original queries you worked on) will be replaced with the secret evaluation dataset evaluate-data.bin and the secret evaluation queries evaluate-queries.bin.

Here is the detailed sequence of operations used for the evaluation process:

  • reprounzip docker setup <bundle> <solution>, to unpack the uploaded bundle
  • reprounzip docker upload <solution> evaluate-data.bin:dummy-data.bin, to replace the dummy input dataset (dummy-data.bin) with the evaluation data (evaluate-data.bin)
  • reprounzip docker upload <solution> evaluate-queries.bin:dummy-queries.bin, to replace the dummy input queries (dummy-queries.bin) with the evaluation queries (evaluate-queries.bin)
  • reprounzip docker run <solution>, to execute the solution
  • reprounzip docker download <solution> output.bin, to retrieve the results (i.e., your program must produce a file named "output.bin" storing its output)
  • evaluation of "output.bin"

Important Notes: Your solution will be evaluated on evaluate-data.bin and evaluate-queries.bin, but in order to be evaluated, your submission must meet the following requirements:

  • The program must be reproduced correctly (i.e., the process must end with the creation of the "output.bin" file without errors).
  • The program must finish within 20 minutes; otherwise it incurs a "timeout" error (i.e., the total time limit for generating results is 20 minutes).
  • Only submissions with an execution time of no more than 1200s will be considered when determining finalists.
  • The output.bin file must contain |Q| x 100 neighbors, and all the neighbors must be stored in uint32_t format.
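As a minimal sketch of the required output format, the snippet below writes |Q| x 100 neighbor ids as raw uint32 values. The query count and the neighbor ids here are dummy placeholders, not real data; the file layout (a flat sequence of |Q| * 100 uint32_t values, native byte order) is an assumption based on the description above.

```python
import os
import numpy as np

num_queries = 4  # hypothetical |Q|; the real value comes from evaluate-queries.bin
k = 100

# Dummy neighbor ids for illustration: row i holds the 100 nearest-neighbor
# ids for query i. A real solution would fill this from its index search.
neighbors = np.zeros((num_queries, k), dtype=np.uint32)

# Write all |Q| x 100 ids as raw uint32 values, row by row.
neighbors.tofile("output.bin")

# Sanity check: the file must be exactly |Q| * 100 * 4 bytes.
assert os.path.getsize("output.bin") == num_queries * k * 4
```

A quick size check like the final assertion is a cheap way to catch a malformed output file before submitting.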

Note: It is prohibited to use query vectors during the indexing phase. Any submission that uses query information to build the index will result in the team being banned. After the contest, we will also manually check the submissions of finalists.

Evaluation Metrics: We will compute the average recall score over the evaluation queries. The recall of a single query is computed as follows: $$Recall = { \text{number of returned neighbors among the true top-100 nearest neighbors} \over \text{100}} $$
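The metric above can be sketched in a few lines. This is an illustrative reimplementation, not the official evaluation code; the function names and argument layout are assumptions.

```python
def query_recall(returned, ground_truth, k=100):
    # Fraction of the k true nearest neighbors that appear among the
    # k returned neighbors for one query.
    return len(set(returned[:k]) & set(ground_truth[:k])) / k

def average_recall(all_returned, all_ground_truth, k=100):
    # Mean per-query recall over the whole query set.
    scores = [query_recall(r, g, k)
              for r, g in zip(all_returned, all_ground_truth)]
    return sum(scores) / len(scores)

# Toy example with k=4 instead of 100: 3 of the 4 true neighbors were found.
print(query_recall([1, 2, 3, 9], [1, 2, 3, 4], k=4))  # → 0.75
```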

Unfortunately, ReproUnzip sometimes prints useful information about errors on stdout, which is excluded from the submission log. If you are stuck on a technical error that prevents the successful reproduction of your submission, no useful information appears in the log, and even the ReproUnzip commands above do not reveal the cause, you can send us an email and we will check the content of stdout for any useful information about the error.