SPAMD - Spam Detection

no tags 

It is well-known that the number of occurrences of the term "free" can distinguish spam and non-spam emails.

Your task is to build a spam detection module, based on the number of term "free" in an email. The core of this detection module is a spam classifier, which is represented by two variables: Low and High.

An email that contains X "free" words is classified by this module as a spam if Low ≤ X ≤ High, otherwise it is not. To measure the goodness of a classifier, we introduce several information-retrieval terminologies:

  • Actual
  • Spam Non-Spam
  • Predicted Spam TP FP
  • Non-Spam FN TN

TP (true positive) is the number of emails which are truly predicted as spam; FN (false negative) is the number of emails which are wrongly predicted as non-spam, and so on.

The portion of emails that are correctly identified as spam is denoted as precision (P), which is formulated as P = TP / (TP + FP).

The portion of spam emails that are successfully identified is denoted as recall (R), which is formulated as R = TP / (TP + FN).

To balance between precision and recall, we use the F-measure which is formulated as F = 2 x P x R / (P + R).

For example, when TP = 5, FP = 3, FN = 2, TN = 4, we have R = 5/7, P = 5/8, and F = 2/3. When there is no spam, we can report all emails as non-spam with F = 1.0 (perfect classifier). Our data mining team has manually analyzed several emails and labeled them as spam or non-spam.

Your task is to find the values of Low and High that yield the best classifier, i.e., the one that maximizes the F-measure.

Input

The input consists of several test cases, where each case contains two lines:

N : The maximum number of term "free" in any emails (1 ≤ N ≤ 2x106)

a0 A B M : parameters of random number generator. (2 ≤ M ≤ 10; 0 ≤ a0, A, B < M)

This random number generator generates a sequence of number:

    ai = (A * ai-1 + B) MOD M, for i >= 1

Specifying:
posi = a2i (0 ≤ i ≤ N) : the number of spam emails with i number of term "free".
negi = a2i+1 (0 ≤ i ≤ N) : the number of non-spam emails with i number of term "free".

The input is terminated by EOF.

Output

For each simulation print the F-measure of the best classifier (accurate to 6 decimal places).

Sample

Input:
3
1 1 1 3
5
2 3 4 5

Output
0.666667
0.923077

Explanation for the 1st case

This random number generator generates a sequence of 1, 2, 0, 1, 2, ... The number of spam emails is: posi = {1, 0, 2, 1}, and the number of non-spam emails is negi = {2, 1, 0, 2}.

The optimal classifier treats emails with number of term "free" between 2 and 3 as spam, with R = 3/4 and P = 3/5, resulting F = 2/3. Another way of producing optimal classifier is to consider emails with number of term "free" equals to 2 as spam.


hide comments
[Trichromatic] XilinX: 2010-10-15 16:31:06

When you see a bad HTML file and see the data range is 106 or 105, an experienced ACMer will realize that the data range is actually 10^6 or 10^5.

:D: 2010-08-12 13:10:31

Here you can find a properly formated text:
http://www.suhendry.net/contest/icpc09jak/icpc09jak-probs-publish.pdf

One very importain issue is that the limit for N is 2*10^6!

[Trichromatic] XilinX: 2010-03-26 20:10:35

Please provide the correct HTML version of this problem!


Added by:VOJ problem setters
Date:2009-10-28
Time limit:3s
Source limit:50000B
Memory limit:1536MB
Cluster: Cube (Intel G860)
Languages:All except: ASM64 C++ 4.3.2 NODEJS OBJC PERL6 SQLITE VB.NET
Resource:Problem Setter: ardiankp
ACM ICPC - Jakarta 2009