AFSK - Power Factor Sum Sum (hard)

no tags 

Here is a mixed edition of Divisor Summation Powered and Amazing Factor Sequence (medium).

The powered factor sequence

For k an integer number, we define our powered factor sequence with:

ak[0] = 0; ak[1] = 1, and

for n > 1, ak[n] = ak[n - 1] + sum({x^k | 0 < x ≤ n and n % x = 0}).

Input

First line of input contains an integer T, the number of test cases.

Each of the next T lines contains three integers n, k, m.

Output

For each test case, print ak[n] on a single line.
As the answer could be a big number, you just have to output it modulo m.

Example

Input:
3
3 1 10
4 2 55
5 3 97

Output:
8
37
43

Constraints

0 < T < 101

0 < n < 10^9

0 < k < 11

1 < m < 10^17

Numbers n, k, m are uniform-randomly chosen.
For your information, there's two input files, the first one is 'easy' with n≤100.
My (1kB)-python code get AC around 0.96s. I have a much slower basic PIKE AC (4.8s).
(Edit 2017-02-11 ; timings and TL updated after compiler changes)


hide comments
[Lakshman]: 2014-11-15 19:57:55

Finally Accepted !!!!!!!!! after such a long time.
--ans(Francky)--> Good job. ;-)

Last edit: 2014-11-15 20:29:45
Sandeep Pathry: 2013-06-21 08:46:18

@francky... got AC...
bt can you please see my code and tell will this algorithm work in C++?
and if possible... plz give me some hint to optimize my python code...
--ans--> For your C code, you have overflow issues, with your AC python one, you should be able to find them.
Thanx for hlp.. :)

Last edit: 2013-06-21 09:47:49
fR0DDY: 2013-06-19 17:30:12

Again need help in bringing it closer to at least 5 secs. It's 10 times slower than your best. :(
--ans(francky)--> You could inline some one-call functions, and put your whole code in a main function. range and xrange are distinct ! Again, you should google python opti, and try various experiments on any problem, it takes some time to find them, but it's the way to understand them very well, good luck.

Last edit: 2013-06-19 17:36:12
Thomas Dybdahl Ahle: 2013-05-07 13:09:58

I don't get it. Isn't this exactly equal to calculating G(0,n,k) in IITD4?
(Only difference is that the MOD here can vary)

Last edit: 2013-05-07 13:10:47
maaz: 2013-04-27 16:30:43

with the given constraints i must say a really good question..

[Lakshman]: 2013-04-26 09:11:08

Why WA...It should be TLE but here I am getting WA...
--ans--> It's really WA (and after correction it will be indeed TLE...) check your mod operations. eg. 4 can't be a correct answer for a mod 4 query.
edit---> Can you give me that case..I tried several test cases but all are giving correct answer for mod 4 operation.
--ans--> No need to go far to find some :
2
3 2 4
2 1 4
don't give 4 as answer, it's impossible!
edit-->Now I changed the Algorithm precomputating all n^k initially, But can't deal with n < 10^9
--ans--> It's a problem indeed, and you have to solve it. Good luck.

Last edit: 2013-04-27 10:54:11
Michael Kharitonov: 2013-03-21 10:06:43

It's brilliant! When I publish empty comment, it says "Comment too long!"
--ans--> Congratulations, I haven't finished my C translation, so I'll further tell my own C timing (lot of job...).
(edit) not finished, but bigNum computation seems slower in C than modular computation, ... sorry ; it was the opposite in Python... I'll give precise timings further.
--forw--> You use bignum arithmetic modulo 10**n or 2**n? Any comments of my solution?
--ans(francky)--> I had before a 10**n one (used on PONY6), but I'm rewriting it now in 2**n for this problem. 2**n should be the good choice, even for PONY6 (I'll try). I didn't have time to read your code, please be patient.
In any case, I can get 2.12s in Python3 with my code, it is a good 2× faster than the modular one.

Last edit: 2013-03-21 16:21:51
[Rampage] Blue.Mary: 2013-03-20 14:04:25

@Francky: Is my current C++ solution the expected one?
--ans--> It's a good solution, with a good base (quite the same for me), and as you ask in public, I'll share some possible improvements, that I didn't see in any AC for now.
1) When I said everybody felt into the 10^17 trap, I would say that according to my tests, compute the answer as a bigNum and output the answer with a single modulo operation is faster.
2) This way allows too some useful precomputations.
3) One can use the real Faulhaber polynomials, it avoids some multiplications. (<-- this trick gives very few speedup, and code is much more complex).

I hope those tips will suit to everybody, and most people will enjoy the problem and all the corners where to attack it.

Last edit: 2013-03-20 17:58:34
Ehor Nechiporenko: 2013-03-20 12:54:34

Okay, dokay, the first version was done, now I can optimize it as well. I need to find some good algo of fastest modular multiplication to make it as better as possible. Current version makes my code 10 times slower((
--ans--> Well done. FIB64 is made for that purpose...
Warning, I don't want to spoil too much, but it seems, at this time, everybody felt in the trap of 10^17 ;-) I'm a vilain.
--To Francky-->Okay, now I feel much more closer to optimal multiplication, but I understood, that it's still not the end.

Last edit: 2013-03-20 14:34:39
Francky: 2013-03-19 09:47:07

I have new timings in Python, it's easier to find new ideas (and it's not over...). Those ideas will help me to try a ultra fast C edition. Woohooo.
(edit : 2.22s with 2.22kB of python code.)

Last edit: 2013-03-18 23:45:55

Added by:Francky
Date:2013-03-17
Time limit:15s
Source limit:50000B
Memory limit:1536MB
Cluster: Cube (Intel G860)
Languages:All except: ASM64