## DUKKAR2 - Huge Pascal triangle

no tags

Given P a prime number, and N an integer, Dukkar found a really fast way to compute how many numbers are divisible by P on the Nth row of the Pascal triangle. Now the task will be much harder : it's for all the N first rows.
Moreover N will be a giant number, given in base P for convenience.

### Input

The first line of input contains an integer T, the number of test cases. Follow 2×T lines.
For each test case, on the first line your are given two integers P and k.
On the second line you are given k integers : the digits of N in base P.

N = a0×P0 + ... + ai×Pi + ... + ak-1×Pk-1

### Output

For each test case, you have to print the number of numbers in all the first N rows of the Pascal triangle that are divisible by P. As the answer could not fit in a 64bit container, give your answer modulo 1000000007.

### Example

```Input:
3
5 2
0 1
5 2
1 1
7 3
2 0 2
```
```Output:
0
4
2689
```

### Explanations

For the first case, N = 0×50 + 1×51 = 5. No numbers are divisible by 5 in the first 5 rows.
For the second case, N = 1×50 + 1×51 = 6. Only 4 numbers are divisible by 5 in the first 6 rows.

```1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
```

For the third case, N = 2×70 + 0×71 + 2×72 = 100.

### Constraints

```0 < T < 300
0 < P < 10^9, a prime number
0 < k < 1000
0 <= ai < P,  ak-1>0
```

For your information, my 300B-python3 code get AC in 3.03s with 11MB of memory print.
My C-code get AC in 0.08s with 1.6MB of memory print.
Have fun ;-)

Edit(25/I/2015) With Cube cluster my C-time is 0.01s and my PY3.4-time is 0.26s.

Edit(11/II/2017) With compiler changes my C-time is 0.00s and my PY3.4-time is 0.12s. luc4sdreyer: 2015-09-25 21:08:07 I got it! I thought the input was most significant digit to least significant, where it's actually the other way around. The funny thing is the sample input gives the same output in both ordering styles! luc4sdreyer: 2015-09-24 23:06:19 My solution is very fast and correct for all small inputs (verified by my Dukkar1 solution), and still I get W/A even with Java BigInteger to prevent overflow. Are there any weird edge cases or things to keep in mind for large inputs? I made many C++ and Java submissions, all incorrect. My C++ code's signature is a lot like the author's: 0.09s with 3.4M memory. =(Francky)=> It's not a corner case problem, you have WA on almost all input numbers ; good luck for debug. Last edit: 2015-09-25 08:58:51