MATEX - Matrix Exponentiation

no tags 

Many problems can be solved with matrix exponentiation. This task can be 'easily' solved with a O(log(N)) algorithm. You'll have to work on the constant to get more points. We'll work with a 'small' matrix and constant modulo. This task should help you to try some speed improvements for problems like R Numbers, Grid Tiling, and the great Flibonakki...
You don't need to solve the whole input, just as many cases as you can. Not all, it could be impossible. You will get one point per case.
There will be one input file and the matrix will have a size W = 18, unlike in the sample.
For your information, my fastest 3kB-C code got 654321 points, with 31MB of memory usage. (Lot of stuff!)
(Timing edited 2017-02-11, after compiler changes)

Input

The first line contains an integer number W : the size of the squared matrix.
The W next lines contains W integers Mij : the coefficients of the matrix, line by line. In each the 838383 next lines there are three integers I, J, N.
You don't have to read the whole input, just as many as you can solve...

Output

For as many test cases you can, on a single line, print the coefficient on the Ith line, Jth column of the matrix M^N (M to the power N). As the answer could be a big number, output the answer modulo 1000000007.

Example

Input:
5
32 82 7 66 30
57 1 28 2 89
53 66 6 35 61
45 87 88 24 20
35 22 23 80 93
1 1 1
1 5 2
5 1 3
5 5 99
[...]

Output:
32
12795
2714937
764817976

Explanations

For the first case: M^1 =

32 82  7 66 30
57  1 28  2 89
53 66  6 35 61
45 87 88 24 20
35 22 23 80 93

For the second case: M^2=

10089  9570  9060  6505 12795
 6570  8655  2818 11912 11824
 9486  9195  6738  9560 14203
12843 12113  5851  8400 16801
10448 13416 10178 12519 14660

For the third case: M^3=

2089068 2282253 1259668 2181834 3027095
1802809 2029855 1625446 1781368 2477165
2112086 2375941 1532239 2245976 3026032
2377555 2551827 1589794 2622329 3550751
2714937 2953573 1948704 2545886 3742082

Constraints

W = 18
1 ≤ I,J ≤ W
1 ≤ Mij ≤ 10^9
1 ≤ N ≤ 10^18

Uniform-random input in the range.

Score

As in the example, if you can output the 4 first correct answers, your score will be 4 points. No need to solve all the input, the minimum is 1 ; every solver in 'any' language will be able to check his MATrix-EXponentiation-speed.

Edit(12/II/2017) Now the MasterJudge takes time into account if you reach the limit of input file.


hide comments
dhruvgheewala: 2020-03-28 11:47:12

Not getting any points, getting TLE, don't know why?

=(Francky)=> Try to solve only few lines, not all, you might get TLE (Time Limit Exceeded). You will get as many points as you can solve lines.

Last edit: 2020-03-28 12:25:39
Francky: 2019-12-29 10:42:34

@Curiosa : EB members (nor problem setter) can't rejudge a whole problem solutions, but admins can do that. If you think we need a new ranking for some problems, I would suggest not to rejudge some problems ; how to choose them ? Rejudge all ; way too long, and last experiment (pyramid -> cube) led to interpolations for many subs.
I would recommend to set a new problem, with new harder constraints ; a new challenge.
Many thanks for your appreciation.

Min_25: 2019-12-29 03:16:40

It seems that the judge machines have been switched from "Intel(R) Xeon(R) CPU E3-1220 v5 @ 3.00GHz" (Skylake) to "Intel(R) Xeon(R) CPU E3-1270 V2 @ 3.50GHz" (Ivy Bridge).

Curiosa: 2019-12-28 17:31:58

By any chance, can we rejudge solutions? Not suggesting that the top ones are not superior, it's just that the solution that I had in 2018 that was solving maximal case in 0.74, now was running at 0.92; There might have been some characteristics that changed on the judge machine (especially IO, maybe cache size?). Let me know what you think about this Francky; Also happy Christmas holidays! and thanks for this amazing problem.

Martin Radev: 2016-11-07 23:56:58

Pretty hard to get to the top. I reached 140000 pts. I guess I need a faster way to read and write io.
I tried using compiler hints to use 128 bit simd, but did not see any speed up. Might be good to write the intrinsics directly and try 256bit. Does any of the top solutions use something like that?

=(Francky)=> I will not reveal any clue about top solutions. You've yet found many ways to speed up your code ; congrats again.

Last edit: 2016-11-08 07:33:13
Martin Radev: 2016-11-01 10:54:15

I really wonder how one has managed to get >100000 points on this one.
I tried getting to sqrt(logN) per query with some precomputation and while it gives some points (~2000), it is still super far.
I tried whenever possible to use the inner product (?) of the matrices. I.e. instead of A.B and doing the multiplication standardly, I store B^T and do the inner product -> row by row. This should be cache friendly.
Tried unrolling the inner loop of the multiplication and this led to 3200 points.
I wonder whether any of the top solutions use SIMD and run on multiple threads. Actually, is this even possible on spoj?


=(Francky)=> You will find new angle to attack this problem... and gets more points. @spoj we can use only one single thread, there no cheat. You can use cache friendly methods, unroll some loops, ... But to get more points is here (as almost always) in the method. Good luck. You yet managed to go >50k points ; congrats for that.

Last edit: 2016-11-03 00:12:48

Added by:Francky
Date:2013-02-27
Time limit:1s
Source limit:50000B
Memory limit:1536MB
Cluster: Cube (Intel G860)
Languages:All except: ASM64
Resource:Own problem