## AFS3 - Amazing Factor Sequence (hard)

Let $s_1(n)$ be the sum of positive proper divisors of $n$.

For example, $s_1(1) = 0$, $s_1(2) = 1$ and $s_1(6) = 6$.

Let $$S(n) = \sum _{i=1}^n s_1(i).$$

Given $N$, find $S(N)$.

### Input

First line contains $T$ ($1 \le T \le 10^5$), the number of test cases.

Each of the next $T$ lines contains a single integer $N$. ($1 \le N < 2^{63}$)

### Output

For each number $N$, output a single line containing $S(N)$.

### Example

#### Input

6
1
2
3
10
100
1000000000000000000


#### Output

0
1
2
32
3249
322467033424113218863487627735401433


### Information

There are 6 Input files.

- Input #1: $1 \le N \le 10^5$, TL = 2s.

- Input #2: $1 \le T \le 60,\ 1 \le N \le 10^{15}$, TL = 10s.

- Input #3: $1 \le T \le 25,\ 1 \le N \le 10^{16}$, TL = 10s.

- Input #4: $1 \le T \le 10,\ 1 \le N \le 10^{17}$, TL = 10s.

- Input #5: $1 \le T \le 5,\ 1 \le N \le 10^{18}$, TL = 10s.

- Input #6: $1 \le T \le 2,\ 1 \le N < 2^{63}$, TL = 10s.

My C++ solution runs in about 0.85 seconds for each Input #2 - #6.

### Note

• Probably, $O(\sqrt{n})$ solutions will not pass.
• Intended solutions have a running time of about $O(n^{1/3} \log n)$.
• Time limits are somewhat strict.
• The answer can be $\ge 2^{64}$.
• DIVCNT1 is a little easier than this.

 Min_25: 2019-03-21 13:40:31 @liouzhou_101 Thank you for solving ! In fact, the function gcdext is not needed to compute the summation. (It can be computed in O(1) time with previous values.) Probably, the problem https://loj.ac/problem/6440 gives a hint. liouzhou_101: 2019-03-20 06:56:30 @Min_25 Thanks for such a wonderful problem! Learned a lot! I tried my every effort to speed up my code, but it seems still very far away from you. Could you please give some hints on speed, say, to deal carefully with constants or any new aspects/observations? @dacin21 I would say that time limits are "really" strict. I've got two TLEs due to my stupid (suboptimal) implementation with the same time complexity as my AC code, which destroys my plan to optimize after AC with suboptimal solution. dacin21: 2017-12-12 10:02:16 @Min_25 Thanks for posting such a beautiful problem, it was a pleasure to solve it. I wouldn't say that 'Time limits are somewhat strict.'. My code with suboptimal computation of sum3 was well within the time limit and I didn't optimize the constants. So thanks for setting the limit a little high, it gives gives the option to solve the problem with a suboptimal implementation and then come back to figure out the optimisations. Min_25: 2017-12-12 08:32:39 @dacin21 Thanks for solving the problem ! An improvement is that add3 (line 160) can be computed in O(1) time by merging the results (floor sums) of its parent nodes. So, floor_sum_2 and floor_sum_1 are not needed.