Pell Equations with Algorithms

While doing my math/CS undergraduate degree at the University of Alberta, I presented at the Seminar in Number Theory for Alberta Students on solving Pell equations using continued fractions. I also presented an O(n) method for finding the fundamental solution to the Pell equation \(x^2-dy^2=1\) during my talk. In this blog post, I’m going to present the essential parts of my talk on how this O(n) method works, show the results of it for up to \(d=99\), and also analyze the negative Pell equation \(x^2-dy^2=-1\).

As a note, the first several definitions are all rewritten from David Burton’s Elementary Number Theory, which greatly inspired this post. Burton’s text is one of the most well-written and comprehensive number theory books I know, and I strongly recommend it.

Definition: A finite continued fraction is of the form $$a_0+\frac{1}{a_1+\frac{1}{a_2+\frac{1}{a_3+\frac{1}{\ddots+\frac{1}{a_{n-1}+\frac{1}{a_n}}}}}},$$ where the \(a_i\)’s are positive real numbers, except possibly \(a_0\) which may be \(0\). The fraction is called simple if all the \(a_i\) are integers. We often denote such a fraction as \([a_0; a_1, \cdots, a_n]\)

For instance, $$[1; 2, 3, 4]=1+\frac{1}{2+\frac{1}{3+\frac{1}{4}}}=1+\frac{1}{2+\frac{1}{\frac{13}{4}}}=1+\frac{1}{2+\frac{4}{13}}=\cdots=\frac{43}{30}.$$

Definition: Given a continued fraction \([a_0; a_1, \cdots, a_n]\), we define the convergent \(C_k\) that is obtained by cutting off the fraction after \(a_k\) to be \(C_k=[a_0; a_1, \cdots, a_k], 1\le k\le n.\) By definition, \(C_0=a_0\).


Example: We can see that the continued fraction for \(\phi=\frac{1+\sqrt{5}}{2}\) is given by \(\phi=[1; 1, 1, \cdots]\). To see this, observe that \(\phi\) satisfies the equation \(\phi=1+\frac{1}{\phi}=1+\frac{1}{1+\frac{1}{\phi}}=\cdots\).


If we compute the \( k^{\mbox{th}}\) convergent of the continued fraction for \( \phi\), we get the Fibonacci numbers! Observe that \( C_0=\frac{1}{1}, C_1=\frac{2}{1}, C_2=1+\frac{1}{1+\frac{1}{1}}=\frac{3}{2}, C_3=1+\frac{2}{3}=\frac53\), and so forth. In general, we claim that \(C_k=\frac{F_{k+2}}{F_{k+1}}\), where \(F_1=F_2=1\) and \(F_n=F_{n-1}+F_{n-2}\) for \(n\ge 2\). To prove this, the base cases are given above. If it holds for \(k\ge 1\), then $$C_{k+1}=1+\frac{1}{C_k}=1+\frac{F_{k+1}}{F_{k+2}}=\frac{F_{k+2}+F_{k+1}}{F_{k+2}}=\frac{F_{k+3}}{F_{k+2}},$$ hence the pattern continues by induction.

This is related to the result \(\phi=\lim_{n\to \infty}\frac{F_{n+1}}{F_n}\).

Theorem: Given a continued fraction \([a_0; a_1, \cdots, a_n]\), we define the recurrence relationships given by \(p_0=a_0, q_0=1, p_1=a_1a_0+1, q_1=a_1\), and in general for \(2\le k\le n, p_k=a_kp_{k-1}+p_{k-2}, q_k=a_kq_{k-1}+q_{k-2}.\) Then the \(latex k^{\mbox{th}}\) convergent is given by \(C_k=\frac{p_k}{q_k}\).


Example: Consider \(\phi=[1; 1, 1, \cdots]\). Since \(a_i=1\) for every \(i\), we have \(p_0=1, q_0=1, p_1=2, q_1=1\), and in general \(p_k=p_{k-1}+p_{k-2}\) and \(q_k=q_{k-1}+q_{k-2}\). Thus \(p_k=F_{k+2}\) and \(q_k=F_{k+1}\) and we see \(C_k=\frac{p_k}{q_k}=\frac{F_{k+2}}{F_{k+1}}\) as before.


Proof: We see that \(C_0=a_0=\frac{p_0}{q_0}, C_1=a_0+\frac{1}{a_1}=\frac{a_1a_0+1}{a_1}=\frac{p_1}{q_1}\), and \(C_2=a_0+\frac{1}{a_1+\frac{1}{a_2}}=a_0+\frac{a_2}{a_1a_2+1}=\frac{a_0a_1a_2+a_0+a_2}{a_1a_2+1}=\frac{a_2(a_0a_1+1)+a_0}{a_1a_2+1}=\frac{p_2}{q_2}.\)
In general, assume the statement holds for every value \(2\le k\le m\), and we show it holds for \(k=m+1\). Observe that since \( p_{m-1}, q_{m-1}, p_{m-2}, q_{m-2}\) are independent of \(a_m\), we can write

Therefore \(p_{m+1}=a_{m+1}p_m+p_{m-1}\) and \(q_{m+1}=a_{m+1}q_m+q_{m-1}\) as desired.

Definition: In an infinite convergent sequence, if we have a block that repeats over and over, we then write \([a_0; a_1, \cdots, a_m, b_1, \cdots, b_n, b_1, \cdots, b_n, \cdots]=[a_0; a_1, \cdots, a_m, \overline{b_1, b_2, \cdots, b_n}]\).

Example: Convert \(\sqrt{3}\) to a continued fraction.

Solution: Let \sqrt{3}=1+\dfrac{1}{a_1+\frac{1}{a_2+\frac{1}{a_3+\cdots}}}. Subtracting \(1\) from both sides gives us \sqrt{3}-1=\dfrac{1}{a_1+\frac{1}{a_2+\frac{1}{a_3+\cdots}}}.

Taking the reciprocal of the equation gives us \dfrac{1}{\sqrt{3}-1}=\dfrac{\sqrt{3}+1}{2}=a_1+\frac{1}{a_2+\frac{1}{a_3+\cdots}}. We then see that a_1=1 since \sqrt{3}\approx 1.73. Now, substituting this into the equation and subtracting gives:

\dfrac{\sqrt{3}-1}{2}=\frac{1}{a_2+\frac{1}{a_3+\cdots}}\implies \dfrac{2}{\sqrt{3}-1}=\sqrt{3}+1=a_2+\dfrac{1}{a_3+\cdots}. Thus a_2=2 and we obtain:

\sqrt{3}-1=\dfrac{1}{a_3+\cdots}. But notice from this point on that it repeats from the original equation! Therefore, a_3=1, a_4=2, and in general \sqrt{3}=[1;\overline{1,2}].

Observe that the first convergent of \sqrt{3} is C_1 = \frac{p_1}{q_1}=\frac{2}{1}. Furthermore, (p_1, q_1)=(2,1) satisfy the equation x^2-3y^2=1. However, notice that the period length of \sqrt{3} is even and the negative Pell equation x^2-3y^2=-1 has no solution since -1 is not a quadratic residue mod 3. These results generalize as follows:

Theorem: If the period length, n, of the simple continued fraction for \sqrt{d} is even, then the fundamental (that is, the smallest) solution to the Pell equation x^2-dy^2=1 is given by the convergent (x,y)=(p_{n-1}, q_{n-1}). Furthermore, then the negative Pell equation x^2-dy^2=-1 has no solution.

Exercise: Compute the continued fraction for \sqrt{18}. Use this to compute the fundamental solution to the Pell equation x^2-18y^2=1 and to show that the negative Pell equation x^2-18y^2=-1 has no solutions.

Now, we’ll examine the case where the period length of the continued fraction is odd. In particular, continue the continued fraction for \sqrt{5}. We see that:

\sqrt{5}=2+\frac{1}{a_1+\frac{1}{a_2+\cdots}}\implies \sqrt{5}-2=\frac{1}{a_1+\frac{1}{a_2+\cdots}}\implies \frac{1}{\sqrt{5}-2}=a_1+\frac{1}{a_2+\cdots}.

Simplifying the left-hand side by conjugating the denominator, we get \sqrt{5}+2. Therefore, a_1=4 and we’re left with \sqrt{5}-2=\frac{1}{a_2+\cdots}. Notice this repeats, hence a_2=a_3=\cdots=4 and we have \sqrt{5}=[2;\overline{4}].

The zeroth convergent of this continued fraction is C_0=\frac{p_1}{q_1}=2\implies p_0=2, q_0=1. The first convergent of this continued fraction is C_1=2+\frac{1}{4}=\frac{9}{4}=\frac{p_1}{q_1}\implies p_1=9, q_1=4. Observe that (p_0, q_0)=(2,1) satisfy the negative Pell equation x^2-5y^2=-1 and that (p_1, q_1)=(9,4) satisfy the Pell equation x^2-5y^2=1.

This result also generalizes as follows:

Theorem: If the period length, n, of the simple continued fraction for \sqrt{d} is odd, then the fundamental solution to the Pell equation x^2-dy^2=1 is given by the convergent (x,y)=(p_{2n-1}, q_{2n-1}). Furthermore, then the negative Pell equation x^2-dy^2=-1 has a fundamental solution given by (x,y)=(p_{n-1}, q_{n-1}).

We can now write a computer program that solves Pell equations by combining these two theorems. Assume that we have a dictionary of continued fractions for each integer \sqrt{d} held in the dictionary continued_fracs. Then the code is:

for d in continued_fracs.keys():
  a = continued_fracs[d]
  n = len(a)-1
  p = [a[0], a[0]*a[1]+1]
  q = [1, a[1]]
  if(n%2 == 0):
    for k in range(2, n):
      p.append(a[k]*p[k-1]+p[k-2])
      q.append(a[k]*q[k-1]+q[k-2])
    print("Fundamental solution to x^2-dy^2=1 for d=%d is x=%d, %d" %(d, p[n-1], q[n-1]))
    print("No solution to negative Pell equation x^2-dy^2=-1 for d=%d" %(d))
else:
  end_a = a[1:]
  a += end_a
  for k in range(2, 2*n):
    p.append(a[k]*p[k-1]+p[k-2])
    q.append(a[k]*q[k-1]+q[k-2])
  print("Fundamental solution to x^2-dy^2=1 for d=%d is x=%d, y=%d:"%(d, p[2*n-1], q[2*n-1]))
  print("Fundamental solution to x^2-dy^2=-1 for d=%d is x=%d, y=%d"%(d, p[n-1], q[n-1]))

In lines 2 through 5, we compute the length of the continued fraction’s period length and the initial values of the recursive sequences p_0, p_1, q_0, q_1. In lines 6 through 11, if the period length is even, we compute each of the values of p_k, q_k according to the recursive definition and then print out the fundamental solution to the Pell equation.

In lines 13 through 19, if the period length is odd, we have to compute up to p_{2n-1}. Hence, we replicate the repeating block of the continued fraction in lines 13 and 14, before computing the values recursively as before in lines 16 and 17. We then print the fundamental solution to the Pell equation in line 18 and the negative Pell equation in line 19.

The full Python file can be found on GitHub, and the output of the code can also be found here. The output for up to d=10 is also printed below in the table:

Value of d Solution to Pell EquationSolution to Negative Pell Equation
2(3,2)(1,1)
3 (2,1)None
5 (9,4) (2,1)
6(5,2) None
7 (8,3) None
8 (3,1)None
10 (19,6)(3,1)

Large Language Model Exploration:
If you’ve made it this far, you may wonder how large language models would do in solving the negative Pell equation x^2-3y^2=-1. I experimented with this question a bit on https://lmarena.ai/ and obtained the following two answers:

The one on the left is a complete fabrication, while Grok-2 impressively solves it correctly. If you experiment with other large language models, I’d be curious to see which ones can and can’t solve questions on negative Pell equations correctly.

References:

  1. David Burton, “Elementary Number Theory”, 7th Edition, 2012.
  2. W. Sierpinski, “Elementary Theory of Numbers”, 1988

A Pythagorean Triple for 2025

Happy 2025 to everyone! As one exciting update, I plan to work more on AI for math education this year, including more work in number theory. I hope to publish some updates to my number theory book with more completed chapters if all goes well.

The following puzzle was posed to me in an email by Professor Paul Buckingham. Paul has been a big inspiration to me for a while for his mentoring of me in Algebraic Number Theory, so I knew it’d be a good problem. It led to some twists and turns in number theory and computer science, so I’m sharing it below with his permission.

Problem: Show that there are pairwise coprime positive integers x,y, and z such that: x^2+y^2=z^2 and \dfrac{x}{x+y+z}=\dfrac{1}{2}-\dfrac{1}{2025}.

Solution: We must consider two cases based on whether x is odd or even. In the first case, we know that the primitive Pythagorean triple can be generated by (x,y,z) = (m^2-n^2, 2mn, m^2+n^2) where m and n are pairwise relatively prime and have opposite parity. If you’re unfamiliar with this formula, please see my notes here.

Substituting these values into the equation gives us:

 \frac{x}{x+y+z} = \frac{m^2-n^2}{(m^2-n^2)+2mn+(m^2+n^2)}=\frac{(m-n)(m+n)}{2m^2+2mn}=\frac{(m-n)(m+n)}{2m(m+n)}=\frac{m-n}{2m}=\frac{1}{2}-\frac{n}{2m}

Therefore, we have that \dfrac{n}{2m}=\dfrac{1}{2025}. The only coprime positive integers that satisfy this are n=2 and m=2025. These generate the solution triplet (x,y,z)=(4100621, 8100, 4100629).

In the second case when x is even, we arrive at the equation:

\dfrac{x}{x+y+z} = \dfrac{2mn}{2m^2+2mn} = \dfrac{n}{m+n} = \dfrac{1}{2}-\dfrac{1}{2025}=\dfrac{2023}{4050}.

Therefore, we have that n=2023 and m=2027. However, this gives a non-primitive solution since n and m are the same parity (it actually gives double the solution found above).

Hence, the only primitive Pythagorean triple that satisfies this equation is (x,y,z)=\boxed{(4100621, 8100, 4100629)}.

I then shared this puzzle with the professor who got me into computer science, Dr. Richard Kelley. He was recently learning the Lean Proof Assistant and decided it’d be a fun exercise to code up the solution in Lean. Below is a screenshot of the code he wrote, showing how the proof assistant checks the math to ensure the solution is valid.

-- Code written by Richard Kelley
import Mathlib.NumberTheory.PythagoreanTriples
def x : ℤ := 4100621
def y : ℤ := 8100
def z : ℤ := 4100629

def FractionalEq (x y z : ℚ) : Prop :=
( x / (x + y + z) ) = ( (1/2) - (1/2025) )

-- The coercion from ℤ to ℚ is necessary to be able to use the ring_nf tactic.
example : FractionalEq (x : ℚ) (y : ℚ) (z : ℚ) := by
rw[x,y,z]
rw[FractionalEq]
ring_nf

example : (PythagoreanTriple x y z) ∧ (FractionalEq (x:ℚ) (y:ℚ) (z:ℚ)) := by
rw[x,y,z]
constructor
case left =>
rw[PythagoreanTriple]
simp
case right =>
rw[FractionalEq]
ring_nf

Extension of the problem:

Can a computer search to find the primitive Pythagorean triple we found to this? My suspicion is no, unless we’re able to put tight bounds on x,y,z. Since my master’s thesis used a lot of logic programming, I attempted to program it in Clingo (a language for answer set programming). Unfortunately, my code failed to return a solution in a reasonable amount of time. If anyone has suggestions for how to get a computer search to find this solution, I’d love to hear them in the comments below!


pythagoreanTriple(X,Y,Z) :- X=1..n, Y=1..X, Z=X..n, (X**2)+(Y**2)=(Z**2).
divisionCondition(X,Y,Z) :- X=1..n, Y=1..X, Z=X..n, X/(X+Y+Z) = 1/2 - 1/2025.
satisfiesTriple(X,Y,Z) :- X=1..n, Y=1..X, Z=X..n, pythagoreanTriple(X,Y,Z), divisionCondition(X,Y,Z).
#show satisfiesTriple/3.

Happy year 2025 = 45^2 = (20+25)^2 = \displaystyle \sum_{n=1}^{9}n^3 to you all.

Social Distancing at the Bar Riddle

I came across the following puzzle yesterday (edited for relevancy):

There is a bar with 25 seats in a line. People are social-distancing so when they walk into the bar, they always try to find a seat farthest away from others. To comply with government restrictions, no two people can sit next to each other. If a person walks in and there are no available seats, they will leave. Where should the owner tell his first customer to sit in order to maximize the number of people in the bar?

In order to solve this problem, we can utilize some wishful thinking. The best-case scenario is when every person is seated exactly two away from everyone else:

1, 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25.

How can we produce such a sequence? Well, if we had every fourth person, then the other people can sit in-between them (while still maintaining a distance of 2):

1, 5, 9, 13, 17, 21, 25.

To obtain this configuration, we take every eighth person, so the remaining people can sit a distance of four away from everyone else:

1, 9, 17, 25.

Finally, to obtain this configuration, we either need to put the first person on seat 9 or 17. We used backtracking in order to determine the best starting seat.

For a visual solution, I created a MatLab program that would test out each starting location of the first customer and then generates how the other customers would fill out the bar.  For the original puzzle, this creates the following graph:

maxconfig_25_2

The y-axis is the location of the first customer seated, while the x-axis shows the seating configuration according to the bars’ rules. The starred values indicate the optimal configuration, which occurs exactly when the first person is seated at 9. The relative size of each Polkadot is the order the customer is seated in the configuration, which is why the line y=x has the largest dots.

In this case, we were able to seat \lceil \frac{25}{2}\rceil=13 people effectively. Can we always obtain this bound for a bar with n seats?

Unfortunately, the answer is no, but this isn’t obvious to see.  Plotting the number of seats against the maximum number of people seated gives the following graph:

MaxSeated

The stars indicate when the maximum number of people seated equals \lceil \frac{n}{2}\rceil. It’s interesting to note the outliers here are when n=15, 23, 27, 28, 29, 30, 31, 39.

Let’s take one of these specific values and see what goes wrong in our backtracking approach. For n=15, the best possible scenario would be:

1, 3, 5, 7, 9, 11, 13, 15.

In order to obtain this, we take every fourth person as we did before:

1, 5, 9, 13.

To obtain this, we suspect we should seat the first customer at 9. However, plotting this shows this generates a different graph:

maxconfig_15_2

The sequence starting at 9 is 1, 3, 5, 7, 9, 12, 15. Why did we skip over 13? This is because our backtracking approach didn’t account for the selection mechanism.

Starting with 9, the second person selected will sit at 1. After this, the next person will sit at 15, then 5 to create the current configuration: 1, 5, 9, 15. The next person entering the bar will want to sit as far away from the other people as possible. Scanning all the seats, they’ll choose seat 12 since it’s 3 away from both 9 and 15. Therefore, 15 is a suboptimal starting value.

We noticed every value before this was optimal, which leads us to wonder if we can characterize all of the values of n that give an optimal configuration.

Expanding the search up to n=100 gives the following values that are optimal:

n=1 2 3 4 5 6 7 8 9 10 11 12 13 14 16 17 18 19 20 21 22 24 25 26 32 33 34 35 36 37 38 40 41 42 48 49 50 64 65 66 67 68 69 70 72 73 74 80 81 82 96 97 98.

I found that every number in the list above is of one of the forms

2^a, 2^a+2^b, 2^a+2^b+1, 2^a+2^b+2,

where a and b are non-negative integers. This leads me to conjecture that these forms contain all the initial values of n that produce an optimal configuration, however, this may be a case of The Strong Law of Small Numbers at work.

If how closely everyone is seated together makes you a bit uneasy in the graphs, we can expand the gap between people. For a fixed number of people in the bar, n=25, we vary the gap size, g to produce the following seating configurations:

The starred values indicate the best seat configuration. Compare these with the graph of gap size 2 earlier. Notice the best first seat varies between 1 and 9.

We can now plot the number of people seated compared to gap size:

gapsize25

A value is starred if the maximum number of people seated equals \lceil \frac{n}{g} \rceil. In this case, not only is length 25 optimal for a gap size of 2, but it’s optimal for every gap!

We wonder when this pattern will remain. Of the values of n above that were found to be optimal for a gap size of 2, the following are optimal for every gap size:

n=1 2 3 4 5 6 7 8 9 10 11 12 13 14 16 17 18 19 20 21 24 25 26 32 33 48 49.

An example of a value of n which gives suboptimal gap sizes is n=68:

gapsize68

The gap sizes of 3, 6, 7, and 9 are all suboptimal. This is the same for n=69 and 70. Similarly, for n=80, 81, and 82, we have suboptimal gap sizes of 3, 6, 7, and 11.

Since the gap size of 3 seems to produce suboptimal behaviour for several values, we plot the number of seats compared to the maximum number of seats for g=3:

gapsize3half

The starred values indicate when the max number of people seated equals \lceil \frac{n}{3}\rceil. We see the initial part is a step function in threes, similar to g=2.

gapsize3secondhalf

There is also a long valley around n=70 and n=80, as predicted above.

The full list of values of n that are optimal for g=3 are given below:

n=1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 24 25 26 27 28 29 30 31 32 33 36 37 38 39 48 49 50 51 52 53 54 55 56 57 60 61 62 63 72 73 74 75 96 97 98 99 100.

A few closing thoughts to the reader:

  • Can we characterize the values of n above similar to what we did for g=2?
  • Are any of these sequences on the Online Encyclopedia of Integer Sequences?
  • Has a full mathematical analysis been conducted? I’ve seen the XKCD Urinal Protocol Vulnerability, but this assumes the first person sits on the end.
  • Stay safe, and continue to practice social distancing everywhere you go!