Saturday 3 August 2024

Probability of Reaching a Home Prime

Looking at the 51 steps required for 20240802 to reach a home prime (the subject of my previous post titled A Long Way Home), I fell to wondering about the probability of a composite number forming a prime when its prime factors are concatenated in ascending order, or any order for that matter. The key point is that every prime factor other than 2 and 5 ends in 1, 3, 7 or 9, so a concatenation in ascending order, which finishes with the largest prime factor, must also end in one of these digits unless that largest factor is 2 or 5. This is the first fact that we must bear in mind.
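The concatenation process itself can be sketched in a few lines of plain Python. This is only a minimal illustration: the function names `prime_factors`, `concat_step` and `home_prime` are my own, the `max_steps` cut-off is an arbitrary assumption, and trial division is only practical for small starting values (it would not cope with a chain like that of 20240802).

```python
def prime_factors(n):
    """Prime factors of n in ascending order, with multiplicity (trial division)."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors


def concat_step(n):
    """Concatenate the prime factors of n in ascending order into one integer."""
    return int("".join(str(p) for p in prime_factors(n)))


def home_prime(n, max_steps=10):
    """Return (value, steps); value is None if no prime appears within max_steps."""
    steps = 0
    while len(prime_factors(n)) > 1:  # more than one factor means n is composite
        if steps == max_steps:
            return None, steps
        n = concat_step(n)
        steps += 1
    return n, steps


print(home_prime(10))  # chain: 10 -> 25 -> 55 -> 511 -> 773
```

Note that the first step for 10 gives 25, which ends in 5 because the largest prime factor of 10 is 5.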

Another fact is that the likelihood that a randomly chosen number less than \(n\) is prime is (approximately) inversely proportional to the number of digits in \(n\) (source). If we look at the growth in the length of the concatenated numbers that arise from 20240802, we see that, starting with eight digits, we end up with a prime of 103 digits after 51 steps. This is an increase of close to two digits per concatenation. As the concatenated numbers get longer, the likelihood of a prime arising falls in inverse proportion to the length of the number. Armed with these facts, we can set about estimating, for any starting number, the average length of the resulting home primes and the average number of steps required to reach them.
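For reference, the inverse proportionality follows from the prime number theorem. Writing \(\pi(n)\) for the number of primes up to \(n\) and \(d\) for the number of digits in \(n\) (both symbols are my own shorthand here), a rough statement is:

\[
\frac{\pi(n)}{n} \approx \frac{1}{\ln n} \approx \frac{1}{d \ln 10} \approx \frac{0.434}{d}
\]

So doubling the number of digits roughly halves the chance that a number of that size is prime.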

Let's start with eight-digit composite numbers formed from the dates of the year, such as 20,240,802, which arises from 2nd August 2024. The probability that a number of this size is prime is about 0.05944 (that is, 1/ln 20240802), but once we reach the home prime of 20240802 with its 103 digits this density has fallen to 0.00422. However, the numbers formed by the concatenation all end in 1, 3, 7 or 9, and this considerably boosts the probability of a prime being formed. The 60% of numbers ending in 0, 2, 4, 5, 6 or 8 simply cannot be formed from this type of concatenation. So instead of 0.05944 out of 1, we have 0.05944 out of 0.4, and instead of 0.00422 out of 1, we have 0.00422 out of 0.4 as well. This translates to about 0.1486 and 0.01055.
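These figures can be reproduced with a short Python sketch. It assumes the 1/ln(n) approximation for the prime density and takes 10^103 as representative of a 103-digit number; the names `prime_density` and `ALLOWED_FRACTION` are just illustrative.

```python
import math


def prime_density(n):
    """Approximate chance that a number near n is prime, using 1/ln(n)."""
    return 1 / math.log(n)


ALLOWED_FRACTION = 0.4  # share of integers ending in 1, 3, 7 or 9

start = prime_density(20240802)  # eight-digit starting value
end = prime_density(10**103)     # assumed stand-in for a 103-digit number

# Raw density, then density restricted to the allowed final digits
print(round(start, 5), round(start / ALLOWED_FRACTION, 4))
print(round(end, 5), round(end / ALLOWED_FRACTION, 4))
```

The adjusted values come out at roughly 15% and 1%, in line with the figures quoted above.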


Matt Parker giving a nod to the taxicab number
in a YouTube video on concatenation: link

So starting with eight-digit numbers we have about a 15% chance of forming a prime, but by the time we reach 103-digit numbers that chance has plummeted to 1%. So every two-digit increase in the length of a number decreases the probability of a prime forming by about 0.28 percentage points. Conversely, the chance of a prime NOT forming starts at 85% for eight-digit numbers and increases to 99% for 103-digit numbers. This is an increase of about 0.28 percentage points for every one of the approximately 50 steps in the concatenation process. So even though the chance of a prime NOT forming at each step is quite high and slowly increasing, the chance of a prime NOT forming from successive concatenations quickly becomes quite low. The progression is as follows (rounding to two decimal places):

  • 0.85 to start with
  • 0.72 for the first concatenation
  • 0.53 for two concatenations
  • 0.28 for three concatenations
  • 0.08 for four concatenations
  • 0.01 for five concatenations
  • 0.00 for six concatenations and beyond
The mean is 0.05 so a prime should form between four and five concatenations on average. Clearly though there's more going on than is at first apparent here given that there are 471 numbers less than 10000 for which there are no known home primes with 49 being the first. It would seem likely that there will be many composite numbers without home primes amongst the numbers generated from the daily dates. 
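One way to explore this numerically is to multiply the per-step chances of NOT forming a prime. The sketch below assumes the steps are independent and that the chance rises linearly from 0.85 by 0.0028 per step, as described above; the function name `survival_curve` and its parameters are my own. Multiplying the per-step figures directly in this way gives a slower decay than the rounded list above (about 0.62 rather than 0.53 after two concatenations), so that list may have been derived differently.

```python
def survival_curve(start=0.85, increment=0.0028, steps=51):
    """Chance that no prime has appeared yet, after each successive attempt.

    Assumes independent steps and a linear rise in the per-step chance
    of NOT forming a prime (both are simplifying assumptions).
    """
    curve = []
    surviving = 1.0
    for k in range(steps):
        surviving *= min(start + increment * k, 1.0)
        curve.append(surviving)
    return curve


curve = survival_curve()
for k, value in enumerate(curve[:7]):
    print(k, round(value, 2))
```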

Let's look at some examples:
  • 20240801 --> 2 steps
  • 20240802 --> 51 steps
  • 20240803 --> 3 steps
  • 20240804 --> unknown
  • 20240805 --> 6 steps
  • 20240806 --> 1 step
  • 20240807 --> prime
  • 20240808 --> 8 steps
  • 20240809 --> 13 steps
  • 20240810 --> 11 steps
  • 20240811 --> 2 steps
  • 20240812 --> 13 steps
  • 20240813 --> 8 steps
  • 20240814 --> 14 steps
  • 20240815 --> unknown
  • 20240816 --> 25 steps
  • 20240817 --> 2 steps
  • 20240818 --> 5 steps
  • 20240819 --> prime
  • 20240820 --> 6 steps
If we ignore the primes, let the two unknowns be nominally 100 and put these results in ascending order, we get 1, 2, 2, 2, 3, 5, 6, 6, 8, 8, 11, 13, 13, 14, 25, 51, 100, 100, which has a median of 8. If we ignore the 51, 100, 100 then we have 1, 2, 2, 2, 3, 5, 6, 6, 8, 8, 11, 13, 13, 14, 25, which has a median of 6. This is just a small sample and I'll need to research this topic further.
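The two medians can be checked with Python's standard `statistics` module. The list below simply transcribes the step counts in date order, leaving out the two primes and using the nominal value of 100 for the two unknowns; the variable names are illustrative.

```python
from statistics import median

# Step counts for 20240801 to 20240820 in date order: the two primes are
# omitted and the two unknowns are given the nominal value 100.
steps = [2, 51, 3, 100, 6, 1, 8, 13, 11, 2, 13, 8, 14, 100, 25, 2, 5, 6]

print(sorted(steps))
print(median(steps))

# Drop the three largest values (51, 100, 100) and recompute
trimmed = [s for s in steps if s < 51]
print(median(trimmed))
```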

ADDENDUM

Concatenation of numbers can be achieved in Python and SageMath in two ways, one not involving strings (Method 1) and the other using strings (Method 2). I use the second method because it's simpler to remember. The code for both is as follows (permalink):

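# Method 1: arithmetic only, no strings
# ** is used for powers so the code runs in both Python and SageMath
# (in plain Python, ^ is bitwise XOR rather than exponentiation)
base = 10
number1 = 17
number2 = 29
# Count the digits of number2 with integers only, avoiding
# floating-point log errors at exact powers of the base
digits = 1
while base**digits <= number2:
    digits += 1
# Shift number1 left by that many digits, then add number2
number = number1 * base**digits + number2
print(number)
# Method 2: string concatenation
number = int(str(number1) + str(number2))
print(number)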

1729
1729
