Generating and Analyzing
Prime Numbers
Preface
Feedback
If you wish to comment on this material in the form of
additions, detected errors or questions about the code,
send an email to primes1e12 "at" earthlink.net.
I am not a mathematician.
This project represents a programming challenge to me.
I'll be happy to discuss programming, but please direct
questions on deep mathematics to more qualified authorities.
Don Kostuch
October, 2018
Getting Started
I recently took an interest in prime numbers after
watching an episode of "Infinite Series" on YouTube. I
was (and am) fascinated by Goldbach's Conjecture:
All even numbers (greater than 2) are the sum of two
prime numbers.
No counter example has been found for values less than
4E18.
I view this not so much as an opportunity to uncover
new worlds of mathematics, as a challenge to produce
software that runs efficiently and correctly on a
typical home PC.
Requirements and Challenges
My initial work focused on finding a program structure
that ran with reasonable performance.
After a few tests I ran out of prime numbers for
testing. Although a Google search of "Prime Numbers"
produces 65 million hits, lists of more than a few
thousand primes is scarce.
I found a site that does provide primes up to one
trillion (1E12), consisting of over 3700 files.
This would have been
enough to keep me busy and satisfied for a while.
Unfortunately there were errors scattered among the
3700--files missing or flawed.
My focus switched to creating a collection of primes
suitable for my tests. I downloaded several prime
generators that proved accurate but slow. The
repair of the flawed files would take several months,
running on five computers.
I concentrated on producing 64 bit primes (the
native limit on my PC) and devising a way to efficiently
represent and store them. This would allow me to get
back to Goldbach and leave a useful package of software
for others to experiment with.
The initial sections
describe the fundamental elements of the programs and
reasons behind their design. If their purpose is not
clear at first, read on and carefully study the top
level programs and classes to see how they are used.
I use Visual Studio 12 C++ for these examples. A free
version is available from Microsoft.com. Nearly all the
code is "Standard Vanilla" C++, with heavy use of the
Standard Template Library.
To run these programs on machines other than the one running
the Visual Studio compiler, check the following
configuration setting:
Configuration Properties
C/C++
Code Generation
Runtime Library
Change "Multi-threaded DLL (/MD)"
To "Multi-threaded (/MT)".
This will considerably expand the size of
the executable, but it will not require any possibly
absent resources on the new target machine.
Some advice to simplify debugging:
Avoid complex,
deeply nested expressions. You may be unable to check
the value of intermediate expressions and may also
confuse the debugger.
Use exceptions to detect and report
Programming errors
Invariant violations
Normal "exceptional" conditions such as end of file.
Pack lots of information about the exceptional condition
into the exception object's string. It is only
created and used infrequently.
In a few places I use Windows or DOS interfaces, which
are encapsulated in functions or classes to hide the
details from the main client code. The description of
the code should make conversion to other systems
relatively easy.
Objectives
Performance and Simplicity
The following designs are meant to be "easy" to explain
and debug and provide reasonable performance and memory
usage for an interesting collection of primes (at
least up to a trillion). The code works up to the largest
64 bit prime.
I have not broken the "square root" limit, but it is at
beyond my current horizon.
A file containing a sequence of primes is named by non-prime values
preceding the first prime and following the last
prime. This avoids ambiguity in a sequence of files or other groups.
The last prime in a group will not be duplicated at the beginning
of the next group, since the specifiers are not prime.
Internal Prime Representation
For computational purposes I represent primes
as unsigned 64 bit
(unsigned long long)
integers, the largest native type on PCs. The
limit is 18,446,744,073,709,551,615 (18 Quintillion,
1.8E19). This is small potatoes compared to the REALLY
LARGE primes, but it is enough to keep me and my PC
occupied for the foreseeable future.
Primality Verification
I have compared the output of my prime generator with
numerous available prime lists and generators. I have
not detected any errors, but neither have I tried all 18
Quintillion possibilities.
File Representation
ASCI Text
Publicly available prime lists usually use ASCI
characters, with values separated by tab or newline
characters. They require no special software.
C++ handles them with the
class istream
and the extraction operator,
operator>>().
The existence and location of any given
prime candidate requires either a sequential scan
or a non-trivial binary search (the values are
not at a predicable location).
The list of primes up to a trillion (1E12) occupies
454 GB. The repetition of digit patterns allows
this to be compressed to 61 GB.
PrimeC -- Prime Compact
Using Compressed Bit Mapping
Since all primes beyond 5 end in 1, 3, 7 or 9, a sequence
of 20 potential primes can be mapped into one 8 bit byte. Each
bit represents a particular odd number and the bit
specifies if that number is prime.
The location of the byte and bit specifying any
given number is easily computed and facilitates
direct access to determine its primality. The representation
can be efficiently copied between a file and a corresponding
byte array of unsigned char.
This representation has no upper bound on the value of the
primes nor the size of a file.
Since every odd candidate is represented, the
size of the file is fixed for any interval of
the same size.
The last 1334 64 bit primes span a range of 51558--one prime
for 45 non-primes. In this range the average prime uses
under 2.5 bytes, compared to 20 characters (bytes), or 8 bytes as
a binary integer.
The approximate number of non-primes per prime is ln(x).
Decimal
Digits |
Non-primes
per prime | Bits
per Prime | PrimeC Bits |
10 | 22 | 32 | 9 |
20 | 44 | 64 | 18 |
40 | 88 | 128 | 36 |
80 | 184 | 256 | 72 |
As the size of primes grows, the length of the representation
grows at the same rate as the prime density. The primeC
representation is thus always less than the ordinary binary
representation.
The compressed decimal character representation is about 1/9 the
expanded size, and thus comparable in size to primeC, but cannot
be used directly.
A convenient file size is for a range 10 Billion (1E10)
numbers, which 500MB.
The internal details of
primec are presented in the sections on classes
PrimeCVector, PrimeCArray,
PrimeCFileWriter,
PrimeCFileReader, and
DirectoryPrimeC.
Non-Class Header Files
Prime.hpp
The file Prime.hpp
provides a common set of identifiers used in the
prime related classes and programs.
Besides reporting runtime errors, exceptions are used
to report normal ending conditions: end of file, end of
sequence, etc.