RAND(Squawk): Technical problem to solve - Function over range using sum of digits.

This apparently simple problem has been fascinating me for a long while now. The question is

For all the numbers n between 0 and 1E18 that's 1*10^18 or 1 with 17 zeroes how many have the property :

Sd(n) = Sd(137 * n)

where the function Sd( ) or sd ( ) is the arithmetic sum of the digits of the number in brackets.

For examples

Sd(81) = 8+1 = 9 and Sd(81*137) = Sd(11097) = 18 Function is Not true
Sd(369)= 3+6+9 = 18 and Sd(369*137) = Sd(50553) = 18 Function is True = *Bingo*

Going directly to the answer by counting from 0 to 1E18 ( 1 with 18 "0"s after) and working out which of the numbers that have the property is possible but not really feasible. Using 1000 processors each working in parallel on a separate range with 10^9 numbers would take an hour or so once set up. GPUs using CUDA would provide such compute resources at an affordable price but do require specialist programming. However brute forcing the answer is not usually the best way. A hint can be taken from the question that there is an alternative method to solve this problem. The question asks “how many” rather than “which numbers” have the property. This opens the possibility that the quantity of numbers can be found without looking at every possible candidate in the range.

By the way 137 is a prime number but not sure if that is relevant.

The term "Bingo" indicates a number that has the property.

Analysis Step 1 - Very approximate answer

Working out the function for the a few ranges of numbers is not so hard and gives these results:

Analysis is started by using a sample of numbers 0..999999 to see if an approximate answer can be determined statically. When starting the count in the range 0..10 we find 0 is a bit of an exception but does have the property

sd(0) = sd(137*0) as 0 = 0,
also sd(9) = as sd(137*9) = 1+2+3+3 = 9

Looking in larger ranges

After 0 the occurrence of the property is less predictable but a possible clue is found. The trend is towards their being about 2.7 % of numbers having the property.

Analysis Step 2 - Make a distribution picture

Creating spreadsheet that compares/counts the results of sd(n) along the top and sd(137 * n) along the rows we can pick out (in red boxes ) and count where sd(n) = sd(137 * n)
Contents are coloured by number of occurrence between 1..1,000,000

To see how the sd() function works in Excel - see nearby blog entry.

From this distribution picture we can confirm that the combined cells with the property sd(n) = sd(137n) have 27955 entries. We can also see that the cells where sd(n) = sd(137n) are all in columns sd(n) and rows sd(137*n) that are divisible by 9 ie: 9, 19, 27, 36, 45, 54.

Analysis Step 3 - Look for patterns in the reachable answers

Generating a script to output the numbers and find the occurrences helped check for patterns.

The columns of this output are

n - The number

%06d - the number using 6 digits with leading zeros

137*n - The number * 137

sd(n) - Sum_of_digits in n

sd(137n) - Sum_of_digits in n * 137

n%9==0 - does n Modulo 9 = 0, Modulo '%' is the remainder after n is divided by another number. Examples 5 % 2 = 1 and 99 % 9 = 0.

sd(n) == sd(137*n) - Is the property true for this value of n y/n

Output spacing and colouring adjusted for clarity.

#n, %06d, 137*n, %06d, sd(n),sd(137n),n%9==0?, sd(n)==sd(137n)?, !

0,000000,0, 000000,0, 0,y,y,!

1,000001,137, 000137,1,11,n,n,!

2,000002,274, 000274,2,13,n,n,!

3,000003,411, 000411,3, 6,n,n,!

4,000004,548, 000548,4,17,n,n,!

5,000005,685, 000685,5,19,n,n,!

6,000006,822, 000822,6,12,n,n,!

7,000007,959, 000959,7,23,n,n,!

8,000008,1096,001096,8,16,n,n,!

...

87,000087,11919,011919,15,21,n,n,!

88,000088,12056,012056,16,14,n,n,!

89,000089,12193,012193,17,16,n,n,!

90,000090,12330,012330, 9, 9,y,y,!

91,000091,12467,012467,10,20,n,n,!

92,000092,12604,012604,11,13,n,n,!

...

98,000098, 13426,013426,17,16,n,n,!

99,000099, 13563,013563,18,18,y,y,!

100,000100,13700,013700, 1,11,n,n,!

101,000101,13837,013837, 2,22,n,n,!

Takes about 10 seconds to run the first 1 million numbers. Looking at just the numbers where the property is true enables an apparent breakthrough. Notice that the the last two y/n fields are both set if the number % 9 = 0 and if the sum_of_digits property is true.

$ perl run_cols.pl | grep 'y,!' | head

0,000000,0,000000,0,0,y,y,!

9,000009,1233,001233,9,9,y,y,!

90,000090,12330,012330,9,9,y,y,!

99,000099,13563,013563,18,18,y,y,!

198,000198,27126,027126,18,18,y,y,!

279,000279,38223,038223,18,18,y,y,!

369,000369,50553,050553,18,18,y,y,!

387,000387,53019,053019,18,18,y,y,!

396,000396,54252,054252,18,18,y,y,!

…..

22653,022653,3103461,3103461,18,18,y,y,!

22680,022680,3107160,3107160,18,18,y,y,!

22689,022689,3108393,3108393,27,27,y,y,!

22698,022698,3109626,3109626,27,27,y,y,!

22716,022716,3112092,3112092,18,18,y,y,!

It was seen that every number that had the sum_of_digits property also had the n % 9 = 0 property. The last two y/n columns show this. Unfortunately not all numbers that have the n % 9 = 0 have the sum_of_digits property.

1365 history

$ perl run_cols.pl | grep 'n,n' | wc -l

888888

$ perl run_cols.pl | grep 'y,n' | wc -l

83157

$ perl run_cols.pl | grep 'y,y' | wc -l

27955

$ perl run_cols.pl | grep 'n,y' | wc -l

888888+83157+27955 = 1000000 - All lines 0..999999 accounted for

888888 numbers have neither of sum_of_digits property or n % 9 =0

83157 numbers have %9=0 but not the sum_of_digits property.

27955 have both %9=0 and the sum_of_digits property.

0 numbers that have %9=0 do not have sum_of_digits property.

This confirms a hint where by only the numbers that have the %9=0 property need to be checked for the sd(n) = sd(137n) property. %9=0 represents 10 percent of the numbers in any given number range.

However there still remains the need to check 1*E18 numbers.

Analysis Step 4 - Split and conqure

Here is where things get interesting and just a bit more complex. How do we scale up to be able to get the answer across the whole target range of 0 to 1E18 ? Any number x that starts with 1 and is only followed by 0s (eg. 10, 1000, 1000000000 ) would not be a Bingo as x % 9 = 1. We can therefore safely adjust the range to ( 1E18 ) -1 ie 17 digits all 9s in a row.

Given that it is quite feasible to do the test every number up to about 8 or 9 digits lets look into methods for treating the larger numbers as two shorter numbers. Working an example number 7317007327 lets split the digits into two 6 digit numbers and call them M & L. We have M=73170 and L=007327.

M, L = 007317, 007327

sd(M), sd(L) = 18, 19 = 37

M*137 , L*137 = 1002429 , 1003799
Treated separately

sd(M*137 ), sd(L*137) = 18 , 29 added together are 47

which would indicate the sd(M.L) = 37 is not equal to sd(137 M.L) = 47 for this value of M & L. However we can use these numbers to test alternative handling of split numbers.

There is an apparent flaw in the above process that can be seen when M & L are recombined into a single number:

n = 007317007327

sd(n) = 37
137n = 1002430003799
sd(137n) = 38

gives 37 & 38, the property is not satisfied. The split method above is not giving the same answers which it would if the split method was correct.

We cans that splitting a big multi digit numbers into parts and treating those parts independently is not going to work without some carry over adjustments. As L*137 = 1003799 gives a 7 digit result there is carry over of 1 from L*137 into M. Should this carry be added before or after M is * 137?

Using . to represent textual concatination and M & L as the two textual halves of a number sd(M.L) == sd ( 137 * ( M.L) )

is calculated as

sd(007317) + sd(007327) = 18 + 19 = 37

then

sd(M) + sd(L) == sd( 137 * M + car( L*137 ) ) + sd( bot(L*137) )

where car( ) is the carry digits from 137*M and bot( ) are the remaining digits after carry digits have been removed.

Working this on 007317007327 we start with M as 007317 and L as 007327.

From L*137 = 1003799 we get car( L*137 ) = 1 and bot(L) is 003799

Before
= sd( 137 * ( M + car( L*137 ) )) + sd( bot(L*137) )
= sd( 137 * ( 007317 + 1)) + sd( 003799 )

= sd( 137 * 7318 ) + 28

= sd(1002566) + 28
= 20 + 28
= 48 which does not matches with sd(137 * 7317007317) = sd(1002430003799) = 38

After
= sd( 137 * M + car( L*137 ) ) + sd( bot(L*137) )
= sd( (137 * 007317) + 1)) + sd( 003799 )

= sd( 1002429 + 1 ) + 28

= sd( 1002430) + 28
= 10 + 28
= 38 which matches with sd(137 * 7317007317) = sd(1002430003799)

The property is still not satisfied as 37 not equal to 38 but at least we have consitent results when calculating sd(137 * n ) = 38 using the split number method with carry added in after multiplication.

Analysis Step 5 - Memorise and join

Having shown that we can handle big numbers by splitting in two the next task is to rework 2 * 9 digit numbers using a memorised table of results to join the 2 halves.

... TBC

Updated 10:00 - 10 April 2018

PS Perl "Feature"

Doing the exploration of the investigation above, a language features of Perl came to prominence. Many of the calculations involved taking the textural representation of a number and summing the digits for the sd() function. For sd(312023) = 11. We also split the text string and interpreted those as numbers. 312023 => 321 and 023. Perl is generally quite forgiving about moving between strings and integers but unfortunately this stumbles on a feature from the dark ages of computing.

Consider

perl -e 'print 312023," ",312," ",23,"\n";'

312023 312 23

and

perl -e 'print 312023," ",312," ",023,"\n";'

312023 312 19

In the second example the third part "023" is interpreted as octal ( base 7 ) and becomes 2*8+3 decimal 19. Similarly 0x23 would be interpreted as hexadecimal giving decimal 35. This took some hunting down.

RAND(Squawk)

Wednesday, 28 March 2018

Technical problem to solve - Function over range using sum of digits.