
This is the easiest of the paradoxes mentioned in this thread to explain. I want to emphasize that this proof uses the technique of "Assume P, derive contradiction, therefore not P". This kind of proof relies on knowing what the running assumptions are at the time that the contradiction is derived, so I'm going to try to make that explicit.

Here's our first assumption: suppose that there's a set X with the property that for any set Y, Y is a member of X if and only if Y doesn't contain itself as a member. In other words, suppose that the collection of sets that don't contain themselves is a set and call it X.

Here's another assumption: Suppose X contains itself. Then by the premise, X doesn't contain itself, which is contradictory. Since the innermost assumption is that X contains itself, this proves that X doesn't contain itself (under the other assumption).

But if X doesn't contain itself, then by the premise again, X is in X, which is again contradictory. Now the only remaining assumption is that X exists, and so this proves that there cannot be a set with the stated property. In other words, the collection of all sets that don't contain themselves is not a set.
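
If you want to see that bookkeeping of assumptions made completely explicit, here's a rough sketch in Lean 4 (my own phrasing, with an abstract relation `mem` standing in for set membership, so it's the shape of the argument rather than a statement about ZFC sets):

    -- For any "membership" relation, there is no X whose members are exactly
    -- the things that are not members of themselves.
    theorem no_russell_set (α : Type) (mem : α → α → Prop) :
        ¬ ∃ X : α, ∀ Y : α, mem Y X ↔ ¬ mem Y Y :=
      fun ⟨X, hX⟩ =>
        -- the inner assumption "X contains itself" refutes itself...
        have h : ¬ mem X X := fun hm => (hX X).mp hm hm
        -- ...but then the defining property of X puts X back in: contradiction
        h ((hX X).mpr h)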


M.A. was the highest degree available in the UK at the time [1]. The closest equivalent to a PhD program might be the Prize Fellowship that Hardy held from 1900 to 1906, though the correspondence is certainly not one-to-one. Don't get the impression that Hardy was done with his education when he got his master's degree.

[1] https://www.economics.soton.ac.uk/staff/aldrich/Doc1.htm

[2] https://royalsocietypublishing.org/doi/10.1098/rsbm.1949.000...


Interesting. The Wikipedia page for his contemporary Littlewood lists a Doctoral Advisor.

https://en.wikipedia.org/wiki/John_Edensor_Littlewood

Edit: The Mathematics Genealogy Project lists Littlewood as an M.A. as well.

https://www.mathgenealogy.org/id.php?id=10463


Nothing wrong with starting at 1 for induction, but yes, having an additive monoid is nice (you still get a multiplicative monoid with N*).


*Bill Odenkirk. Bob is his brother.


Nice! I wondered if someone would beat that TAS level anyway.


It's easy to find the UUID associated with a nickname. For example, here's [0] the one for `accrual`:

[0] https://namemc.com/profile/Accrual.1
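
If you'd rather not go through a third-party site, Mojang also exposes a public profile lookup for this. A rough Python sketch (the endpoint and response fields are from memory, so treat them as assumptions):

    import json
    import urllib.request

    def uuid_for(name):
        # Mojang's public username -> profile lookup (assumed endpoint)
        url = f"https://api.mojang.com/users/profiles/minecraft/{name}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["id"]

    print(uuid_for("accrual"))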


It's always fun to poke at cranks. Thanks for linking that. The author really needs to find the right statement of what they call the Nested Interval Theorem. I cracked up at the complete misuse of it in the "Interval Argument for the Rationals" section.


Kind of disappointing that it doesn't have the victory screen. But man, my fingers still remember that combo you have to do for those 4 wide areas.


Two-line explanation of where e comes from: to represent a number x in base b, you need roughly log_b(x) digits. If you weight that by the number of different digits in base b, you get b * log_b(x) = b * log(x)/log(b) = b/log(b) * log(x) (where the log is now in any base you care to choose).

So assuming x > 1, this is minimized precisely when b/log(b) is minimized. The derivative of b/log(b) is (log(b) - log(e))/log(b)^2, which is zero when b = e. (The second derivative at b = e is 1/(e log(e)) > 0, so this is a minimum.)
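
If you'd rather check that numerically than by calculus, a quick Python sketch of mine:

    import math

    # per-number cost of base b, up to the common log(x) factor
    cost = lambda b: b / math.log(b)

    for b in (2, math.e, 3, 4, 10):
        print(round(b, 3), round(cost(b), 3))
    # 2 -> 2.885, e -> 2.718, 3 -> 2.731, 4 -> 2.885, 10 -> 4.343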


Why should we weight by the number of different digits? I.e., is there an argument for why the cost of a single digit is linearly proportional to the number of values it can hold?

To me this seems like the weak point in the argument.


Why should the number of digits in the base matter?


It's called radix economy. Different radices have different efficiencies, measured by the cost of writing a number: the size of the digit alphabet times the number of placeholders required.

Simple example: base 1 is obviously inefficient once you're counting past 1. Similarly, base 1 million is inefficient (until you're counting in the bazillions).


If you can store one digit of a base-one-million number in one physical element, then it's exactly as efficient as binary when it comes to storage of digits: one element per digit.

But it's about 20 times more efficient when it comes to storage of whole numbers, because you can store a number up to a million in a single physical element while binary needs 20 elements.


I see what you're trying to say, and I understand the intuition, but as I was saying earlier this is an already well-understood topic (https://en.wikipedia.org/wiki/Radix_economy) and that point has been addressed. Sorry, not trying to be a jerk; just wanted to point that out in case you're interested in the subject.

In the example you gave, I'm assuming by 'physical element' you mean 'placeholder' or digit. Storing more numbers in a single placeholder seems like you're getting efficiency for free, but that's not how information works. You have to come up with a unique symbol for each of one million values (0 - 999,999), which you have to pay for.

Base 64 is a more realistic example. With 64 possible characters per digit, it may seem more efficient, since you need fewer digits to express the same number than in base 10 or 2, but you still have to encode 64 unique characters, and it ends up being less efficient. That doesn't mean less efficient is worse; it just means it's more specialized and used for different things. For example, base64 gets used when you want to write information compactly as text. Otherwise, for the actual storage and computation of data, lower bases are preferred, and base 64 is still obviously stored as binary.

For what it's worth, the integer base with the best radix economy is 3, followed by 2.
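
Here's a quick way to check that, counting digits exactly (my own sketch; cost = radix times number of digit positions, which is the usual radix-economy measure):

    def digits(n, b):
        # number of base-b digit positions needed for n (n >= 1)
        d = 0
        while n > 0:
            n //= b
            d += 1
        return d

    def radix_economy(b, n):
        return b * digits(n, b)

    n = 10**6
    for b in (2, 3, 4, 8, 10, 16):
        print(b, radix_economy(b, n))
    # base 3 comes out cheapest here (39); 2 and 4 tie at 40; base 10 costs 70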


Thanks. I should have googled this when you mentioned it.

I see that radix economy makes some sense if you assume that the "cost" of the "element" is not fixed but linearly proportional to the number of different states it needs to hold. It was apparently linear for computers built with triodes, but log2(n) to hold n states seems more realistic in electronics. Maybe a little more for error resiliency.


I think the b factor is due to the analog representation: if the base has b values, you use on average b/2 voltage for representation. I would think the mean square value might have been more appropriate, though? (as voltage energy losses usually go as V^2 / R or C V^2)

Repeating the calculation (for a number k and base b) using squared values, you get E = b^2 * log_b(k), and dE/db = 2b/ln(b) - b^2 / (ln(b))^2 / b = b/ln(b) * (2 - 1/ln(b)) (dropping the constant ln(k) factor). Setting dE/db to 0, we get 2 - 1/ln(b) = 0, i.e. ln(b) = 1/2,

b_opt = sqrt(e) ≈ 1.6487...
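
A quick numeric sanity check of that (my own sketch; just minimizing b^2/ln(b) over a grid, since the ln(k) factor is constant):

    import math

    candidates = [b / 1000 for b in range(1100, 3001)]
    best = min(candidates, key=lambda b: b * b / math.log(b))
    print(best, math.sqrt(math.e))  # both come out around 1.648-1.649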


In our computers, to hold b states we use elements whose cost is not proportional to b or b^2 but to log2(b).

Which base is optimal then? 2? Any?


That's true, if you use binary encoding to represent the b states. I was referring to a base-b representation, in which by definition we use b states to represent log2(b) bits: for example, 4 states to represent bits 00,01,10,11, which could correspond to voltage levels 0V,1V,2V,3V of a variable (analogously for other values of b).

In this case, the mean square voltage used would be the average of 0^2, 1^2, 2^2, 3^2, which is (0^2 + 1^2 + ... + (b-1)^2) / b ~ b^2/3 (proportional to b^2). I showed that in this case in theory the "optimal" base would be sqrt(e), but I can't think of simple representations of a variable with fewer than 2 states (you could have, say, 2 variables representing fewer than 4 states, for example the 3 pairs { 0V/1V, 1V/0V, 0V/0V }, but that seems a tad complicated![1]), so the least non-trivial integer base is just 2.

I think in reality this kind of argument doesn't apply to real computers in a straightforward way, because there are many other factors at play beyond what I would call the 'dynamic power consumption' associated with the squared voltage. There are things like leakage currents (which consume some fixed power per transistor) and other effects that significantly complicate things, so simple claims of being "optimal" don't apply (so I agree with the other comment about avoiding absolute claims!). But they can inspire designs, and we can see if there are any advantages in more realistic cases.

[1] Further analysis: In our example, the 3 voltage pairs have mean square voltage 1/3. If we instead use the pairs { 0V/0V, 0V/1V, 1V/0V, 1V/1V }, the mean squared voltage is 1/2. In the first case, we encode log2(3)/2 bits of information per voltage value, in the second, 1 bit per value. So the energy per bit is (energy per value)/(bits per value) = (1/3)/(log2(3)/2) = 0.420... for the first case, and 0.5 for the second case. Unfortunately, it's a loss!

As an exercise, I believe that if we delete the state 1V/1V/.../1V from a k-voltage tuple, then for large enough k we come out ahead (probably not by much), although again that's not necessarily useful in practice.


Correction: I've just realized I drew an incorrect conclusion: the energy per bit in the case I've shown is indeed lower than in the standard 2-bit encoding, so it's actually a win! Again, for a myriad of reasons, I'm not sure you'd want to use that encoding in real life, but it does consume a little under 20% less power in terms of V^2 losses.
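
For anyone who wants to check the arithmetic, a small Python sketch (mine, not from the discussion above):

    import math

    def energy_per_bit(pairs):
        values = [v for pair in pairs for v in pair]
        mean_square = sum(v * v for v in values) / len(values)  # ~ V^2 per value
        bits_per_value = math.log2(len(pairs)) / 2              # two values per pair
        return mean_square / bits_per_value

    three_pairs = [(0, 1), (1, 0), (0, 0)]
    four_pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    a, b = energy_per_bit(three_pairs), energy_per_bit(four_pairs)
    print(a, b, 1 - a / b)  # ~0.42, 0.5, roughly a 16% saving per bit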


> Last time I checked there was still no formal version of Cantor's diagonalization argument.

Formalization? Like a computer-checked proof?

Lean: https://leanprover-community.github.io/mathlib_docs/logic/fu...

Coq: https://github.com/bmsherman/finite/blob/63706fa4898aa05296c...

Isabelle: https://isabelle.in.tum.de/website-Isabelle2009-2/dist/libra...
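
For what it's worth, the diagonal argument itself is quite short once written out. Here's my own rough Lean 4 sketch, for sequences of naturals rather than reals, assuming the standard library lemma Nat.succ_ne_self is available:

    -- No enumeration f of sequences of naturals hits every sequence:
    -- the diagonal sequence  k ↦ f k k + 1  differs from f n at position n.
    theorem diag (f : Nat → Nat → Nat) : ∃ g : Nat → Nat, ∀ n, f n ≠ g :=
      ⟨fun k => f k k + 1, fun n hn =>
        -- evaluate the claimed equality f n = g at index n
        have h : f n n = f n n + 1 := congrFun hn n
        Nat.succ_ne_self (f n n) h.symm⟩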


> Formalization? Like a computer-checked proof?

Yes, that would be an example, though not all formalizations are written in software. I was originally talking just about standard formal logic, which is just first-order predicate calculus. Proof assistants use much more powerful systems in order to gain the same expressibility as the natural language which is used in informal proofs. (Which is kind of strange: In using natural language mathematicians don't seem to be using some advanced type theory.)

I can't actually read these proof-assistant languages, as they use some advanced type-theory-based language, so just a few comments.

Lean: It looks suspicious that this seems to involve just one to two lines. Are they actually proving anything here? It doesn't look at all like Cantor's diagonal argument.

Coq: This looks better, at least from the description, and it actually looks like a proof (Coq actually has a Qed keyword!). Though they, unlike Cantor, don't talk about real numbers here, just about sequences of natural numbers. Last time I read a discussion about it, it was mentioned that exactly this was a problem for the formal proof: Every natural number sequence (with a decimal point) corresponds to some real number, but real numbers don't necessarily have just one decimal expression, e.g. 0.999...=1. So real numbers and sequences can't straightforwardly be identified.

Isabelle: That seems to be a formalization of Cantor's powerset argument, not his diagonal argument.

Overall, this highlights a major problem with formalization of existing proofs. There is no way (at least no obvious way) to "prove" that a formal proof X actually is a formalization of some informal proof Y. X could simply be a different proof, or one which is similar but uses stronger or question-begging assumptions, etc. One such example is informal proofs (like Cantor's) which weren't carried out in any axiomatized system, and where it isn't clear how much you can assume for a formal version without begging the question.


You have literally been given a fully formal proof of Cantor's diagonalization.

The Lean proof seems correct; I can't read Lean well, but it appears to just be a formalization of the standard proof. It at least constructs the set needed for the proof.

>Proof assistants use much more powerful systems in order to gain the same expressibility as the natural language which is used in informal proofs. (Which is kind of strange: In using natural language mathematicians don't seem to be using some advanced type theory.)

Look up "Curry-Howard isomorphism".

>Overall, this highlights a major problem with formalization of existing proofs. There is no way (at least no obvious way) to "prove" that a formal proof X actually is a formalization of some informal proof Y.

And? This doesn't seem relevant to the question at hand. Surely some textbook contains an insufficient proof which couldn't be formalized without more steps; why does that matter? The question still is whether "folk theorems" are real. Certainly Cantor's theorem is no such folk theorem, as you have just been given a fully formal proof.

>Though they, unlike Cantor, don't talk about real numbers here

Cantor's theorem usually refers to the existence of uncountable sets, which can be proven by showing that there is no bijection between N and R, but also by showing that there is no bijection between a set and its powerset.

> Every natural number sequence (with a decimal point) corresponds to some real number, but real numbers don't necessarily have just one decimal expression, e.g. 0.999...=1. So real numbers and sequences can't straightforwardly be identified.

This is only relevant if you say that the real numbers are "infinite strings of digits", which is not a formal definition and therefore not used in mathematics.


> The question still is whether "folk theorems" are real. Certainly Cantor's theorem is no such folk theorem, as you have just been given a fully formal proof.

I didn't say that Cantor's theorem is a folk theorem, only that some proofs aren't even trivial to formalize. Many other things are trivial to formalize, but they still require filling in a lot of obvious steps. It often wouldn't be practically feasible to prove them in a standard first-order calculus; otherwise proof-assistant languages wouldn't need higher-order systems with advanced type theory.

> Cantor's theorem usually refers to the existence of uncountable sets, which can be proven by showing that there is no bijection between N and R, but also by showing that there is no bijection between a set and its powerset.

Yes, but I was talking about his diagonal argument, not his powerset argument. That's a different proof from a different paper.

> This is only relevant if you say that the real numbers are "infinite strings of digits", which is not a formal definition and therefore not used in mathematics.

Well, I'm not saying that, but the Coq proof seems to assume it. That was the problem I was talking about.

