Random Refutations

“(1) Parmenides-Leucippus: Leucippus takes the existence of motion as a partial refutation of Parmenides’s theory that the world is full and motionless. This leads to the theory of ‘atoms and the void’. It is the foundation of atomic theory.

(2) Galileo refutes Aristotle’s theory of motion: this leads to the foundation of the theory of acceleration, and later of Newtonian forces. Also, Galileo takes the moons of Jupiter and the phases of Venus as a refutation of Ptolemy, and thus as empirical support of the rival theory of Copernicus.

(3) Torricelli (and predecessors): the refutation of ‘nature abhors a vacuum’. This prepares for a mechanistic world view.

(4) Kepler’s refutation of the hypothesis of circular motion upheld till then (even by Tycho and Galileo), leads to Kepler’s laws and so to Newton’s theory.

(5) Lavoisier’s refutation of the phlogiston theory leads to modern chemistry.

(6) The falsification of Newton’s theory of light (Young’s two-slit experiment). This leads to the Young-Fresnel theory of light. The velocity of light in moving water is another refutation. It prepares for special relativity.

(7) Oersted’s experiment is interpreted by Faraday as a refutation of the universal theory of Newtonian central forces and thus leads to the Faraday-Maxwell field theory.

(8) Atomic theory: the atomicity of the atom is refuted by the Thomson electron. This leads to the electromagnetic theory of matter, and, in time, to the rise of electronics. See Einstein’s and Weyl’s attempts at a monistic (‘unified’) theory of gravitation and electromagnetics.

(9) Michelson’s experiment (1881-1887-1902, etc.) leads to Lorentz’s Versuch einer Theorie der elektrischen und optischen Erscheinungen in bewegten Körpern (1895: see §89). Lorentz’s book was crucially important to Einstein, who alluded to it twice in §9 of his relativity paper of 1905. (Einstein himself did not regard the Michelson experiment as very important.) Einstein’s special relativity theory is (a) a development of the formalism founded by Lorentz and (b) a different—that is, relativistic—interpretation of that formalism. There is no crucial experiment so far to decide between Lorentz’s and Einstein’s interpretations; but if we have to adopt action at a distance (non-locality: see Quantum Theory and the Schism in Physics, Vol. III of the Postscript, Preface 1982), then we would have to return to Lorentz.

Incidentally, it took years before physicists began to come to some agreement about the importance of Michelson’s experiments: I do not contend that falsifications are usually accepted at once (see the preceding section), nor even that they are immediately recognised as potential falsifications.

(10) The ‘chance-discoveries’ of Roentgen and of Becquerel refuted certain (unconsciously held) expectations; especially Becquerel’s expectations. They had, of course, revolutionary consequences.

(11) Wilhelm Wien’s (partially) successful theory of black body radiation conflicted with the (partially) also very successful theories of Sir James Jeans and Lord Rayleigh. The refutation by Lummer and Pringsheim of the radiation formula of Rayleigh and Jeans, together with Wien’s work, leads to Planck’s quantum theory (see L.Sc.D., p. 108). In this, Planck refutes his own theory, the absolutistic interpretation of the entropy law, as opposed to a probabilistic interpretation similar to Boltzmann’s.

(12) Philipp Lenard’s experiments concerning the photoelectric effect conflicted, as Lenard himself insisted, with what was to be expected from Maxwell’s theory. They led to Einstein’s theory of light-quanta or photons (which were of course also in conflict with Maxwell), and thus, much later, to particle-wave dualism.

(13) The refutation of the Mach-Ostwald anti-atomistic and phenomenalistic theory of matter: Einstein’s great paper on Brownian motion of 1905 suggested that Brownian motion may be interpreted as a refutation of this theory. Thus this paper did much to establish the reality of molecules and atoms.

(14) Rutherford’s refutation of the vortex model of the atom. This leads directly to Bohr’s 1913 theory of the hydrogen atom, and thus, in the end, to quantum mechanics.

(15) Rutherford’s refutation (in 1919) of the theory that chemical elements cannot be changed artificially (though they may disintegrate spontaneously).

(16) The theory of Bohr, Kramers and Slater (see L.Sc.D., pp. 250, 243): this theory was refuted by Compton and Simon. The refutation leads almost at once to the Heisenberg-Born-Jordan quantum mechanics.

(17) Schrödinger’s interpretation of his (and de Broglie’s) theory is refuted by the statistical interpretation of matter waves (experiments of Davisson and Germer, and of George Thomson, for instance). This leads to Born’s statistical interpretation.

(18) Anderson’s discovery of the positron (1932) refutes a lot: the theory of two elementary particles — protons and electrons — is refuted; conservation of particles is refuted; and Dirac’s own original interpretation of his predicted positive particles (he thought they were protons) is refuted. Some theoretical work of about 1930-31 is thereby corroborated.

(19) The electrical theory of matter elaborated by Einstein and Weyl, and held implicitly — and at any rate, pursued — by Einstein to the end of his life (since he interpreted the unified field theory as a theory of two fields, gravitation and electromagnetics), is refuted by the neutron and by Yukawa’s theory of nuclear forces: the Yukawa meson. This gives rise to the theory of the nucleus.

(20) The refutation of parity conservation. (See Allan Franklin, Stud. Hist. Philos. Sci. 10, 1979, p. 201.)”

That is an interesting list of scientific refutations, provided by Popper himself. Popper was right to suggest that the new theories highlighted above were not direct results of the refutations. The refutations merely created new problem situations, which stimulated imaginative and critical thought. But this initial stage of conceiving a new theory is not susceptible to logical analysis. “The question how it happens that a new idea occurs to a man … may be of great interest to empirical psychology; but it is irrelevant to the logical analysis of scientific knowledge” (see Popper, K., The Logic of Scientific Discovery, 1934, p. 7). That is because the latter is concerned not with questions of fact (quid facti) but with questions of justification or validity (quid juris).

Akaike Information Criterion Statistics

Consider a distribution {(q_1, q_2, ...,q_k)} with {q_i >0} and { q_1 + q_2 + ...+ q_k=1}. Suppose {N } independent drawings are made from the distribution and the resulting frequencies are given by { (N_1,N_2,...,N_k)}, where {N_1+N_2+...+N_k=N}. Then the probability of getting the same frequencies by sampling from {(q_1, q_2, ...,q_k)} is given by

\displaystyle W = \frac{N!}{N_1!...N_k!} q_1^{N_1} q_2^{N_2}... q_k^{N_k}

and thus

\displaystyle \ln W \approx - N \sum\limits_{i=1}^{k}\frac{N_i}{N} \ln \left( \frac{N_i}{N q_i} \right)

since {\ln N! \approx N \ln N - N}. Set {p_i = N_i/N}. Then

\displaystyle \begin{array}{rcl} \ln W &\approx& - N \sum\limits_{i=1}^{k} p_i \ln (p_i / q_i) \\ &=& NB(p;q) \end{array}

where {B(p;q) = - \sum_{i=1}^{k} p_i \ln (p_i / q_i)} is the entropy of the distribution {\{p_i \}} w.r.t. the distribution {\{q_i \}}. Up to the factor {N}, the entropy can thus be interpreted as the logarithm of the probability of getting the distribution {\{ p_i \}} (which could asymptotically be the true distribution) by sampling from a hypothetical distribution {\{q_i\}}.
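The Stirling step can be checked numerically. The following is a minimal sketch, not part of the original derivation: the distribution `q`, the frequencies `counts` and the helper names are illustrative assumptions. It compares the exact multinomial log-probability (computed with `math.lgamma`) against {N B(p;q)}.

```python
import math

def entropy_B(p, q):
    """B(p;q) = -sum_i p_i ln(p_i / q_i), i.e. minus the KL divergence."""
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def log_multinomial_prob(counts, q):
    """Exact ln W for the frequencies `counts` when sampling from q."""
    n = sum(counts)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coef + sum(c * math.log(qi) for c, qi in zip(counts, q))

# Illustrative values (assumptions, not taken from the text).
q = [0.4, 0.4, 0.2]
counts = [500_000, 300_000, 200_000]
N = sum(counts)
p = [c / N for c in counts]

ln_W = log_multinomial_prob(counts, q)
# Per-observation log-probability versus the entropy B(p;q).
print(ln_W / N, entropy_B(p, q))
```

The two printed numbers should agree to several decimal places, since the error of the Stirling approximation grows only like {\ln N} while {\ln W} grows like {N}.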

Based on Sanov’s result (1961) the above discussion may be extended to more general distributions. Let {f} and {g} be the pdfs of the true and hypothetical distributions respectively, and {f_N} the pdf estimate based on the random sampling of {N} observations from {g}. Then

\displaystyle B(f;g) = - \int f(z) \ln \left( \frac{f(z)}{g(z)} \right) dz

arises as the limit { \lim\limits_{\epsilon \downarrow 0} \lim\limits_{N \rightarrow \infty} N^{-1} \ln P(\sup_x |f_N(x)- f(x)| < \epsilon).} Note that {- B(f;g) } equals {E_f [ \ln (f(z)/g(z))] } which is the Kullback-Leibler divergence between {f} and {g}. Note also that {B(f;g) \leq 0 }. That is because, by Jensen’s inequality,

\displaystyle \begin{array}{rcl} - \mathbb{E}_f \left[ \ln \frac{f(z)}{g(z)}\right] &=& \mathbb{E}_f \left[ \ln \frac{g(z)}{f(z)} \right] \\ &\leq& \ln \mathbb{E}_f \left[\frac{g(z)}{f(z)}\right] = \ln \int \frac{g(z)}{f(z)}f(z) dz = 0 \end{array}
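To see the inequality at work, here is a small numerical sketch. The choice of two normal densities, the integration range and the function names are my own assumptions for illustration. It approximates {B(f;g)} by the trapezoidal rule and compares it with the known closed form of the Kullback-Leibler divergence between two normal distributions.

```python
import math

def normal_pdf(z, mu, sigma):
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def entropy_B(f, g, lo=-12.0, hi=12.0, steps=20_000):
    """Trapezoidal approximation of B(f;g) = -int f ln(f/g) dz on [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        z = lo + i * h
        fz, gz = f(z), g(z)
        term = fz * math.log(fz / gz) if fz > 0 else 0.0
        total += term * (0.5 if i in (0, steps) else 1.0)
    return -h * total

# Illustrative choice: true f = N(0, 1), hypothetical g = N(mu, sigma^2).
mu, sigma = 1.0, 2.0
f = lambda z: normal_pdf(z, 0.0, 1.0)
g = lambda z: normal_pdf(z, mu, sigma)

B_numeric = entropy_B(f, g)
# Closed form: KL(N(0,1) || N(mu, sigma^2)) = ln(sigma) + (1 + mu^2)/(2 sigma^2) - 1/2
B_closed = -(math.log(sigma) + (1 + mu ** 2) / (2 * sigma ** 2) - 0.5)
print(B_numeric, B_closed)
```

Both values are negative, and `entropy_B(f, f)` evaluates to zero, which is the equality case {g = f}.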

Suppose that we observe a data set {\mathbf{x}} of N elements. We could predict the future observations {\mathbf{y}} whose distribution is identical to that of {\mathbf{x}} by specifying a predictive distribution { g(\mathbf{y} | \mathbf{x}) } which is a function of the given dataset { \mathbf{x}}. The “closeness” of { g(\mathbf{y} | \mathbf{x}) } to the true distribution of the future observations {f(\mathbf{y})} is measured by the entropy

\displaystyle \begin{array}{rcl} B(f(.); g(.| \mathbf{x})) &=& -\int \left( \frac{f(\mathbf{y})}{ g(\mathbf{y} | \mathbf{x})} \right) \ln \left( \frac{f(\mathbf{y})}{ g(\mathbf{y} | \mathbf{x})} \right) g(\mathbf{y} | \mathbf{x}) d \mathbf{y}\\ &=& \int f(\mathbf{y}) \ln g(\mathbf{y} | \mathbf{x}) d \mathbf{y} - \int f(\mathbf{y}) \ln f(\mathbf{y}) d (\mathbf{y}) \\ &=& \mathbb{E}_y \ln g(\mathbf{y} | \mathbf{x}) - c \end{array}

Hence the entropy is equivalent to the expected log-likelihood with respect to a future observation, apart from a constant. The goodness of the estimation procedure specified by { g(\mathbf{y} | \mathbf{x}) } is measured by {\mathbb{E}_x \mathbb{E}_y \ln g(\mathbf{y} | \mathbf{x})} which is the average over the observed data of the expected log-likelihood of the model { g(\mathbf{y} | \mathbf{x}) } w.r.t. a future observation.

Suppose {\mathbf{x}} and {\mathbf{y}} are independent and that the distribution {g(.|\mathbf{x})} is specified by a fixed parameter vector {\mathbf{\theta}} (i.e. { g(.|\mathbf{x}) = g(.|\mathbf{\theta})}). Then {\ln g(\mathbf{x}|\mathbf{x})=\ln g(\mathbf{x}|\mathbf{\theta})} and hence the conventional ML estimation procedure is justified as

\displaystyle \mathbb{E}_x \ln g(\mathbf{x}|\mathbf{\theta}) = \mathbb{E}_x \mathbb{E}_y \ln g(\mathbf{y}|\mathbf{x})

However, in general

\displaystyle \mathbb{E}_x \ln g(\mathbf{x}|\mathbf{x}) \neq \mathbb{E}_x \mathbb{E}_y \ln g(\mathbf{y}|\mathbf{x})
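The gap between the two sides can be made concrete with a small simulation. This is a sketch under assumptions of my own, not the post's derivation: the data are i.i.d. {N(0,1)}, the model is {N(\theta,1)} with {\theta} estimated by the sample mean, and {\ln g(\mathbf{x}|\mathbf{x})} is read as the log-likelihood evaluated at the parameter fitted to the same data. The expectation over {\mathbf{y}} is computed exactly; only the expectation over {\mathbf{x}} is simulated.

```python
import random

def loglik_gap(n=20, reps=20_000, seed=0):
    """Monte Carlo average of ln g(x|theta_hat(x)) - E_y ln g(y|theta_hat(x))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(x) / n  # ML estimate of theta
        # In-sample log-likelihood (additive constants cancel in the gap).
        in_sample = -0.5 * sum((xi - mean) ** 2 for xi in x)
        # Exact expectation over future y ~ N(0,1)^n: E (y_i - mean)^2 = 1 + mean^2.
        out_sample = -0.5 * n * (1.0 + mean ** 2)
        total += in_sample - out_sample
    return total / reps

gap = loglik_gap()
print(gap)
```

In this toy model the exact gap is 1, the number of estimated parameters, so the printed value should come out close to 1: the in-sample log-likelihood is optimistic about future data.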


R.I.P. George E.P. Box

Last Thursday (28 March 2013), George Box passed away at the age of 93. He was one of the great statisticians of the last 100 years, and leaves an astonishingly diverse legacy.

When I teach forecasting to my second-year commerce students, we cover Box-Cox transformations, Box-Pierce and Ljung-Box tests, and Box-Jenkins modelling, and my students wonder if it is the same Box in all cases. It is. And we don’t even go near his work on response surface modelling, design of experiments, quality control or random number generation. Occasionally, a student wonders if boxplots are also due to GEP Box, but they were the brainchild of his good friend John W Tukey.


The beauty of piecewise polynomials

It always amazes me what beautiful objects may be created by very simple mathematical equations…piecewise


LAN for Linear Processes

Consider an m-vector linear process

\displaystyle \mathbf{X}(t) = \sum\limits_{j=0}^{\infty} A_{\theta}(j)\mathbf{U}(t-j), \qquad t \in \mathbb{Z}

where {\mathbf{U}(t)} are i.i.d. m-vector random variables with p.d.f. {p(\mathbf{u})>0} on {\mathbf{R}^m}, {A_{\theta} (j)} are {m \times m} matrices depending on a parameter vector { \mathbf{\theta} = (\theta_1,...,\theta_q) \in \Theta \subset \mathbf{R}^q}.

Set

\displaystyle A_{\theta}(z) = \sum\limits_{j=0}^{\infty} A_{\theta}(j)z^j, \qquad |z| \leq 1.

Assume the following conditions are satisfied

A1 i) For some {D} {(0<D<1/2)}

\displaystyle \pmb{|} A_{\theta}(j) \pmb{|} = O(j^{-1+D}), \qquad j \in \mathbb{N},

where { \pmb{|} A_{\theta}(j) \pmb{|}} denotes the sum of the absolute values of the entries of { A_{\theta}(j)}.

ii) Every { A_{\theta}(j)} is twice continuously differentiable with respect to {\theta}, and the derivatives satisfy

\displaystyle |\partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} A_{\theta, ab}(j)| = O \{j^{-1+D}(\log j)^k\}, \qquad k=0,1,2

for {a,b=1,...,m,} where {\partial_i = \partial/ \partial\theta_i}.

iii) {\det A_{\theta}(z) \neq 0} for {|z| \leq 1} and {A_{\theta}(z)^{-1}} can be expanded as follows:

\displaystyle A_{\theta}(z)^{-1} = I_m + B_{\theta}(1)z + B_{\theta}(2)z^2 + ...,

where { B_{\theta}(j)}, {j=1,2,...,} satisfy

\displaystyle \pmb{|} B_{\theta}(j) \pmb{|} = O(j^{-1-D}).

iv) Every { B_{\theta}(j)} is twice continuously differentiable with respect to {\theta}, and the derivatives satisfy

\displaystyle |\partial_{i_1} \partial_{i_2} \cdots \partial_{i_k} B_{\theta, ab}(j)| = O \{j^{-1-D}(\log j)^k\}, \qquad k=0,1,2

for {a,b=1,...,m.}
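To get a feel for the decay rates in i) and iii), consider the scalar case {m=1} with the fractional filter {A(z) = (1-z)^{-D}} and its inverse {B(z) = (1-z)^{D}}. This example and the names in the code are my own illustration, and I do not claim that it verifies every part of A1. The sketch computes the power-series coefficients by recursion, checks that {A(z)B(z)=1} coefficient by coefficient, and compares the decay with {j^{-1+D}} and {j^{-1-D}}.

```python
import math

def frac_coeffs(d, n):
    """Power-series coefficients of (1 - z)^(-d) and (1 - z)^d up to order n."""
    a, b = [1.0], [1.0]
    for j in range(1, n + 1):
        a.append(a[-1] * (j - 1 + d) / j)  # coefficients of (1 - z)^(-d)
        b.append(b[-1] * (j - 1 - d) / j)  # coefficients of (1 - z)^d
    return a, b

D = 0.3   # illustrative value with 0 < D < 1/2
n = 2000
A, B = frac_coeffs(D, n)

# A(z) B(z) = 1: the convolution is 1 at order 0 and vanishes at every order >= 1.
conv = [sum(A[i] * B[k - i] for i in range(k + 1)) for k in range(50)]

# Decay rates: A(j) j^(1-D) -> 1/Gamma(D) and |B(j)| j^(1+D) -> 1/|Gamma(-D)|.
a_ratio = A[n] * n ** (1 - D) * math.gamma(D)
b_ratio = abs(B[n]) * n ** (1 + D) * abs(math.gamma(-D))
print(a_ratio, b_ratio)
```

Both ratios should be close to 1 for large `n`, which is the {O(j^{-1+D})} behaviour of the moving-average coefficients and the {O(j^{-1-D})} behaviour of the inverse coefficients.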

A2 {p(.)} satisfies

\displaystyle \lim\limits_{\| \mathbf{u} \| \rightarrow \infty} p(\mathbf{u})=0, \qquad \int \mathbf{u} p(\mathbf{u}) d \mathbf{u} =0, \qquad \text{and} \qquad \int \mathbf{uu'}p(\mathbf{u}) d \mathbf{u}=I_m

A3 The continuous derivative {Dp} of {p(.)} exists on {\mathbf{R}^m}.

A4

\displaystyle \int \pmb{|} \phi(\mathbf{u}) \pmb{|}^4 p (\mathbf{u}) d \mathbf{u} < \infty,

where {\phi(\mathbf{u}) = p(\mathbf{u})^{-1}Dp(\mathbf{u})}.
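
As a quick sanity check (my own example, not taken from the source), take {m=1} and let {p} be the standard normal density. Then A2 and A3 hold, and

\displaystyle \phi(u) = \frac{p'(u)}{p(u)} = -u, \qquad \int |\phi(u)|^4 p(u) \, du = \int u^4 p(u) \, du = 3 < \infty,

so A4 holds as well.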

From A1 the linear process can be expressed as

\displaystyle \sum\limits_{j=0}^{\infty} B_{\theta}(j) \mathbf{X}(t-j) = \mathbf{U}(t), \qquad B_{\theta} (0) = I_m

and hence

\displaystyle \mathbf{U}(t) = \sum\limits_{j=0}^{t-1}B_{\theta}(j)\mathbf{X}(t-j)+\sum\limits_{r=0}^{\infty}C_{\theta}(r,t)\mathbf{U}(-r),

where

\displaystyle C_{\theta}(r,t)= \sum\limits_{r'=0}^{r}B_{\theta}(r'+t)A_{\theta}(r-r').
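The decomposition of {\mathbf{U}(t)} into an observed part and a pre-sample part can be verified on a toy case. The sketch below assumes a scalar MA(1) process {X(t) = U(t) + bU(t-1)}, for which {A(0)=1}, {A(1)=b} and {B(j)=(-b)^j}. The process, the value of `b`, the truncation `R` and all function names are illustrative assumptions, not part of the source.

```python
import random

b = 0.5  # illustrative MA(1) coefficient, |b| < 1

def A(j):
    """MA(1) coefficients: X(t) = U(t) + b U(t-1)."""
    return 1.0 if j == 0 else (b if j == 1 else 0.0)

def B(j):
    """Coefficients of 1 / (1 + b z) = sum_j (-b)^j z^j."""
    return (-b) ** j

def C(r, t):
    """C(r, t) = sum_{r'=0}^{r} B(r' + t) A(r - r')."""
    return sum(B(rp + t) * A(r - rp) for rp in range(r + 1))

rng = random.Random(1)
T, R = 30, 40  # horizon and truncation point of the sum over r
U = {s: rng.gauss(0.0, 1.0) for s in range(-R - 1, T + 1)}
X = {s: sum(A(j) * U[s - j] for j in range(2)) for s in range(-R, T + 1)}

def reconstruct_U(t):
    observed = sum(B(j) * X[t - j] for j in range(t))        # uses X(1), ..., X(t)
    presample = sum(C(r, t) * U[-r] for r in range(R + 1))   # uses U(0), U(-1), ...
    return observed + presample

max_err = max(abs(reconstruct_U(t) - U[t]) for t in range(1, T + 1))
print(max_err)
```

For this MA(1) case {C(r,t)} vanishes for {r \geq 1}, so only {U(0)} enters the pre-sample term and truncating the sum at `R` loses nothing; the reconstruction error should be at the level of floating-point rounding.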


Guest Post: ROB TIBSHIRANI

Reblogged from Normal Deviate:

Today we have a guest post by my good friend Rob Tibshirani. Rob has a list of nine great statistics papers. (He is too modest to include his own papers.) Have a look and let us know what papers you would add to the list. And what machine learning papers would you add? Enjoy.

9 Great Statistics papers published after 1970…

I would definitely add to the list the following paper which Rob Tibshirani was too modest to include: R Tibshirani (1996) – Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. Series B (Methodological), 267-288