Great discussion in the comments on “bright-line” rules in statistics.

Jameson Burt | April 12, 2015 at 5:41 am

Significance testing has been recommended against by the National Academy of Sciences. Two months ago, a journal of applied social psychology became the first journal to ban significance tests. Many years ago, a PLOS article showed that the probability that a published article reports an actual relationship is
PPV = 1 / [1 + alpha/((1 − beta)·R)]
where alpha is often .05, 1-beta is the power, and R is the proportion of actual relationships among such tests in the field (you can get creative with “field”).
PPV is the Positive Predictive Value of published articles in the field.
While alpha is usually set by the statistician at .05, 1 – beta is bounded between 0 and 1, and for better designs 1- beta is near 1.
You would think better power would improve matters, but we’re looking at published articles, so the best 1 – beta can do is 1.
As a result, for fields where the proportion of true relationships is small, journals publish almost no actual relationships (shall we say, “barely any acceptable results”). You find this in many fields. For example, in genetics, if 30 of 30,000 genes cause a disease, then R is .001 and PPV is .02.
Yet we imagine significance tests pulling relationships we wouldn’t otherwise notice out of just such situations. We’ve deceived ourselves. A follow-up study in a redefined field containing only the previously significant relationships, though, has a much better PPV.
This is similar to the problem of random numbers published in the back of books — random for the individual, but not random if we observe results when many people use the same random number table.
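The arithmetic in the comment is easy to check. A minimal sketch (the function name is mine; the genetics numbers are from the example above):

```python
def ppv(alpha, power, r):
    """Positive Predictive Value: the probability that a published
    'significant' result reflects a true relationship, given the
    test's alpha, its power (1 - beta), and the base rate R of true
    relationships in the field."""
    return 1 / (1 + alpha / (power * r))

# Genetics example: 30 of 30,000 genes cause the disease, perfect power.
print(ppv(alpha=0.05, power=1.0, r=30 / 30_000))  # ≈ 0.0196, i.e. about .02
```

Note that even with power at its ceiling of 1, a small R keeps the PPV tiny, which is exactly the point being made.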

and one part novel combinatorial chemistry in which a nitro group on the melt cast explosive, trinitroazetidine (TNAZ), was replaced with an acyl bromide to render the subsequent compound, RRx-001, less impact sensitive, and more reactive with thiols.

Over time we’ve gotten much better at writing software. We now track performance in a way we never have before: physics, CPU, and memory are all measured on releases built every night.

With modern tools we’ve discovered that… holy cow!… applying well known software practices means we can have our physics performance and CPU and memory performance too! And in the few places that just isn’t possible, there are usually easy knobs we can turn to reduce the CPU requirements. And even if we have to make a small CPU sacrifice, Moore’s law helps out and takes up the slack.

Meanwhile, film rights have been acquired by Matt Damon, who will turn it into a comedy action adventure starring Ben Affleck as Rule 30 Automaton.

Not all labs are like this of course, and I’m sure things vary a bit by field. Biology is a pretty communal research field compared to, say, pure mathematics. Working collaboratively isn’t for everyone, and it causes all sorts of problems: who does what, who gets the “credit”, who makes decisions when there is disagreement. That isn’t to say that collaboration isn’t a good thing, but it seems like projects work best when someone has ultimate responsibility for the project. In industry you will find people with a project manager role; in academia there are no such project managers, so everyone needs to manage their own project to some extent. It’s a skill that will be required eventually if a student is going to stay on the academic track; I don’t see why they should wait until later to learn it rather than starting as a PhD student.

It means that great physicists like Bohr and Heisenberg were not completely familiar with multiplication of matrices. In 1925, even vector notation was quite new. In Einstein’s 1905 paper on special relativity, he writes out every vector equation as three equations involving components. In the 1922 edition of Millikan’s Practical Physics, the word “vector” does not appear in the index, and a force is represented as a directed line segment, with a notation like AB. (Segments differing by a displacement are considered inequivalent.)

Data Sharing.

We describe in Raw Data and Privacy why it is not possible to release our raw data without risk of abrogating the privacy of the participants in our study. We therefore follow the approach taken by Eckles et al. (22) in a similar situation, providing sufficient statistics for analysis. See Raw Data and Privacy for full details on the shared data accompanying this document.
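The idea of releasing sufficient statistics instead of raw data can be sketched for a simple case (a toy illustration under a normal model, not the paper's actual pipeline; all values are invented):

```python
def sufficient_stats(values):
    """For a normal model, (n, mean, sample variance) are sufficient
    statistics: they support the same downstream inferences as the raw
    values, without exposing any individual participant's data point."""
    n = len(values)
    mean = sum(values) / n
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    return n, mean, var

# Share these three numbers instead of the raw measurements.
n, mean, var = sufficient_stats([2.1, 3.5, 2.9, 4.0, 3.3])
print(n, round(mean, 2), round(var, 3))
```

Anyone with the shared triple can fit the same model, but no single observation can be recovered from it once n is more than a couple of points.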

Thomas Basbøll on January 18, 2018 at 2:42 am said:

“We live in an age of science and abundance. The care and reverence for books as such, proper to an age when no book was duplicated until someone took the pains to copy it out by hand, is obviously no longer suited to ‘the needs of society’, or to the conservation of learning. The weeder is supremely needed if the Garden of the Muses is to persist as a garden.” (Ezra Pound, ABC of Reading, 1934, p. 1)


Much prefer the variety of a wild field to some carefully manicured garden.

I had the opportunity to teach someone Python today. I’m a really bad teacher and don’t know how to teach people programming, but it was interesting to watch.

def min_of_values(val1, val2):
    return min(val1, val2)

g = min_of_values(3, 7)

f = open("...")

The first thing that confused them was: “doesn’t f stand for function?” They got stuck thinking that “f =” should precede any function. Next was the term “call”. Since def defines the name the function is called, they thought that def called a function, and that whenever a function was called, it was being named.

Second, because open returns a file object, they got stuck in a functional-programming rut: they kept thinking that min_of_values must return a function, that g contains “the bit of code before”, and that g only turns into a number when write() gets to it.
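That misconception is easy to demonstrate: Python evaluates a call eagerly, so the result bound to the name is a plain value, not deferred code. A minimal sketch, reusing the toy function above:

```python
def min_of_values(val1, val2):
    return min(val1, val2)

# The call runs immediately: g is just the number 3,
# not a function or a saved "bit of code before".
g = min_of_values(3, 7)
print(type(g).__name__)  # int
print(g)                 # 3

# By contrast, the bare name *without* parentheses does refer
# to the function object itself.
h = min_of_values
print(callable(h))       # True
```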

Burton would hardly have been published today
Maybe we need more low
Peer review, while necessary,

it’s sort of “anti” pathological science.

Pathological science can occur when there is uncertainty in measurement. This person is well aware of the minutiae of “fooling yourself” and is a brilliant experimentalist. Instead, the crank part comes from the “introduction of ideas”: a lack of comparison to existing evidence and a lack of a well-formed theoretical basis, maybe because of paywalls.

Guidelines for lab notes

  • Finish your thought now - don’t put it off
  • Be complete, be thorough
  • Record your observation, then get ’er done.

Sort of seems to imply the opposite of bright-line rules, and it also mentions that in practice p should be quite a bit lower than 0.05.

By this, we are not merely making the commonplace observation that any particular threshold is arbitrary—for example, only a small change is required to move an estimate from a 5.1% significance level to 4.9%, thus moving it into statistical significance. Rather, we are pointing out that even large changes in significance levels can correspond to small, nonsignificant changes in the underlying quantities.
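That point can be made concrete with a toy calculation (numbers invented; a standard two-sided z-test, not an example from the quoted paper):

```python
import math

def p_value(est, se):
    """Two-sided p-value for a z-test: P(|Z| > |est/se|) under the null."""
    return math.erfc(abs(est / se) / math.sqrt(2))

p_x = p_value(25, 10)  # z = 2.5, p ≈ 0.012: statistically significant
p_y = p_value(10, 10)  # z = 1.0, p ≈ 0.32: not significant

# Yet the *difference* between the two estimates, 15 with standard error
# sqrt(10**2 + 10**2) ≈ 14.1, is itself nowhere near significant:
p_diff = p_value(25 - 10, math.hypot(10, 10))  # z ≈ 1.06, p ≈ 0.29
```

So X clears the 5% threshold and Y does not, while the change between them is comfortably within noise.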

It is standard in applied statistics to evaluate inferences based on their statistical significance at the 5% level. There has been a move in recent years toward reporting confidence intervals rather than p values, and the centrality of hypothesis testing has been challenged, but even when using confidence intervals it is natural to check whether they include zero. Thus, the problem noted here is not solved simply by using confidence intervals. Statistical significance, in some form, is a way to assess the reliability of statistical findings. However, as we have seen, comparisons of the sort, “X is statistically significant but Y is not,” can be

<~xkcd> Bucket: shoofle explains it all is ‘The thing you have to remember about girls is that the hyperfluid bearings under the camshafts can be miscalibrated along either axis, so regular maintenance is required to keep resonance in the titanium casing from causing abrasions against the primary sprocket joists.’