CPI Academy Workshops | Registration Desk
Please register online for the workshops on the corresponding workshop page.
IMPORTANT: If you cannot attend, please unregister at least 48 hours before the event. You will be blacklisted if you do not attend a workshop for which you have registered! The workshops are exclusive to CPI Academy members.
If you have any questions, please contact Dr. Claudia Koch.
The main aim of this course is to teach how to approach data analysis problems with classical statistics. We focus on the intuition behind statistical methodologies rather than on "how to run a t-test with R" (which we will also learn, by the way).
Topics:
- Sampling theory: obtaining information about a population via sampling. Sample characteristics (location, dispersion, skewness), estimation of the mean, standard error of the mean.
- Probability theory:
-- Discrete distributions: Discrete Uniform, Bernoulli, Binomial, Poisson, Negative Binomial.
-- Continuous distributions: From the Discrete to the Continuous Uniform. Exponential and Gamma as waiting time distributions in Poisson processes.
-- Central limit theorem: The Normal distribution, chi-square distribution. Distribution of the sample mean. Standard error of the mean.
- Hypothesis testing. Confidence interval of the sample mean, Student's t-distribution. One-group and two-group t-tests.
- Type I and Type II errors. P-value distributions (why the magnitude of the p-value doesn't matter). Power calculations for the t-test.
- "Cookbook of tests":
-- Distribution tests: Shapiro-Wilk, Kolmogorov-Smirnov.
-- Tests of location: Wilcoxon-Mann-Whitney test. Parametric and non-parametric tests.
-- Tests of dispersion: F-test.
-- Counting statistics: chi-square test for contingency tables and goodness-of-fit. Fisher's exact test.
-- Correlations: Pearson, Kendall's tau, Spearman's rank correlation test.
Out of scope:
We are **not** going to analyse participants' data, with one exception: participants may bring a small two-group dataset of their own and try out a t-test and a power calculation on it. Most exercises use synthetic data for pedagogical purposes.
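For orientation, a t-test and power calculation on a small two-group dataset might look roughly like the sketch below. It is not taken from the course materials: the group names, sample sizes, means and standard deviations are made-up assumptions, and the synthetic vectors stand in for a participant's own data. Only `t.test` and `power.t.test` from R's built-in `stats` package are used.

```r
# Illustrative sketch only; all numbers below are assumptions.
set.seed(42)  # fixed seed so the synthetic data are reproducible

# Hypothetical two-group dataset; a participant's own data would replace this.
control   <- rnorm(12, mean = 10, sd = 2)
treatment <- rnorm(12, mean = 12, sd = 2)

# Two-sample t-test; t.test() uses the Welch variant (unequal variances) by default.
result <- t.test(treatment, control)
print(result)

# Power calculation: sample size per group needed to detect a difference
# of 2 units, assuming SD = 2, a 5% significance level and 80% power.
pw <- power.t.test(delta = 2, sd = 2, sig.level = 0.05, power = 0.80)
print(pw)

# Round up, since a group cannot contain a fraction of an observation.
n_per_group <- ceiling(pw$n)
print(n_per_group)
```

Leaving `n` unspecified in `power.t.test` makes the function solve for the sample size; supplying `n` and omitting `power` would instead return the power achieved by a given design.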
Pre-requisites:
- Basic familiarity with the R programming language is **required**.
- Familiarity with Jupyter notebooks and the Jupyter system (https://jupyter.org) is advantageous, but not required (the basics will be explained in advance).
Hands-on exercises:
These are short pieces of R code that the participants run during the lecture. Most of them are simulations that illustrate sampling errors, p-value variability, etc. The exercises run on a server provided by the instructor.
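To give a flavour of such a simulation, here is a minimal sketch that illustrates p-value variability when the null hypothesis is true. It is an assumption about what an exercise of this kind could look like, not an actual course exercise; the variable names, the number of simulations and the group size are arbitrary choices.

```r
# Illustrative sketch only; not taken from the course materials.
set.seed(1)     # fixed seed so the run is reproducible

n_sim <- 2000   # number of simulated experiments (arbitrary choice)
n     <- 10     # observations per group (arbitrary choice)

# Both groups are drawn from the same N(0, 1) population,
# so the null hypothesis of equal means is true in every experiment.
p_values <- replicate(n_sim, {
  x <- rnorm(n)
  y <- rnorm(n)
  t.test(x, y)$p.value
})

# Under a true null the p-values are spread roughly uniformly over [0, 1],
# so a fraction close to 5% falls below 0.05 purely by chance.
false_positive_rate <- mean(p_values < 0.05)
print(false_positive_rate)
print(summary(p_values))
```

Calling `hist(p_values)` afterwards gives a quick visual check of how flat the distribution is.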
"Homework" exercises:
These are simple tasks that the participants can complete on their own after the lectures; the system tells them whether their solutions are correct. These exercises run on a separate server, also provided by the instructor.
Time: 2 x 4 hours, with 2 short breaks.
Handouts:
Provided after the course. The participants get PDF copies of the lecture slides (not to be distributed) and some other "goodies" (scripts, etc.).