Estadistic
Birthday Paradox in Malaysia
The birthday paradox is a mathematical phenomenon concerning the probability that two people in a group share a birthday. It is called a paradox because that probability is higher than most people would expect. The goal of this project is to calculate the probability of shared birthdays using two approaches: the exact probability formula and Monte Carlo simulation.
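As a rough illustration of the exact approach, the sketch below computes the probability that at least two of n people share a birthday. It assumes 365 equally likely birthdays and ignores leap days; the function name p_shared and the search range of 1 to 60 are illustrative choices, not taken from the project code.

```r
# Exact probability that at least two of n people share a birthday,
# assuming `days` equally likely birthdays (leap days ignored).
p_shared <- function(n, days = 365) {
  if (n > days) return(1)  # pigeonhole: a match is certain
  # P(all distinct) = prod_{i = 1..n} (days - i + 1) / days, computed in log space
  1 - exp(sum(log((days - seq_len(n) + 1) / days)))
}

# Smallest group size whose probability reaches 50%
probs <- vapply(1:60, p_shared, numeric(1))
min_n <- which(probs >= 0.5)[1]
```

Under these assumptions, p_shared(23) is about 0.507 and min_n is 23.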
Overview
The dataset included over 100 years of daily birth records, making statistical analysis complex due to its volume and granularity.
We used R and several libraries to process and analyze the data efficiently. lubridate was used to manage and manipulate dates, while mctest, olsrr, and lmtest supported regression diagnostics and statistical testing. Data visualization and exploratory analysis were conducted using ggplot2.
Results
The number of people required for a 50% chance that at least two share a birthday is 23, by both the exact calculation and Monte Carlo simulation.
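A minimal Monte Carlo sketch of this result is shown here, assuming 365 equally likely birthdays. The function name sim_shared, the seed, and the number of trials are illustrative assumptions rather than the project's actual settings.

```r
set.seed(42)  # arbitrary seed, used only to make the sketch reproducible

# Estimate P(at least two of n people share a birthday) by simulation.
# Each trial draws n birthdays uniformly from 1..days and checks for a duplicate.
sim_shared <- function(n, trials = 20000, days = 365) {
  hits <- replicate(
    trials,
    anyDuplicated(sample.int(days, n, replace = TRUE)) > 0
  )
  mean(hits)
}

est_22 <- sim_shared(22)
est_23 <- sim_shared(23)
```

With enough trials, the estimates should settle near the exact values of roughly 0.476 for 22 people and 0.507 for 23, which is where the 50% threshold is crossed.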
The expected value for the number of birthdays that result in birth-mates is 20. This is lower than the number of people required for a 50% chance because of cases where more than two people share a birthday.
The real-world data do not show a uniform distribution of birthdays: weekday births are consistently more common than weekend births.
Smoothing the real-world data by week yields a uniform distribution.
Under a non-uniform distribution of birthdays, the probability of a shared birthday increases, but only slightly.
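The size of this effect can be sketched with an exact calculation for arbitrary day probabilities. The example uses the identity P(all distinct) = n! * e_n(p), where e_n is the n-th elementary symmetric polynomial of the day probabilities. The weights are hypothetical (weekdays taken to be 20% more likely than weekend days) and are not derived from the dataset; the function name p_shared_general is likewise illustrative.

```r
# P(at least two of n people share a birthday) for arbitrary day probabilities p.
# f[k + 1] holds k! * e_k(p) over the days processed so far, so after the loop
# f[n + 1] is the probability that all n birthdays are distinct.
p_shared_general <- function(n, p) {
  stopifnot(n >= 1, abs(sum(p) - 1) < 1e-9)
  f <- c(1, numeric(n))
  k <- seq_len(n)
  for (p_i in p) {
    # The right-hand side is evaluated before assignment, so every f[k] used
    # here is the value from before this day was added.
    f[k + 1] <- f[k + 1] + k * p_i * f[k]
  }
  1 - f[n + 1]
}

days <- 365
uniform <- rep(1 / days, days)

# Hypothetical weights: positions 5 and 6 of each 7-day cycle act as a weekend
day_of_week <- (seq_len(days) - 1) %% 7
w <- ifelse(day_of_week >= 5, 1.0, 1.2)
weighted <- w / sum(w)

p_uniform  <- p_shared_general(23, uniform)
p_weighted <- p_shared_general(23, weighted)
```

Under these assumed weights, p_weighted exceeds p_uniform by only a fraction of a percentage point, in line with the observation that non-uniformity raises the probability only slightly.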



