Foundations of probability (1)
As explained in our previous blog, probability is essential for exploring the data science field further. So, let's start today's journey!
You may wonder what probability is and what role it plays in data science.
Probability is the foundation of many models and methods in data science. We can't really build a good model without knowing the concepts of probability.
Do you remember the "heads and tails" game we were taught in primary school? We'll travel back in time to this basic example.
As usual, we explain the theory and also show it in a piece of Python code.
Bernoulli trial
The possible outcomes are binary and can be modeled as (Yes/No), (On/Off), (Head/Tail), (Success/Failure), and so on.
In our case they are success (heads) or failure (tails).
Each possible result is called an outcome, and an event is a set of one or more outcomes, such as "the coin lands heads".
For a fair coin, every flip has a 50% chance of landing heads and a 50% chance of landing tails.
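As a quick check of the 50/50 claim, here is a minimal sketch that asks scipy.stats for the probability of each outcome of a single fair flip. It assumes the usual encoding of 1 for heads and 0 for tails; the variable names are only for illustration.

```python
from scipy.stats import bernoulli

p = 0.5  # probability of heads for a fair coin (assumed encoding: 1 = heads, 0 = tails)

# pmf(k, p) returns the probability of observing outcome k in one trial
p_heads = bernoulli.pmf(1, p)
p_tails = bernoulli.pmf(0, p)

print(p_heads, p_tails)
```

The two probabilities always add up to 1, because heads and tails are the only possible outcomes.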
Let's simulate the coin flips. We'll use the bernoulli object from the scipy.stats module of the Python library SciPy.
# Generate random variates with rvs: p is the probability of success, size is the number of coin flips.
from scipy.stats import bernoulli
bernoulli.rvs(p=0.5, size=1)
Outputs =>
array([0]) is the output of the first run, which means failure, or tails (T).
array([1]) is the output of a second run, which means success, or heads (H). Your own results may differ because the flips are random.
Change the number of flips
bernoulli.rvs(p=0.5, size=10)
Output => array([0, 1, 1, 0, 1, 0, 1, 0, 1, 0])
So, how many heads are there? Let's find out together!
sum(bernoulli.rvs(p=0.5, size=10))
Output => 5, which means 5 heads and 5 tails.
Let's run it again and see what happens.
sum(bernoulli.rvs(p=0.5, size=10))
Output => 2, which means 2 heads and 8 tails.
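The two runs above gave 5 and then 2 heads, so a single run of 10 flips can land fairly far from the "expected" 5. A rough sketch of what happens when the same 10-flip experiment is repeated many times is shown below. The number of repetitions (1000) and the seed (0) are arbitrary choices for illustration, not values from the original example.

```python
from scipy.stats import bernoulli

# 1000 repetitions of the 10-flip experiment, one repetition per row.
# random_state=0 is an arbitrary seed so the sketch is reproducible.
flips = bernoulli.rvs(p=0.5, size=(1000, 10), random_state=0)

heads_per_run = flips.sum(axis=1)  # number of heads in each repetition

print(heads_per_run[:5])     # individual runs vary
print(heads_per_run.mean())  # the average should sit close to 10 * 0.5 = 5
```

Individual runs jump around, but the average number of heads over many runs settles near 5.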
Using binomial distribution for independent Bernoulli trials
- n => Number of coin flips in one experiment
- p => Probability of success on each flip
- size => Number of times the whole experiment is repeated (number of draws)
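To connect n and p to actual probabilities, here is a small sketch that computes the chance of getting exactly k heads with the standard binomial formula, C(n, k) * p^k * (1 - p)^(n - k), and compares it with binom.pmf from scipy.stats. The choice of k = 7 is just an example value.

```python
from math import comb
from scipy.stats import binom

n, p = 10, 0.5  # 10 flips of a fair coin
k = 7           # example value: the number of heads we ask about

# Binomial formula: C(n, k) * p^k * (1 - p)^(n - k)
by_hand = comb(n, k) * p**k * (1 - p)**(n - k)

# The same probability computed by SciPy
from_scipy = binom.pmf(k, n, p)

print(by_hand, from_scipy)
```

Both values should agree up to floating-point rounding, which is a handy sanity check that n and p mean what we think they mean.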
Let's simulate the coin flips again. This time we'll use the binom object from scipy.stats.
# Binomial random variable
from scipy.stats import binom
binom.rvs(n=10, p=0.5, size=1)
Output => array([7]), which means we got 7 heads out of 10 flips.
Now let's repeat the experiment 10 times.
binom.rvs(n=10, p=0.5, size=10)
Output => array([6, 5, 6, 6, 7, 6, 4, 6, 5, 6]). In this sample, 6 heads is the result that appears most often. For a fair coin the single most likely count is 5, so repeated runs cluster around 5 and a different value may come out on top each time.
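To see which head count is most likely in theory for 10 fair flips, rather than in one particular sample, the sketch below evaluates the binomial probabilities for every possible count from 0 to 10. The variable names are illustrative only.

```python
import numpy as np
from scipy.stats import binom

k = np.arange(11)             # possible head counts: 0, 1, ..., 10
probs = binom.pmf(k, 10, 0.5) # probability of each count for n=10, p=0.5

most_likely = k[np.argmax(probs)]

print(most_likely)         # the single most probable number of heads
print(probs[5], probs[6])  # probability of exactly 5 heads vs exactly 6 heads
```

With only 10 draws, a neighbouring value such as 4 or 6 can easily appear more often than the theoretical favourite.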
Biased coin draws
binom.rvs(n=10, p=0.3, size=10)
Output => array([2, 5, 3, 4, 2, 4, 1, 4, 2, 5]). Changing the probability of getting heads to 0.3 leads to lower head counts than with the fair coin.
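One way to quantify the effect of the bias is to compare the expected number of heads, n * p, for the two coins. The sketch below uses binom.mean for the theoretical values and also averages the biased sample printed above (the list is copied from that output).

```python
from scipy.stats import binom

fair_mean = binom.mean(10, 0.5)    # expected heads in 10 flips with p = 0.5
biased_mean = binom.mean(10, 0.3)  # expected heads in 10 flips with p = 0.3

# The biased sample shown above, copied from the printed output
sample = [2, 5, 3, 4, 2, 4, 1, 4, 2, 5]
sample_mean = sum(sample) / len(sample)

print(fair_mean, biased_mean, sample_mean)
```

The sample average lands close to the theoretical value for the biased coin and well below the fair coin's expectation.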
Random number generator seed
- A random number generator is what lets us simulate the outcome of a random experiment.
- If you run the same command with the same random seed, you will always get the same result. In Python we set a seed so the generator produces the same outcomes every time we run an experiment, which lets us check whether the results are what we expected.
- We have two options to configure the generator: the random_state parameter of the rvs function, or np.random.seed.
from scipy.stats import binom
binom.rvs(n=10, p=0.5, size=1, random_state=42)
Or
from scipy.stats import binom
import numpy as np
np.random.seed(42)
binom.rvs(n=10, p=0.5, size=1)
Output => array([4])
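To convince yourself that seeding really makes the draws repeatable, here is a minimal sketch that runs each option twice and compares the results. The size of 5 draws is an arbitrary choice for illustration.

```python
import numpy as np
from scipy.stats import binom

# Option 1: pass the seed directly to rvs through random_state
a = binom.rvs(n=10, p=0.5, size=5, random_state=42)
b = binom.rvs(n=10, p=0.5, size=5, random_state=42)

# Option 2: seed NumPy's global generator before each call
np.random.seed(42)
c = binom.rvs(n=10, p=0.5, size=5)
np.random.seed(42)
d = binom.rvs(n=10, p=0.5, size=5)

print(np.array_equal(a, b))  # same seed, same draws
print(np.array_equal(c, d))  # same seed, same draws
```

Note that with np.random.seed the seed has to be set again before the second call; otherwise the generator simply continues from where it stopped and the second result will usually differ.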
Today's blog is done, but let's do some practice now!
This exercise uses the bernoulli object from the scipy.stats library to simulate the two possible outcomes of a coin flip, 1 ("heads") or 0 ("tails"), and the numpy library (loaded as np) to set the random generator seed.
We'll use the bernoulli.rvs() function and set the number of coin flips with the size argument.
We'll set the random seed so you can reproduce the results of the random experiment in each exercise.
Each experiment returns the value of every coin flip. You can add up the flips with the sum() function to get the number of heads, for example after flipping 10 coins.
Steps:
- Import bernoulli from scipy.stats and set the seed with np.random.seed(). Simulate 1 flip with a 35% chance of heads.
- Use bernoulli.rvs() and sum() to get the number of heads after 10 coin flips with a 35% chance of heads.
- Use bernoulli.rvs() and sum() to get the number of heads after 5 flips with a 50% chance of heads.
# Import numpy
import numpy as np
# Import the bernoulli object from scipy.stats
from scipy.stats import bernoulli
# Set the random seed to reproduce the results
np.random.seed(42)
# Simulate 10 coin flips with a 35% chance of getting heads (use size=1 for a single flip)
coin_flip = bernoulli.rvs(p=0.35, size=10)
print(coin_flip)
Output => [0 1 1 0 0 0 0 1 0 1]
# Get the number of heads after 5 flips with a 50% chance of getting heads
five_coin_flips = bernoulli.rvs(p=0.5, size=5)
coin_flips_sum = sum(five_coin_flips)
print(coin_flips_sum)
Output => 2
Using binom to flip even more coins
Previously, you simulated 10 coin flips with a 35% chance of getting heads using bernoulli.rvs().
This exercise loads the binom object from scipy.stats so you can use binom.rvs() to simulate 20 trials of 10 coin flips with a 35% chance of getting heads on each coin flip.
# Import the binom object from scipy.stats
from scipy.stats import binom
# Set the random seed to reproduce the results
np.random.seed(42)
# Simulate 20 trials of 10 coin flips
draws = binom.rvs(n=10, p=0.35, size=20)
print(draws)
Output => [3 6 4 4 2 2 1 5 4 4 1 6 5 2 2 2 3 4 3 3]
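As a follow-up check on these 20 trials, the sketch below compares the average number of heads per trial with the theoretical expectation n * p. It passes the seed through random_state=42 instead of np.random.seed(42), which is an assumption made here only to keep the snippet self-contained.

```python
from scipy.stats import binom

n, p = 10, 0.35

# 20 trials of 10 coin flips; random_state=42 is used here instead of np.random.seed(42)
draws = binom.rvs(n=n, p=p, size=20, random_state=42)

print(draws)
print(draws.mean())      # average heads per trial in this sample
print(binom.mean(n, p))  # theoretical expectation n * p
```

With only 20 trials the sample average will not match n * p exactly, but it should be in the same neighbourhood.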
I hope this blog gave you some basic knowledge and refreshed your memory. See you in the next learning journey, where we will learn about probability distributions and more!
You can check the code
Resource: https://campus.datacamp.com/courses/foundations-of-probability-in-python/