Note: This notebook was created for placement preparation. It is compiled from multiple resources on the internet and is meant to help me in my own journey. It is for educational purposes only.¶
Random Numbers in NumPy¶
What is a Random Number?¶
- Random number does NOT mean a different number every time. Random means something that can not be predicted logically.
Pseudo Random and True Random.¶
Computers work on programs, and a program is a definite set of instructions. So there must be some algorithm to generate a random number as well.
If there is a program to generate a random number, its output can be predicted, thus it is not truly random.
Random numbers generated through a generation algorithm are called pseudo random.
Can we make truly random numbers?
Yes. In order to generate a truly random number on our computers we need to get the random data from some outside source. This outside source is generally our keystrokes, mouse movements, data on network etc.
We do not need truly random numbers, unless it is related to security (e.g. encryption keys) or the basis of application is the randomness (e.g. Digital roulette wheels).
In this tutorial we will be using pseudo random numbers.
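Because pseudo random numbers come from an algorithm, the same starting state always gives the same sequence. A minimal sketch of this, assuming the legacy random.seed() function and an arbitrary seed value of 42 (neither appears in the notes above):

```python
from numpy import random

# Seeding fixes the generator's starting state, so the same
# sequence of "random" numbers is produced every time.
random.seed(42)
first = random.randint(100, size=5)

random.seed(42)  # reset to the same starting state
second = random.randint(100, size=5)

print(first)
print(second)
print((first == second).all())  # True
```

This is handy when a result has to be reproducible, e.g. while debugging.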
Generate Random Number¶
- NumPy offers the random module to work with random numbers.
from numpy import random
# Generate a random integer from 0 to 99 (the upper bound 100 is excluded):
x = random.randint(100)
print(x)
80
Generate Random Float¶
- The random module's rand() method returns a random float from 0 (inclusive) to 1 (exclusive).
x = random.rand()
print(x)
0.36701231919628763
Generate Random Array¶
- In NumPy we work with arrays, and you can use the two methods from the above examples to make random arrays.
Integers¶
- The randint() method takes a size parameter where you can specify the shape of an array.
# Generate a 1-D array containing 5 random integers from 0 to 99:
x = random.randint(100, size=5)
print(x)
[97 71 55 68 17]
# Generate a 2-D array with 3 rows, each row containing 5 random integers from 0 to 99:
x = random.randint(100, size=(3, 5))
print(x)
[[17 51 19 93 55]
 [65 45 83 73 24]
 [81 46 95 99 35]]
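The upper bound passed to randint() is exclusive. As a small sketch (the dice scenario is an illustration, not part of the original notes), passing both a lower and an upper bound simulates ten rolls of a six-sided dice:

```python
from numpy import random

# low (1) is inclusive, high (7) is exclusive, so values are 1..6.
rolls = random.randint(1, 7, size=10)
print(rolls)
print(rolls.min() >= 1, rolls.max() <= 6)  # True True
```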
Floats¶
- The rand() method also allows you to specify the shape of the array.
# Generate a 1-D array containing 5 random floats:
x = random.rand(5)
print(x)
[0.07714759 0.90307062 0.52960598 0.97641982 0.342345 ]
# Generate a 2-D array with 3 rows, each row containing 5 random numbers:
x = random.rand(3, 5)
print(x)
[[0.02180677 0.08802386 0.97195188 0.04312768 0.61830118]
 [0.73969708 0.08772453 0.23188781 0.51264018 0.6823543 ]
 [0.23266053 0.79884547 0.2189508 0.05616024 0.24690525]]
Generate Random Number From Array¶
The choice() method allows you to generate a random value based on an array of values.
The choice() method takes an array as a parameter and randomly returns one of the values.
# Return one of the values in an array:
x = random.choice([3, 5, 7, 9])
print(x)
9
The choice() method also allows you to return an array of values.
Add a size parameter to specify the shape of the array.
# Generate a 2-D array that consists of the values in the array parameter (3, 5, 7, and 9):
x = random.choice([3, 5, 7, 9], size=(3, 5))
print(x)
[[7 9 3 9 3]
 [5 9 3 7 5]
 [5 3 7 9 3]]
Random Data Distribution¶
What is Data Distribution?¶
Data Distribution is a list of all possible values, and how often each value occurs.
Such lists are important when working with statistics and data science.
The random module offers methods that return randomly generated data distributions.
Random Distribution¶
A random distribution is a set of random numbers that follow a certain probability density function.
Probability Density Function: A function that describes how likely each value of a continuous variable is. For discrete values, such as the entries of an array, the analogous function assigns a probability to each value.
We can generate random numbers based on defined probabilities using the choice() method of the random module.
The choice() method allows us to specify the probability for each value.
The probability is set by a number between 0 and 1, where 0 means that the value will never occur and 1 means that the value will always occur.
The sum of all probability numbers should be 1.
Even if you run the example below 100 times, the value 9 will never occur, because its probability is set to 0.
You can return arrays of any shape and size by specifying the shape in the size parameter.
#Generate a 1-D array containing 100 values, where each value has to be 3, 5, 7 or 9.
#The probability for the value to be 3 is set to be 0.1
#The probability for the value to be 5 is set to be 0.3
#The probability for the value to be 7 is set to be 0.6
#The probability for the value to be 9 is set to be 0
x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(100))
print(x)
[5 3 7 7 7 7 7 5 5 3 3 5 7 7 5 7 5 7 3 5 5 7 3 7 7 5 5 7 5 7 7 7 5 7 7 3 7 3 7 3 7 5 7 5 5 7 7 3 7 5 5 7 5 7 7 7 7 3 7 7 5 7 7 5 7 7 5 5 7 7 7 7 5 7 7 3 5 7 5 7 7 7 7 7 7 7 7 7 5 7 5 5 7 3 3 7 7 5 5 5]
# Same example as above, but return a 2-D array with 3 rows, each containing 5 values.
x = random.choice([3, 5, 7, 9], p=[0.1, 0.3, 0.6, 0.0], size=(3, 5))
print(x)
[[7 5 7 7 7]
 [7 5 7 7 7]
 [7 7 5 7 7]]
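With a large sample, the share of each value should come close to the probability given in p. A rough check, assuming a sample size of 10,000 (chosen here only for illustration):

```python
import numpy as np
from numpy import random

values = [3, 5, 7, 9]
probs = [0.1, 0.3, 0.6, 0.0]

sample = random.choice(values, p=probs, size=10_000)

# Count how often each value occurs and convert to a share.
uniq, counts = np.unique(sample, return_counts=True)
for v, c in zip(uniq, counts):
    print(v, c / sample.size)

print(9 in sample)  # False, because its probability is 0
```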
Random Permutations¶
Random Permutations of Elements¶
A permutation refers to an arrangement of elements. e.g. [3, 2, 1] is a permutation of [1, 2, 3] and vice-versa.
The NumPy Random module provides two methods for this: shuffle() and permutation().
Shuffling Arrays¶
Shuffle means changing arrangement of elements in-place. i.e. in the array itself.
The shuffle() method makes changes to the original array.
from numpy import random
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
random.shuffle(arr)
print(arr)
[2 5 3 4 1]
Generating Permutation of Arrays¶
Generate a random permutation of elements of following array.
The permutation() method returns a re-arranged array (and leaves the original array un-changed).
from numpy import random
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(random.permutation(arr))
[2 5 1 4 3]
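The difference between the two methods can be checked directly. In this sketch the variable names perm, unchanged and result are only illustrative:

```python
import numpy as np
from numpy import random

arr = np.array([1, 2, 3, 4, 5])

# permutation() returns a new array and leaves arr as it was.
perm = random.permutation(arr)
unchanged = (arr == np.array([1, 2, 3, 4, 5])).all()
print(unchanged)  # True

# shuffle() re-arranges arr itself and returns nothing.
result = random.shuffle(arr)
print(result)  # None
print(arr)
```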
Seaborn¶
Visualize Distributions With Seaborn¶
- Seaborn is a library that uses Matplotlib underneath to plot graphs. It will be used to visualize random distributions.
Install Seaborn.¶
pip install seaborn
Distplots¶
- Distplot stands for distribution plot. It takes an array as input and plots a curve corresponding to the distribution of points in the array.
Import Matplotlib¶
- Import the pyplot object of the Matplotlib module in your code using the following statement:
import matplotlib.pyplot as plt
Import Seaborn¶
- Import the Seaborn module in your code using the following statement
import seaborn as sns
Plotting a Distplot¶
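- Note: We will be using sns.distplot(arr, hist=False) to visualize random distributions in this tutorial. distplot is deprecated in recent Seaborn versions, which recommend displot, histplot or kdeplot instead.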
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot([0, 1, 2, 3, 4, 5])
plt.show()
<ipython-input-28-0965c11fb30a>:4: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `histplot` (an axes-level function for histograms). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot([0, 1, 2, 3, 4, 5])
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot([0, 1, 2, 3, 4, 5], hist=False)
plt.show()
<ipython-input-29-37ce4ef6d8f6>:4: UserWarning: `distplot` is a deprecated function and will be removed in seaborn v0.14.0. Please adapt your code to use either `displot` (a figure-level function with similar flexibility) or `kdeplot` (an axes-level function for kernel density plots). For a guide to updating your code to use the new functions, please see https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751 sns.distplot([0, 1, 2, 3, 4, 5], hist=False)
Normal (Gaussian) Distribution¶
Normal Distribution¶
The Normal Distribution is one of the most important distributions.
It is also called the Gaussian Distribution after the German mathematician Carl Friedrich Gauss.
It fits the probability distribution of many events, e.g. IQ scores, heart rate etc.
Use the random.normal() method to get a Normal Data Distribution.
It has three parameters:
loc - (Mean) where the peak of the bell exists.
scale - (Standard Deviation) how spread out (flat) the distribution should be.
size - The shape of the returned array.
# Generate a random normal distribution of size 2x3:
from numpy import random
x = random.normal(size=(2, 3))
print(x)
[[-0.44948638 0.53344573 0.44730116]
 [-1.26778374 0.9107062 0.19385257]]
# Generate a random normal distribution of size 2x3 with mean at 1 and standard deviation of 2:
from numpy import random
x = random.normal(loc=1, scale=2, size=(2, 3))
print(x)
[[ 0.55055381 2.1225143 5.42855619]
 [-1.69064281 3.28449139 0.01366245]]
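To see what loc and scale do, the mean and standard deviation of a large sample can be compared with the requested values. A small sketch, assuming a sample size of 100,000 picked only for illustration:

```python
from numpy import random

x = random.normal(loc=1, scale=2, size=100_000)

print(x.mean())  # close to loc = 1
print(x.std())   # close to scale = 2
```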
Visualization of Normal Distribution¶
- Note: The curve of a Normal Distribution is also known as the Bell Curve because of the bell-shaped curve.
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.normal(size=1000), hist=False)
plt.show()
Binomial Distribution¶
Binomial Distribution is a Discrete Distribution.
It describes the outcome of binary scenarios, e.g. the toss of a coin, which is either heads or tails.
It has three parameters:
n - number of trials.
p - probability of success in each trial (e.g. 0.5 for heads in a fair coin toss).
size - The shape of the returned array.
- Discrete Distribution: The distribution is defined over a separate set of events, e.g. a coin toss's result is discrete as it can be only heads or tails, whereas the height of people is continuous as it can be 170, 170.1, 170.11 and so on.
# Generate 10 data points, each the number of successes in 10 coin-toss trials:
from numpy import random
x = random.binomial(n=10, p=0.5, size=10)
print(x)
[6 6 6 4 8 5 6 5 3 6]
Visualization of Binomial Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.binomial(n=10, p=0.5, size=1000), hist=True, kde=False)
plt.show()
Difference Between Normal and Binomial Distribution¶
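- The main difference is that the normal distribution is continuous whereas the binomial distribution is discrete. With enough trials, however, the binomial distribution looks quite similar to a normal distribution with loc = n * p and scale = sqrt(n * p * (1 - p)). For n=100 and p=0.5 that gives loc=50 and scale=5, the values used below.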
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.normal(loc=50, scale=5, size=1000), hist=False, label='normal')
sns.distplot(random.binomial(n=100, p=0.5, size=1000), hist=False, label='binomial')
plt.show()
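The comparison above uses n=100 and p=0.5 for the binomial sample, and loc=50 and scale=5 for the normal sample. A minimal sketch of where those values come from, assuming the usual normal approximation with mean n * p and standard deviation sqrt(n * p * (1 - p)):

```python
import math
from numpy import random

n, p = 100, 0.5
mean = n * p                      # matching loc
std = math.sqrt(n * p * (1 - p))  # matching scale
print(mean, std)  # 50.0 5.0

# The sample statistics should land close to those values.
sample = random.binomial(n=n, p=p, size=100_000)
print(sample.mean(), sample.std())
```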
Poisson Distribution¶
- Poisson Distribution is a Discrete Distribution.
- It estimates how many times an event can happen in a specified time, e.g. if someone eats twice a day on average, what is the probability that they will eat three times on a given day?
- It has two parameters:
lam - rate or known average number of occurrences, e.g. 2 for the above problem.
size - The shape of the returned array.
# Generate 10 samples from a Poisson distribution with rate 2:
from numpy import random
x = random.poisson(lam=2, size=10)
print(x)
[1 1 1 2 3 1 0 1 2 1]
Visualization of Poisson Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.poisson(lam=2, size=1000), kde=False)
plt.show()
Difference Between Normal and Poisson Distribution¶
Normal distribution is continuous whereas Poisson is discrete.
But, as with the binomial distribution, a Poisson distribution with a large enough lam becomes similar to a normal distribution with mean lam and standard deviation sqrt(lam). For lam=50 that is a standard deviation of about 7, the value used below.
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.normal(loc=50, scale=7, size=1000), hist=False, label='normal')
sns.distplot(random.poisson(lam=50, size=1000), hist=False, label='poisson')
plt.show()
Difference Between Binomial and Poisson Distribution¶
A binomial distribution counts successes in a fixed number of trials n, each with only two possible outcomes, so its values range from 0 to n, whereas a Poisson distribution counts events with no upper limit.
But for very large n and near-zero p, the binomial distribution is nearly identical to a Poisson distribution with lam = n * p.
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.binomial(n=1000, p=0.01, size=1000), hist=False, label='binomial')
sns.distplot(random.poisson(lam=10, size=1000), hist=False, label='poisson')
plt.show()
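The closeness of the two distributions can also be checked without sampling, by comparing the exact probabilities of a few counts. This sketch uses the textbook formulas for both distributions; the helper names binomial_pmf and poisson_pmf are made up for this example:

```python
import math

n, p = 1000, 0.01
lam = n * p  # 10.0, the matching Poisson rate

def binomial_pmf(k):
    # Probability of exactly k successes in n trials.
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k):
    # Probability of exactly k events at rate lam.
    return math.exp(-lam) * lam**k / math.factorial(k)

for k in (5, 10, 15):
    print(k, round(binomial_pmf(k), 4), round(poisson_pmf(k), 4))
```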
Uniform Distribution¶
Used to describe probability where every event has an equal chance of occurring.
E.g. generation of random numbers.
It has three parameters:
low - lower bound (inclusive) - default 0.0.
high - upper bound (exclusive) - default 1.0.
size - The shape of the returned array.
# Create a 2x3 uniform distribution sample:
from numpy import random
x = random.uniform(size=(2, 3))
print(x)
[[0.64878272 0.97861569 0.7968969 ]
 [0.69440919 0.80439337 0.47883707]]
Visualization of Uniform Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.uniform(size=1000), hist=False)
plt.show()
Logistic Distribution¶
Logistic Distribution is used to describe growth.
Used extensively in machine learning in logistic regression, neural networks etc.
It has three parameters:
loc - mean, where the peak is. Default 0.
scale - scale parameter that controls the flatness of the distribution (the standard deviation is scale * pi / sqrt(3)). Default 1.
size - The shape of the returned array.
# Draw 2x3 samples from a logistic distribution with mean at 1 and scale 2.0
from numpy import random
x = random.logistic(loc=1, scale=2, size=(2, 3))
print(x)
[[-2.46001035 0.59451779 3.06383604]
 [ 3.0140323 1.28357243 -1.926531 ]]
Visualization of Logistic Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.logistic(size=1000), hist=False)
plt.show()
Difference Between Logistic and Normal Distribution¶
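Both distributions have a similar symmetric bell shape, but the logistic distribution has more area under the tails, meaning it represents a higher possibility of an event occurring further away from the mean.
When the spreads are matched, the two curves are close to each other apart from the height of the peak. The example below compares a normal distribution with scale=2 to a logistic distribution with the default scale of 1, which has a standard deviation of about 1.8.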
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.normal(scale=2, size=1000), hist=False, label='normal')
sns.distplot(random.logistic(size=1000), hist=False, label='logistic')
plt.show()
Multinomial Distribution¶
Multinomial distribution is a generalization of binomial distribution.
It describes outcomes of multinomial scenarios, unlike binomial where each trial must have one of only two outcomes, e.g. blood type of a population, dice roll outcome.
It has three parameters:
n - number of trials (e.g. 6 to roll a dice six times).
pvals - list of probabilities of the outcomes (e.g. [1/6, 1/6, 1/6, 1/6, 1/6, 1/6] for a dice roll).
size - The shape of the returned array.
Note: Multinomial samples will NOT produce a single value! They produce one count for each entry in pvals, and the counts add up to n.
Note: As it is a generalization of the binomial distribution, the count of each outcome taken on its own follows a binomial distribution, so its visual representation and its similarity to the normal distribution are the same as for the binomial distribution.
from numpy import random
x = random.multinomial(n=6, pvals=[1/6, 1/6, 1/6, 1/6, 1/6, 1/6])
print(x)
[1 1 1 2 0 1]
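Each value in the result is the number of times one outcome occurred, so the values add up to n. A short sketch that checks this and also shows the size parameter (the variable names are illustrative):

```python
from numpy import random

pvals = [1/6] * 6  # a fair six-sided dice

# One experiment: six rolls, one count per face.
counts = random.multinomial(n=6, pvals=pvals)
print(counts, counts.sum())  # the counts always add up to 6

# size=3 repeats the experiment three times, one row each.
many = random.multinomial(n=6, pvals=pvals, size=3)
print(many.shape)  # (3, 6)
```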
Exponential Distribution¶
Exponential distribution is used for describing the time until the next event, e.g. failure/success etc.
It has two parameters:
scale - inverse of the rate (see lam in Poisson distribution). Defaults to 1.0.
size - The shape of the returned array.
# Draw a 2x3 sample from an exponential distribution with scale 2.0:
from numpy import random
x = random.exponential(scale=2, size=(2, 3))
print(x)
[[0.3396497 4.04403123 1.01859327]
 [1.36717955 4.36381683 0.09551562]]
Visualization of Exponential Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.exponential(size=1000), hist=False)
plt.show()
Relation Between Poisson and Exponential Distribution¶
- Poisson distribution deals with the number of occurrences of an event in a time period, whereas exponential distribution deals with the time between these events.
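This relation can be illustrated by simulation: draw waiting times from an exponential distribution and count how many events fall into each unit of time. The sketch below assumes a rate of 2 events per unit of time and 200,000 events, both chosen only for illustration:

```python
import numpy as np
from numpy import random

lam = 2.0          # average number of events per unit of time
n_events = 200_000

# Waiting times between events: exponential with scale = 1 / lam.
gaps = random.exponential(scale=1 / lam, size=n_events)
arrival_times = np.cumsum(gaps)

# Count the events that fall into each whole unit of time.
n_units = int(arrival_times[-1])
in_range = arrival_times[arrival_times < n_units]
counts = np.bincount(in_range.astype(int), minlength=n_units)

print(gaps.mean())    # close to 1 / lam = 0.5
print(counts.mean())  # close to lam = 2
print(counts.var())   # also close to lam, as expected for Poisson counts
```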
Chi Square Distribution¶
Chi Square distribution is used as a basis for hypothesis testing.
It has two parameters:
df - degrees of freedom.
size - The shape of the returned array.
# Draw a 2x3 sample from a chi squared distribution with 2 degrees of freedom:
from numpy import random
x = random.chisquare(df=2, size=(2, 3))
print(x)
[[1.05467905 0.53213143 1.15598561]
 [7.58206088 5.45159034 1.56153295]]
Visualization of Chi Square Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.chisquare(df=1, size=1000), hist=False)
plt.show()
Rayleigh Distribution¶
Rayleigh distribution is used in signal processing.
It has two parameters:
scale - scale parameter that decides how flat the distribution will be (default 1.0).
size - The shape of the returned array.
# Draw a 2x3 sample from a Rayleigh distribution with scale 2:
from numpy import random
x = random.rayleigh(scale=2, size=(2, 3))
print(x)
[[1.10915818 2.59524667 1.75385981]
 [2.24726365 1.6514323 2.62477122]]
Visualization of Rayleigh Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.rayleigh(size=1000), hist=False)
plt.show()
Similarity Between Rayleigh and Chi Square Distribution¶
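- With scale 1, the square of a Rayleigh-distributed value follows a chi square distribution with 2 degrees of freedom. The Rayleigh distribution itself matches the chi distribution (the square root of chi square) with 2 degrees of freedom.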
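A quick numerical check of the link between the two distributions, assuming that squared Rayleigh samples with scale 1 should behave like chi square samples with 2 degrees of freedom (mean 2, variance 4):

```python
from numpy import random

r = random.rayleigh(scale=1.0, size=100_000)
c = random.chisquare(df=2, size=100_000)

r_squared = r ** 2
print(r_squared.mean(), c.mean())  # both close to 2
print(r_squared.var(), c.var())    # both close to 4
```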
Pareto Distribution¶
A distribution following Pareto's law, i.e. the 80-20 distribution (20% of the factors cause 80% of the outcomes).
It has two parameters:
a - shape parameter.
size - The shape of the returned array.
# Draw a 2x3 sample from a Pareto distribution with shape 2:
from numpy import random
x = random.pareto(a=2, size=(2, 3))
print(x)
[[0.07850282 1.99234694 0.26361963]
 [0.94071143 0.32779476 0.25464308]]
Visualization of Pareto Distribution¶
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
sns.distplot(random.pareto(a=2, size=1000), kde=False)
plt.show()
Zipf Distribution¶
Zipf distributions are used to sample data based on Zipf's law.
Zipf's Law: In a collection, the nth most common term occurs about 1/n times as often as the most common term. E.g. the 5th most common word in English occurs nearly 1/5 as often as the most common word.
It has two parameters:
a - distribution parameter.
size - The shape of the returned array.
# Draw out a sample for zipf distribution with distribution parameter 2 with size 2x3:
from numpy import random
x = random.zipf(a=2, size=(2, 3))
print(x)
[[ 1 1 37]
 [ 1 3 1]]
Visualization of Zipf Distribution¶
- Sample 1000 points but plot only the ones with value < 10 for a more meaningful chart.
from numpy import random
import matplotlib.pyplot as plt
import seaborn as sns
x = random.zipf(a=2, size=1000)
sns.distplot(x[x<10], kde=False)
plt.show()
NumPy - Matplotlib¶
- Matplotlib is a plotting library for Python. It is used along with NumPy to provide an environment that is an effective open source alternative to MATLAB. It can also be used with graphics toolkits like PyQt and wxPython.
from matplotlib import pyplot as plt
- Here pyplot is the most important submodule of the matplotlib library, and its plot() function is used to plot 2D data. The following script plots the equation y = 2x + 5
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y)
plt.show()
- An ndarray object x is created from np.arange() function as the values on the x axis. The corresponding values on the y axis are stored in another ndarray object y. These values are plotted using plot() function of pyplot submodule of matplotlib package.
- Instead of the linear graph, the values can be displayed discretely by adding a format string to the plot() function. The following formatting characters can be used.
| Character | Description |
|---|---|
| '-' | Solid line style |
| '--' | Dashed line style |
| '-.' | Dash-dot line style |
| ':' | Dotted line style |
| '.' | Point marker |
| ',' | Pixel marker |
| 'o' | Circle marker |
| 'v' | Triangle_down marker |
| '^' | Triangle_up marker |
| '<' | Triangle_left marker |
| '>' | Triangle_right marker |
| '1' | Tri_down marker |
| '2' | Tri_up marker |
| '3' | Tri_left marker |
| '4' | Tri_right marker |
| 's' | Square marker |
| 'p' | Pentagon marker |
| '*' | Star marker |
| 'h' | Hexagon1 marker |
| 'H' | Hexagon2 marker |
| '+' | Plus marker |
| 'x' | X marker |
| 'D' | Diamond marker |
| 'd' | Thin_diamond marker |
| '|' | Vline marker |
| '_' | Hline marker |
- The following color abbreviations are also defined.
| Character | Color |
|---|---|
| 'b' | Blue |
| 'g' | Green |
| 'r' | Red |
| 'c' | Cyan |
| 'm' | Magenta |
| 'y' | Yellow |
| 'k' | Black |
| 'w' | White |
- To display circles representing the points instead of the line in the above example, use "ob" as the format string in the plot() function.
import numpy as np
from matplotlib import pyplot as plt
x = np.arange(1,11)
y = 2 * x + 5
plt.title("Matplotlib demo")
plt.xlabel("x axis caption")
plt.ylabel("y axis caption")
plt.plot(x,y,"ob")
plt.show()
Sine Wave Plot¶
import numpy as np
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on a sine curve
x = np.arange(0, 3 * np.pi, 0.1)
y = np.sin(x)
plt.title("sine wave form")
# Plot the points using matplotlib
plt.plot(x, y)
plt.show()
subplot()¶
- The subplot() function allows you to plot different things in the same figure. In the following script, sine and cosine values are plotted.
import numpy as np
import matplotlib.pyplot as plt
# Compute the x and y coordinates for points on sine and cosine curves
x = np.arange(0, 3 * np.pi, 0.1)
y_sin = np.sin(x)
y_cos = np.cos(x)
# Set up a subplot grid that has height 2 and width 1,
# and set the first such subplot as active.
plt.subplot(2, 1, 1)
# Make the first plot
plt.plot(x, y_sin)
plt.title('Sine')
# Set the second subplot as active, and make the second plot.
plt.subplot(2, 1, 2)
plt.plot(x, y_cos)
plt.title('Cosine')
# Show the figure.
plt.show()
bar()¶
- The pyplot submodule provides bar() function to generate bar graphs. The following example produces the bar graph of two sets of x and y arrays.
from matplotlib import pyplot as plt
x = [5,8,10]
y = [12,16,6]
x2 = [6,9,11]
y2 = [6,15,7]
plt.bar(x, y, align = 'center')
plt.bar(x2, y2, color = 'g', align = 'center')
plt.title('Bar graph')
plt.ylabel('Y axis')
plt.xlabel('X axis')
plt.show()
NumPy - Histogram Using Matplotlib¶
- NumPy has a numpy.histogram() function that computes the frequency distribution of data as numbers. A histogram is the graphical representation of that distribution: rectangles of equal horizontal size correspond to class intervals called bins, and their variable heights correspond to the frequencies.
numpy.histogram()¶
- The numpy.histogram() function takes the input array and bins as two parameters. The successive elements in bin array act as the boundary of each bin.
import numpy as np
a = np.array([22,87,5,43,56,73,55,54,11,20,51,5,79,31,27])
np.histogram(a,bins = [0,20,40,60,80,100])
hist,bins = np.histogram(a,bins = [0,20,40,60,80,100])
print(hist)
print(bins)
[3 4 5 2 1]
[ 0 20 40 60 80 100]
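Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. This is why the value 20 is counted in the second bin. A small sketch that reproduces the first two counts by hand (the names edges, first_bin and second_bin are illustrative):

```python
import numpy as np

a = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27])
edges = [0, 20, 40, 60, 80, 100]

hist, bins = np.histogram(a, bins=edges)

# Count the first two bins by hand using the half-open rule.
first_bin = ((a >= 0) & (a < 20)).sum()    # 5, 11, 5
second_bin = ((a >= 20) & (a < 40)).sum()  # 22, 20, 31, 27
print(first_bin, second_bin)  # 3 4

# Every value falls into exactly one bin.
print(hist.sum() == a.size)   # True
```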
hist()¶
- Matplotlib can convert this numeric representation of a histogram into a graph. The hist() function of the pyplot submodule takes the array containing the data and the bin array as parameters and draws a histogram.
from matplotlib import pyplot as plt
import numpy as np
a = np.array([22,87,5,43,56,73,55,54,11,20,51,5,79,31,27])
plt.hist(a, bins = [0,20,40,60,80,100])
plt.title("histogram")
plt.show()
I/O with NumPy¶
The ndarray objects can be saved to and loaded from disk files. The I/O functions available are:
load() and save() functions handle NumPy binary files (with npy extension)
loadtxt() and savetxt() functions handle normal text files
NumPy introduces a simple file format for ndarray objects. This .npy file stores data, shape, dtype and other information required to reconstruct the ndarray in a disk file such that the array is correctly retrieved even if the file is on another machine with different architecture.
numpy.save()¶
- The numpy.save() function stores the input array in a disk file with npy extension.
import numpy as np
a = np.array([1,2,3,4,5])
np.save('outfile',a)
- To reconstruct array from outfile.npy, use load() function.
import numpy as np
b = np.load('outfile.npy')
print(b)
[1 2 3 4 5]
- The save() and load() functions accept an additional Boolean parameter allow_pickle. Pickle is Python's mechanism for serializing and de-serializing objects, and it is needed when the array holds Python objects (dtype object) rather than plain numbers.
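A minimal sketch of when pickling matters, assuming a recent NumPy version in which load() does not allow pickled data unless asked to. The file name and the sample objects are made up for this example:

```python
import os
import tempfile
import numpy as np

# An object array holds arbitrary Python objects, so it can only
# be written to disk by pickling them.
obj = np.empty(2, dtype=object)
obj[0] = {"a": 1}
obj[1] = [1, 2, 3]

path = os.path.join(tempfile.mkdtemp(), "objects.npy")
np.save(path, obj)

try:
    np.load(path)  # pickled data is not allowed here
    refused = False
except ValueError:
    refused = True
print(refused)  # True on recent NumPy versions

restored = np.load(path, allow_pickle=True)
print(restored[0]["a"])  # 1
```

Loading pickled data can execute arbitrary code, so allow_pickle=True should only be used with files from a trusted source.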
savetxt()¶
- The storage and retrieval of array data in simple text file format is done with savetxt() and loadtxt() functions.
a = np.array([1,2,3,4,5])
np.savetxt('out.txt',a)
b = np.loadtxt('out.txt')
print(b)
[1. 2. 3. 4. 5.]
- The savetxt() function accepts additional optional parameters such as header, footer, and delimiter; loadtxt() also accepts delimiter.
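A short sketch of these optional parameters, writing a small 2-D array as comma-separated text. The file name, the header text and the fmt value are chosen only for illustration:

```python
import os
import tempfile
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
path = os.path.join(tempfile.mkdtemp(), "out.csv")

# fmt="%d" writes whole numbers; the header is written as a
# comment line starting with "# ".
np.savetxt(path, a, fmt="%d", delimiter=",", header="x,y,z")

with open(path) as f:
    print(f.read())

# loadtxt() skips the comment line and returns floats by default.
b = np.loadtxt(path, delimiter=",")
print(b)
```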
