<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Data Analysis on Arshad Siddiqui</title><link>https://arshadhs.github.io/tags/data-analysis/</link><description>Recent content in Data Analysis on Arshad Siddiqui</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 12 Mar 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://arshadhs.github.io/tags/data-analysis/index.xml" rel="self" type="application/rss+xml"/><item><title>Statistics</title><link>https://arshadhs.github.io/docs/ai/statistics/</link><pubDate>Thu, 12 Mar 2026 00:00:00 +0000</pubDate><guid>https://arshadhs.github.io/docs/ai/statistics/</guid><description>&lt;h1 id="statistics">
 Statistics
 
 &lt;a class="anchor" href="#statistics">#&lt;/a>
 
&lt;/h1>
&lt;p>&lt;strong>Statistical methods&lt;/strong> help you turn &lt;strong>raw data into reliable conclusions&lt;/strong> while accounting for &lt;strong>uncertainty, variability, and confidence&lt;/strong>.&lt;/p>
&lt;p>Statistics provides the &lt;strong>language and tools&lt;/strong> for reasoning about data, uncertainty, and inference.&lt;/p>
&lt;p>Machine learning relies on statistics for &lt;strong>understanding data behaviour&lt;/strong>, drawing conclusions, and validating models.&lt;/p>
&lt;ul>
&lt;li>Collect Data&lt;/li>
&lt;li>Present &amp;amp; Organise Data (in a systematic manner)&lt;/li>
&lt;li>Analyse Data&lt;/li>
&lt;li>Draw Inferences from the Data&lt;/li>
&lt;li>Make Decisions Based on the Data&lt;/li>
&lt;/ul>
&lt;hr>




&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/00_formulas/">Formula Sheet&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/ism-formula-sheet/">Stats Formula Sheet&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/01_basic_statistics/">Basic Statistics&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/01_basic_probability/">Basic Probability&lt;/a>
 &lt;/li>
 
 
 
 
 
 
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/04_hypothesis_testing/">Hypothesis Testing&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/05_prediction_n_forecasting/">Prediction &amp;amp; Forecasting&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/06_prediction_n_forecasting/">Gaussian Mixture model &amp;amp; Expectation Maximization&lt;/a>
 &lt;/li>
 
 

 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/">Conditional Probability &amp;amp; Bayes’ Theorem&lt;/a>

 
 



&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/021_conditional_prob/">Conditional Probability&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/022_bayes_theorem/">Bayes’ Theorem&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/conditional-probability/023_naive_bayes/">Naïve Bayes&lt;/a>
 &lt;/li>
 
 

 
 
&lt;/ul>

 &lt;/li>
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/probability_distributions/">Probability Distributions&lt;/a>

 
 



&lt;ul>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/probability_distributions/random-variables/">Random Variables&lt;/a>
 &lt;/li>
 
 
 
 
 &lt;li>
 &lt;a href="https://arshadhs.github.io/docs/ai/statistics/probability_distributions/common-distributions/">Common Probability Distributions&lt;/a>
 &lt;/li>
 
 

 
 
&lt;/ul>

 &lt;/li>
 
&lt;/ul>


&lt;hr>
&lt;table>
 &lt;thead>
 &lt;tr>
 &lt;th>Statistics Topic&lt;/th>
 &lt;th>What you learn (plain English)&lt;/th>
 &lt;th>ML Connection&lt;/th>
 &lt;/tr>
 &lt;/thead>
 &lt;tbody>
 &lt;tr>
 &lt;td>1. Basic Probability &amp;amp; Statistics&lt;/td>
 &lt;td>Summarise data;&lt;br>understand spread;&lt;br>basic probability rules&lt;/td>
 &lt;td>Data understanding (EDA), feature sanity checks,&lt;br>detecting outliers, interpreting “average behaviour”&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>2. Conditional Probability &amp;amp; Bayes&lt;/td>
 &lt;td>Update probability using new information;&lt;br>Bayes’ rule&lt;/td>
 &lt;td>Naïve Bayes, Bayesian thinking,&lt;br>posterior probabilities, probabilistic classification&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>3. Probability Distributions&lt;/td>
 &lt;td>Model randomness with distributions;&lt;br>expectation/variance/covariance&lt;/td>
 &lt;td>Likelihood models, noise assumptions (Gaussian), sampling,&lt;br>probabilistic modelling foundations&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>4. Hypothesis Testing&lt;/td>
 &lt;td>Sampling, CLT, confidence intervals,&lt;br>significance tests, ANOVA, MLE&lt;/td>
 &lt;td>A/B testing, evaluating model improvements,&lt;br>significance vs noise, parameter estimation (MLE)&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>5. Prediction &amp;amp; Forecasting&lt;/td>
 &lt;td>Correlation, regression,&lt;br>time series (AR/MA/ARIMA/SARIMA etc.)&lt;/td>
 &lt;td>Linear regression, forecasting, sequential data modelling, baseline predictive modelling&lt;/td>
 &lt;/tr>
 &lt;tr>
 &lt;td>6. GMM &amp;amp; EM&lt;/td>
 &lt;td>Mixtures of Gaussians;&lt;br>iterative estimation with EM&lt;/td>
 &lt;td>Unsupervised learning (soft clustering),&lt;br>density estimation, latent-variable models&lt;/td>
 &lt;/tr>
 &lt;/tbody>
&lt;/table>
&lt;hr>


&lt;pre class="mermaid">
flowchart TD
 A[&amp;#34;Statistical Methods&amp;lt;br/&amp;gt;AIML ZC418&amp;#34;] --&amp;gt; B[&amp;#34;1. Basic Probability and Statistics&amp;#34;]
 A --&amp;gt; C[&amp;#34;2. Conditional Probability and Bayes&amp;#34;]
 A --&amp;gt; D[&amp;#34;3. Probability Distributions&amp;#34;]
 A --&amp;gt; E[&amp;#34;4. Hypothesis Testing&amp;#34;]
 A --&amp;gt; F[&amp;#34;5. Prediction and Forecasting&amp;#34;]
 A --&amp;gt; G[&amp;#34;6. Gaussian Mixture Model and EM&amp;#34;]

 B --&amp;gt; B1[&amp;#34;Central Tendency&amp;lt;br/&amp;gt;Mean - Median - Mode&amp;#34;]
 B --&amp;gt; B2[&amp;#34;Variability&amp;lt;br/&amp;gt;Range - Variance - SD - Quartiles&amp;#34;]
 B --&amp;gt; B3[&amp;#34;Basic Probability Concepts&amp;#34;]
 B3 --&amp;gt; B31[&amp;#34;Axioms of Probability&amp;#34;]
 B3 --&amp;gt; B32[&amp;#34;Definition of Probability&amp;#34;]
 B3 --&amp;gt; B33[&amp;#34;Mutually Exclusive vs Independent&amp;#34;]

 C --&amp;gt; C1[&amp;#34;Conditional Probability&amp;#34;]
 C --&amp;gt; C2[&amp;#34;Independence (conditional)&amp;#34;]
 C --&amp;gt; C3[&amp;#34;Bayes Theorem&amp;#34;]
 C --&amp;gt; C4[&amp;#34;Naive Bayes (intro)&amp;#34;]

 D --&amp;gt; D1[&amp;#34;Random Variables&amp;lt;br/&amp;gt;Discrete and Continuous&amp;#34;]
 D --&amp;gt; D2[&amp;#34;Expectation - Variance - Covariance&amp;#34;]
 D --&amp;gt; D3[&amp;#34;Transformations of RVs&amp;#34;]
 D --&amp;gt; D4[&amp;#34;Key Distributions&amp;#34;]
 D4 --&amp;gt; D41[&amp;#34;Bernoulli&amp;#34;]
 D4 --&amp;gt; D42[&amp;#34;Binomial&amp;#34;]
 D4 --&amp;gt; D43[&amp;#34;Poisson&amp;#34;]
 D4 --&amp;gt; D44[&amp;#34;Normal (Gaussian)&amp;#34;]
 D4 --&amp;gt; D45[&amp;#34;t - Chi-square - F (intro)&amp;#34;]

 E --&amp;gt; E1[&amp;#34;Sampling&amp;lt;br/&amp;gt;Random and Stratified&amp;#34;]
 E --&amp;gt; E2[&amp;#34;Sampling Distributions&amp;lt;br/&amp;gt;CLT&amp;#34;]
 E --&amp;gt; E3[&amp;#34;Estimation&amp;lt;br/&amp;gt;Confidence Intervals&amp;#34;]
 E --&amp;gt; E4[&amp;#34;Hypothesis Tests&amp;lt;br/&amp;gt;Means and Proportions&amp;#34;]
 E --&amp;gt; E5[&amp;#34;ANOVA&amp;lt;br/&amp;gt;Single and Dual factor&amp;#34;]
 E --&amp;gt; E6[&amp;#34;Maximum Likelihood&amp;#34;]

 F --&amp;gt; F1[&amp;#34;Correlation&amp;#34;]
 F --&amp;gt; F2[&amp;#34;Regression&amp;#34;]
 F --&amp;gt; F3[&amp;#34;Time Series Basics&amp;lt;br/&amp;gt;Components&amp;#34;]
 F --&amp;gt; F4[&amp;#34;Moving Averages&amp;lt;br/&amp;gt;Simple and Weighted&amp;#34;]
 F --&amp;gt; F5[&amp;#34;Time Series Models&amp;#34;]
 F5 --&amp;gt; F51[&amp;#34;AR&amp;#34;]
 F5 --&amp;gt; F52[&amp;#34;ARMA / ARIMA&amp;#34;]
 F5 --&amp;gt; F53[&amp;#34;SARIMA / SARIMAX&amp;#34;]
 F5 --&amp;gt; F54[&amp;#34;VAR / VARMAX&amp;#34;]
 F --&amp;gt; F6[&amp;#34;Exponential Smoothing&amp;#34;]

 G --&amp;gt; G1[&amp;#34;GMM&amp;lt;br/&amp;gt;Mixture of Gaussians&amp;#34;]
 G --&amp;gt; G2[&amp;#34;EM Algorithm&amp;lt;br/&amp;gt;E-step - M-step&amp;#34;]

 B -.-&amp;gt; C
 C -.-&amp;gt; D
 D -.-&amp;gt; E
 E -.-&amp;gt; F
 F -.-&amp;gt; G
&lt;/pre>

&lt;hr>
&lt;h2 id="data---types">
 Data - Types
 
 &lt;a class="anchor" href="#data---types">#&lt;/a>
 
&lt;/h2>


&lt;pre class="mermaid">
flowchart TD
	A[(Data)] --&amp;gt; B[&amp;#34;Categorical (Qualitative)&amp;#34;]
 A --&amp;gt; C[&amp;#34;Numerical (Quantitative)&amp;#34;]

 B --&amp;gt; B1[Nominal]
 B --&amp;gt; B2[Ordinal]

 C --&amp;gt; C1[Discrete]
 C --&amp;gt; C2[Continuous]

 C2 --&amp;gt; C21[Interval]
 C2 --&amp;gt; C22[Ratio]

 %% Styling
 style A fill:#E1F5FE,stroke:#333
 style B fill:#90CAF9,stroke:#333
 style B1 fill:#90CAF9,stroke:#333
 style B2 fill:#90CAF9,stroke:#333
 style C fill:#FFF9C4,stroke:#333
 style C1 fill:#FFF9C4,stroke:#333
 style C2 fill:#FFF9C4,stroke:#333
 style C21 fill:#FFF9C4,stroke:#333
 style C22 fill:#FFF9C4,stroke:#333
&lt;/pre>

&lt;div class="book-steps ">
&lt;ol>
&lt;li>
&lt;h2 id="categorical-qualitative">
 Categorical (Qualitative)
 
 &lt;a class="anchor" href="#categorical-qualitative">#&lt;/a>
 
&lt;/h2>
&lt;p>Expresses a qualitative attribute,
e.g. hair colour, eye colour.&lt;/p></description></item></channel></rss>