<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter1</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_0.webp</image:loc>
      <image:caption>The Machine Learning Landscape. If you already know all the Machine Learning basics, you may want to skip directly to Chapter 2. If you are not sure, try to answer all the questions listed at the end of the chapter before moving on</image:caption>
      <image:title>The Machine Learning Landscape</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_1.webp</image:loc>
      <image:caption>Why Use Machine Learning? You would test your program and repeat steps 1 and 2 until it is good enough. The traditional approach</image:caption>
      <image:title>Why Use Machine Learning?</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_2.webp</image:loc>
      <image:caption>Why Use Machine Learning? In contrast, a spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples compared to the ham examples. Machine Learning approach</image:caption>
      <image:title>Why Use Machine Learning?</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_3.webp</image:loc>
      <image:caption>Why Use Machine Learning? In contrast, a spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging them without your intervention. Automatically adapting to change</image:caption>
      <image:title>Why Use Machine Learning?</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_4.webp</image:loc>
      <image:caption>Why Use Machine Learning? Machine Learning can help humans learn. To summarize, Machine Learning is great for</image:caption>
      <image:title>Why Use Machine Learning?</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_5.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels. A labeled training set for supervised learning (e.g., spam classification)</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_6.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. Chapter 1: The Machine Learning Landscape. In Machine Learning an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context, but generally means an attribute plus its value (e.g., “Mileage = 15,000”).</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_7.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. In Machine Learning an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context, but generally means an attribute plus its value (e.g., “Mileage = 15,000”). Regression</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_8.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. In unsupervised learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher. An unlabeled training set for unsupervised learning</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_9.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. Clustering. Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_10.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted. Example of a t-SNE visualization highlighting semantic clusters</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_11.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information. It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm).</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_12.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. Yet another important unsupervised task is anomaly detection — for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. Anomaly detection</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_13.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. Some photo-hosting services, such as Google Photos, are good examples of this. Semisupervised learning</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_14.webp</image:loc>
      <image:caption>Supervised/Unsupervised Learning. Reinforcement Learning. For example, many robots implement Reinforcement Learning algorithms to learn how to walk.</image:caption>
      <image:title>Supervised/Unsupervised Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_15.webp</image:loc>
      <image:caption>Batch and Online Learning. In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Online learning</image:caption>
      <image:title>Batch and Online Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_16.webp</image:loc>
      <image:caption>Batch and Online Learning. Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning). This whole process is usually done offline (i.e., not on the live system), so online learning can be a confusing name. Think of it as incremental learning</image:caption>
      <image:title>Batch and Online Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_17.webp</image:loc>
      <image:caption>Batch and Online Learning. This whole process is usually done offline (i.e., not on the live system), so online learning can be a confusing name. Think of it as incremental learning. Using online learning to handle huge datasets</image:caption>
      <image:title>Batch and Online Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_18.webp</image:loc>
      <image:caption>Instance-Based Versus Model-Based Learning. This is called instance-based learning: the system learns the examples by heart, then generalizes to new cases using a similarity measure. Instance-based learning</image:caption>
      <image:title>Instance-Based Versus Model-Based Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_19.webp</image:loc>
      <image:caption>Instance-Based Versus Model-Based Learning. Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions. This is called model-based learning. Model-based learning</image:caption>
      <image:title>Instance-Based Versus Model-Based Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_20.webp</image:loc>
      <image:caption>Instance-Based Versus Model-Based Learning. Do you see a trend here? There does seem to be a trend here!</image:caption>
      <image:title>Instance-Based Versus Model-Based Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_21.webp</image:loc>
      <image:caption>Instance-Based Versus Model-Based Learning. This model has two model parameters, θ0 and θ1. By tweaking these parameters, you can make your model represent any linear function, as shown in the figure. A few possible linear models</image:caption>
      <image:title>Instance-Based Versus Model-Based Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_22.webp</image:loc>
      <image:caption>Instance-Based Versus Model-Based Learning. Now the model fits the training data as closely as possible (for a linear model), as you can see in the figure. The linear model that fits the training data best</image:caption>
      <image:title>Instance-Based Versus Model-Based Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_23.webp</image:loc>
      <image:caption>Instance-Based Versus Model-Based Learning. print(lin_reg_model.predict(X_new)) # outputs [[5.96242338]]. If you had used an instance-based learning algorithm instead, you would have found that Slovenia has the closest GDP per capita to that of Cyprus ($20,732), and since the OECD data tells us that Slovenians’ life satisfaction is 5.7, you would have predicted a life satisfaction of 5.7 for Cyprus.</image:caption>
      <image:title>Instance-Based Versus Model-Based Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_24.webp</image:loc>
      <image:caption>The Unreasonable Effectiveness of Data. In a famous paper published in 2001, Microsoft researchers Michele Banko and Eric Brill showed that very different Machine Learning algorithms, including fairly simple ones, performed almost identically well on a complex problem of natural language disambiguation</image:caption>
      <image:title>The Unreasonable Effectiveness of Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_25.webp</image:loc>
      <image:caption>Nonrepresentative Training Data. For example, the set of countries we used earlier for training the linear model was not perfectly representative; a few countries were missing. The figure shows what the data looks like when you add the missing countries. A more representative training sample</image:caption>
      <image:title>Nonrepresentative Training Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_26.webp</image:loc>
      <image:caption>Overfitting the Training Data. The figure shows an example of a high-degree polynomial life satisfaction model that strongly overfits the training data. Even though it performs much better on the training data than the simple linear model, would you really trust its predictions? Overfitting the training data</image:caption>
      <image:title>Overfitting the Training Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_27.webp</image:loc>
      <image:caption>Overfitting the Training Data. How confident are you that the W-satisfaction rule generalizes to Rwanda or Zimbabwe? Obviously this pattern occurred in the training data by pure chance, but the model has no way to tell whether a pattern is real or simply the result of noise in the data. Overfitting happens when the model is too complex relative to the amount and noisiness of the training data. The possible solutions are</image:caption>
      <image:title>Overfitting the Training Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_28.webp</image:loc>
      <image:caption>Overfitting the Training Data. Regularization reduces the risk of overfitting. The amount of regularization to apply during learning can be controlled by a hyperparameter.</image:caption>
      <image:title>Overfitting the Training Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter1_29.webp</image:loc>
      <image:caption>Testing and Validating. If the training error is low (i.e., your model makes few mistakes on the training set) but the generalization error is high, it means that your model is overfitting the training data. It is common to use 80% of the data for training and hold out 20% for testing</image:caption>
      <image:title>Testing and Validating</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter10</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_0.webp</image:loc>
      <image:caption>Biological Neurons. Before we discuss artificial neurons, let’s take a quick look at a biological neuron (represented in the figure). Biological neuron</image:caption>
      <image:title>Biological Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_1.webp</image:loc>
      <image:caption>Biological Neurons. How biological neural networks (BNNs) work is still the subject of active research, but some parts of the brain have been mapped, and it seems that neurons are often organized in consecutive layers, as shown in the figure. Multiple layers in a biological neural network (human cortex)</image:caption>
      <image:title>Biological Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_2.webp</image:loc>
      <image:caption>Logical Computations with Neurons. Warren McCulloch and Walter Pitts proposed a very simple model of the biological neuron, which later became known as an artificial neuron : it has one or more binary (on/off) inputs and one binary output. ANNs performing simple logical computations</image:caption>
      <image:title>Logical Computations with Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_3.webp</image:loc>
      <image:caption>The Perceptron. The Perceptron is one of the simplest ANN architectures, invented in 1957 by Frank Rosenblatt. Linear threshold unit</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_4.webp</image:loc>
      <image:caption>The Perceptron. Equation 10-1. Common step functions used in Perceptrons</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_5.webp</image:loc>
      <image:caption>The Perceptron. Equation 10-1. Common step functions used in Perceptrons: heaviside(z) = 0 if z &lt; 0, 1 if z ≥ 0</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_6.webp</image:loc>
      <image:caption>The Perceptron. A Perceptron with two inputs and three outputs is represented in the diagram. This Perceptron can classify instances simultaneously into three different binary classes, which makes it a multioutput classifier. Perceptron diagram</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_7.webp</image:loc>
      <image:caption>The Perceptron. Equation 10-2. Perceptron learning rule (weight update)</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_8.webp</image:loc>
      <image:caption>The Perceptron. Equation 10-2. Perceptron learning rule (weight update)</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_9.webp</image:loc>
      <image:caption>The Perceptron. Equation 10-2. Perceptron learning rule (weight update): w_{i,j}^(next step) = w_{i,j} + η (y_j − ŷ_j) x_i</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_10.webp</image:loc>
      <image:caption>The Perceptron. However, it turns out that some of the limitations of Perceptrons can be eliminated by stacking multiple Perceptrons. XOR classification problem and an MLP that solves it</image:caption>
      <image:title>The Perceptron</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_11.webp</image:loc>
      <image:caption>Multi-Layer Perceptron and Backpropagation. An MLP is composed of one (passthrough) input layer, one or more layers of LTUs, called hidden layers, and one final layer of LTUs called the output layer (see the figure). Multi-Layer Perceptron</image:caption>
      <image:title>Multi-Layer Perceptron and Backpropagation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_12.webp</image:loc>
      <image:caption>Multi-Layer Perceptron and Backpropagation. Activation functions and their derivatives. An MLP is often used for classification, with each output corresponding to a different binary class (e.g., spam/ham, urgent/not-urgent, and so on).</image:caption>
      <image:title>Multi-Layer Perceptron and Backpropagation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_13.webp</image:loc>
      <image:caption>Multi-Layer Perceptron and Backpropagation. An MLP is often used for classification, with each output corresponding to a different binary class (e.g., spam/ham, urgent/not-urgent, and so on). A modern MLP (including ReLU and softmax) for classification</image:caption>
      <image:title>Multi-Layer Perceptron and Backpropagation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_14.webp</image:loc>
      <image:caption>Multi-Layer Perceptron and Backpropagation. From Biological to Artificial Neurons. Biological neurons seem to implement a roughly sigmoid (S-shaped) activation function, so researchers stuck to sigmoid functions for a very long time.</image:caption>
      <image:title>Multi-Layer Perceptron and Backpropagation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_15.webp</image:loc>
      <image:caption>Training an MLP with TensorFlow’s High-Level API. Under the hood, the DNNClassifier class creates all the neuron layers, based on the ReLU activation function (we can change this by setting the activation_fn hyperparameter). The TF.Learn API is still quite new, so some of the names and functions used in these examples may evolve a bit by the time you read this book. However, the general ideas should not change</image:caption>
      <image:title>Training an MLP with TensorFlow’s High-Level API</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_16.webp</image:loc>
      <image:caption>Construction Phase. activation_fn=None). The tensorflow.contrib package contains many useful functions, but it is a place for experimental code that has not yet graduated to be part of the main TensorFlow API.</image:caption>
      <image:title>Construction Phase</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter10_17.webp</image:loc>
      <image:caption>Construction Phase. loss = tf.reduce_mean(xentropy, name=&quot;loss&quot;). The sparse_softmax_cross_entropy_with_logits() function is equivalent to applying the softmax activation function and then computing the cross entropy, but it is more efficient, and it properly takes care of corner cases like logits equal to 0.</image:caption>
      <image:title>Construction Phase</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter11</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_0.webp</image:loc>
      <image:caption>Vanishing/Exploding Gradients Problems. Logistic activation function saturation</image:caption>
      <image:title>Vanishing/Exploding Gradients Problems</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_1.webp</image:loc>
      <image:caption>Xavier and He Initialization. n_inputs + n_outputs. When the number of input connections is roughly equal to the number of output connections</image:caption>
      <image:title>Xavier and He Initialization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_2.webp</image:loc>
      <image:caption>Xavier and He Initialization. n_inputs + n_outputs</image:caption>
      <image:title>Xavier and He Initialization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_3.webp</image:loc>
      <image:caption>Xavier and He Initialization. n_inputs + n_outputs. ReLU (and its variants)</image:caption>
      <image:title>Xavier and He Initialization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_4.webp</image:loc>
      <image:caption>Xavier and He Initialization. Chapter 11: Training Deep Neural Nets. fan-in and fan-out like in Xavier initialization. This is also the default for the variance_scaling_initializer() function, but you can change this by setting the argument mode=&quot;FAN_AVG&quot;</image:caption>
      <image:title>Xavier and He Initialization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_5.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. datasets it runs the risk of overfitting the training set. Leaky ReLU</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_6.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. Equation 11-2. ELU activation function</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_7.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. Equation 11-2. ELU activation function: ELU_α(z) = α(exp(z) − 1) if z &lt; 0, z if z ≥ 0</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_8.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. ELU_α(z) = α(exp(z) − 1) if z &lt; 0, z if z ≥ 0</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_9.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. ELU_α(z) = α(exp(z) − 1) if z &lt; 0, z if z ≥ 0</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_10.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. ELU_α(z) = α(exp(z) − 1) if z &lt; 0, z if z ≥ 0</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_11.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. ELU_α(z) = α(exp(z) − 1) if z &lt; 0</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_12.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. ELU_α(z) = z if z ≥ 0. ELU activation function</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_13.webp</image:loc>
      <image:caption>Nonsaturating Activation Functions. The main drawback of the ELU activation function is that it is slower to compute than the ReLU and its variants (due to the use of the exponential function), but dur‐ ing training this is compensated by the faster convergence rate. So which activation function should you use for the hidden layers of your deep neural networks?</image:caption>
      <image:title>Nonsaturating Activation Functions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_14.webp</image:loc>
      <image:caption>Batch Normalization. μ_B = (1/m_B) ∑_{i=1}^{m_B} x^(i)</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_15.webp</image:loc>
      <image:caption>Batch Normalization</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_16.webp</image:loc>
      <image:caption>Batch Normalization</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_17.webp</image:loc>
      <image:caption>Batch Normalization. σ_B² = (1/m_B) ∑_{i=1}^{m_B} (x^(i) − μ_B)²</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_18.webp</image:loc>
      <image:caption>Batch Normalization. z^(i) = γ x̂^(i) + β. μ_B is the empirical mean, evaluated over the whole mini-batch B</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_19.webp</image:loc>
      <image:caption>Batch Normalization. Batch Normalization does, however, add some complexity to the model (although it removes the need for normalizing the input data since the first hidden layer will take care of that, provided it is batch-normalized). You may find that training is rather slow at first while Gradient Descent is searching for the optimal scales and offsets for each layer, but it accelerates once it has found reasonably good values</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_20.webp</image:loc>
      <image:caption>Batch Normalization. Let’s walk through this code.</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_21.webp</image:loc>
      <image:caption>Batch Normalization. Let’s walk through this code. Next we define bn_params, which is a dictionary that defines the parameters that will be passed to the batch_norm() function, including is_training of course.</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_22.webp</image:loc>
      <image:caption>Batch Normalization. Next we define bn_params, which is a dictionary that defines the parameters that will be passed to the batch_norm() function, including is_training of course.</image:caption>
      <image:title>Batch Normalization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_23.webp</image:loc>
      <image:caption>Reusing Pretrained Layers. For example, suppose that you have access to a DNN that was trained to classify pictures into 100 different categories, including animals, plants, vehicles, and everyday objects. Reusing pretrained layers</image:caption>
      <image:title>Reusing Pretrained Layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_24.webp</image:loc>
      <image:caption>Reusing Pretrained Layers. Reusing pretrained layers. If the input pictures of your new task don’t have the same size as the ones used in the original task, you will have to add a preprocessing step to resize them to the size expected by the original model.</image:caption>
      <image:title>Reusing Pretrained Layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_25.webp</image:loc>
      <image:caption>Reusing a TensorFlow Model. First we build the new model, making sure to copy the original model’s hidden layers 1 to 3. The more similar the tasks are, the more layers you want to reuse (starting with the lower layers). For very similar tasks, you can try keeping all the hidden layers and just replace the output layer</image:caption>
      <image:title>Reusing a TensorFlow Model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_26.webp</image:loc>
      <image:caption>Unsupervised Pretraining. You have a complex task to solve, no similar model you can reuse, and little labeled training data but plenty of unlabeled training data. Unsupervised pretraining</image:caption>
      <image:title>Unsupervised Pretraining</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_27.webp</image:loc>
      <image:caption>Momentum optimization. Equation 11-4. Momentum algorithm</image:caption>
      <image:title>Momentum optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_28.webp</image:loc>
      <image:caption>Momentum optimization. Equation 11-4. Momentum algorithm</image:caption>
      <image:title>Momentum optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_29.webp</image:loc>
      <image:caption>Momentum optimization. Equation 11-4. Momentum algorithm: 1. m ← βm + η∇θJ(θ)</image:caption>
      <image:title>Momentum optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_30.webp</image:loc>
      <image:caption>Momentum optimization. Equation 11-4. Momentum algorithm: 1. m ← βm + η∇θJ(θ); 2. θ ← θ − m</image:caption>
      <image:title>Momentum optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_31.webp</image:loc>
      <image:caption>Momentum optimization. Momentum can help the optimizer roll past local optima. Due to the momentum, the optimizer may overshoot a bit, then come back, overshoot again, and oscillate like this many times before stabilizing at the minimum.</image:caption>
      <image:title>Momentum optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_32.webp</image:loc>
      <image:caption>Nesterov Accelerated Gradient. Equation 11-5. Nesterov Accelerated Gradient algorithm</image:caption>
      <image:title>Nesterov Accelerated Gradient</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_33.webp</image:loc>
      <image:caption>Nesterov Accelerated Gradient. Equation 11-5. Nesterov Accelerated Gradient algorithm</image:caption>
      <image:title>Nesterov Accelerated Gradient</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_34.webp</image:loc>
      <image:caption>Nesterov Accelerated Gradient. Equation 11-5. Nesterov Accelerated Gradient algorithm</image:caption>
      <image:title>Nesterov Accelerated Gradient</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_35.webp</image:loc>
      <image:caption>Nesterov Accelerated Gradient. Equation 11-5: m ← βm + η∇θJ(θ + βm)</image:caption>
      <image:title>Nesterov Accelerated Gradient</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_36.webp</image:loc>
      <image:caption>Nesterov Accelerated Gradient. NAG ends up being significantly faster than regular Momentum optimization. Regular versus Nesterov Momentum optimization</image:caption>
      <image:title>Nesterov Accelerated Gradient</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_37.webp</image:loc>
      <image:caption>AdaGrad. Equation 11-6. AdaGrad algorithm</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_38.webp</image:loc>
      <image:caption>AdaGrad. Equation 11-6. AdaGrad algorithm</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_39.webp</image:loc>
      <image:caption>AdaGrad. Equation 11-6. AdaGrad algorithm</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_40.webp</image:loc>
      <image:caption>AdaGrad</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_41.webp</image:loc>
      <image:caption>AdaGrad. 1. s ← s + ∇θJ(θ) ⊗ ∇θJ(θ)</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_42.webp</image:loc>
      <image:caption>AdaGrad. 1. s ← s + ∇θJ(θ) ⊗ ∇θJ(θ)</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_43.webp</image:loc>
      <image:caption>AdaGrad. 1. s ← s + ∇θJ(θ) ⊗ ∇θJ(θ)</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_44.webp</image:loc>
      <image:caption>AdaGrad. Equation 11-6. AdaGrad algorithm: 1. s ← s + ∇θJ(θ) ⊗ ∇θJ(θ); 2. θ ← θ − η∇θJ(θ) ⊘ √(s + ε)</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_45.webp</image:loc>
      <image:caption>AdaGrad. si ← si + (∂J(θ)/∂θi)²</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_46.webp</image:loc>
      <image:caption>AdaGrad. si ← si + (∂J(θ)/∂θi)²</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_47.webp</image:loc>
      <image:caption>AdaGrad. si ← si + (∂J(θ)/∂θi)²; θi ← θi − η (∂J(θ)/∂θi) / √(si + ε), for all parameters θi (simultaneously)</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_48.webp</image:loc>
      <image:caption>AdaGrad. In short, this algorithm decays the learning rate, but it does so faster for steep dimensions than for dimensions with gentler slopes. AdaGrad versus Gradient Descent</image:caption>
      <image:title>AdaGrad</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_49.webp</image:loc>
      <image:caption>RMSProp. Equation 11-7. RMSProp algorithm</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_50.webp</image:loc>
      <image:caption>RMSProp. Equation 11-7. RMSProp algorithm</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_51.webp</image:loc>
      <image:caption>RMSProp. Equation 11-7. RMSProp algorithm</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_52.webp</image:loc>
      <image:caption>RMSProp</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_53.webp</image:loc>
      <image:caption>RMSProp</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_54.webp</image:loc>
      <image:caption>RMSProp</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_55.webp</image:loc>
      <image:caption>RMSProp</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_56.webp</image:loc>
      <image:caption>RMSProp</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_57.webp</image:loc>
      <image:caption>RMSProp</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_58.webp</image:loc>
      <image:caption>RMSProp. Equation 11-7: s ← βs + (1 − β)∇θJ(θ) ⊗ ∇θJ(θ)</image:caption>
      <image:title>RMSProp</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_59.webp</image:loc>
      <image:caption>Adam Optimization. Equation 11-8. Adam algorithm</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_60.webp</image:loc>
      <image:caption>Adam Optimization. Equation 11-8. Adam algorithm</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_61.webp</image:loc>
      <image:caption>Adam Optimization. Equation 11-8. Adam algorithm</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_62.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_63.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_64.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_65.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_66.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_67.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_68.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_69.webp</image:loc>
      <image:caption>Adam Optimization</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_70.webp</image:loc>
      <image:caption>Adam Optimization. 1. m ← β1m + (1 − β1)∇θJ(θ)</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_71.webp</image:loc>
      <image:caption>Adam Optimization. 3. m ← m / (1 − β1^T)</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_72.webp</image:loc>
      <image:caption>Adam Optimization. 4. s ← s / (1 − β2^T)</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_73.webp</image:loc>
      <image:caption>Adam Optimization. 4. s ← s / (1 − β2^T); 5. θ ← θ − ηm ⊘ √(s + ε)</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_74.webp</image:loc>
      <image:caption>Adam Optimization. Faster Optimizers. All the optimization techniques discussed so far only rely on the first-order partial derivatives ( Jacobians ).</image:caption>
      <image:title>Adam Optimization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_75.webp</image:loc>
      <image:caption>Learning Rate Scheduling. interrupt training before it has converged properly, yielding a suboptimal solution (see. Learning curves for various learning rates η</image:caption>
      <image:title>Learning Rate Scheduling</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_76.webp</image:loc>
      <image:caption>ℓ1 and ℓ2 Regularization. reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES) loss = tf.add_n([base_loss] + reg_losses, name=&quot;loss&quot;). Don’t forget to add the regularization losses to your overall loss, or else they will simply be ignored</image:caption>
      <image:title>ℓ1 and ℓ2 Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_77.webp</image:loc>
      <image:caption>Dropout. Dropout regularization. It is quite surprising at first that this rather brutal technique works at all.</image:caption>
      <image:title>Dropout</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_78.webp</image:loc>
      <image:caption>Dropout. scope=&quot;outputs&quot;). You want to use the dropout() function in tensorflow.contrib.layers, not the one in tensorflow.nn. The first one turns off (no-op) when not training, which is what you want, while the second one does not</image:caption>
      <image:title>Dropout</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_79.webp</image:loc>
      <image:caption>Dropout. Chapter 11: Training Deep Neural Nets. Dropconnect is a variant of dropout where individual connections are dropped randomly rather than whole neurons. In general dropout performs better</image:caption>
      <image:title>Dropout</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_80.webp</image:loc>
      <image:caption>Max-Norm Regularization. Dropconnect is a variant of dropout where individual connections are dropped randomly rather than whole neurons. In general dropout performs better</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_81.webp</image:loc>
      <image:caption>Max-Norm Regularization. Dropconnect is a variant of dropout where individual connections are dropped randomly rather than whole neurons. In general dropout performs better. Another regularization technique that is quite popular for neural networks is called max-norm regularization: for each neuron, it constrains the weights w of the incoming connections such that ∥w∥2 ≤ r, where r is the max-norm hyperparameter and</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_82.webp</image:loc>
      <image:caption>Max-Norm Regularization. Another regularization technique that is quite popular for neural networks is called max-norm regularization: for each neuron, it constrains the weights w of the incoming connections such that ∥w∥2 ≤ r, where r is the max-norm hyperparameter and</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_83.webp</image:loc>
      <image:caption>Max-Norm Regularization. Another regularization technique that is quite popular for neural networks is called max-norm regularization: for each neuron, it constrains the weights w of the incoming connections such that ∥w∥2 ≤ r, where r is the max-norm hyperparameter and ∥·∥2 is the ℓ2 norm</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_84.webp</image:loc>
      <image:caption>Max-Norm Regularization. ∥·∥2 is the ℓ2 norm</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_85.webp</image:loc>
      <image:caption>Max-Norm Regularization. ∥·∥2 is the ℓ2 norm</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_86.webp</image:loc>
      <image:caption>Max-Norm Regularization. ∥·∥2 is the ℓ2 norm</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_87.webp</image:loc>
      <image:caption>Max-Norm Regularization</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_88.webp</image:loc>
      <image:caption>Max-Norm Regularization. We typically implement this constraint by computing ∥w∥2 after each training step and clipping w if needed (w ← w r / ∥w∥2)</image:caption>
      <image:title>Max-Norm Regularization</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_89.webp</image:loc>
      <image:caption>Data Augmentation. For example, if your model is meant to classify pictures of mushrooms, you can slightly shift, rotate, and resize every picture in the training set by various amounts and add the resulting pictures to the training set (see. Generating new training instances from existing ones</image:caption>
      <image:title>Data Augmentation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter11_90.webp</image:loc>
      <image:caption>Data Augmentation. the API documentation for more details). This makes it easy to implement data augmentation for image datasets. Another powerful technique to train very deep neural networks is to add skip connections (a skip connection is when you add the input of a layer to the output of a higher layer).</image:caption>
      <image:title>Data Augmentation</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter12</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_0.webp</image:loc>
      <image:caption>Distributing TensorFlow Across Devices and Servers. In this chapter we will see how to use TensorFlow to distribute computations across multiple devices (CPUs and GPUs) and run them in parallel (see. Executing a TensorFlow graph across multiple devices in parallel</image:caption>
      <image:title>Distributing TensorFlow Across Devices and Servers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_1.webp</image:loc>
      <image:caption>Installation. Chapter 12: Distributing TensorFlow Across Devices and Servers. If you don’t own any GPU cards, you can use a hosting service with GPU capability such as Amazon AWS.</image:caption>
      <image:title>Installation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_2.webp</image:loc>
      <image:caption>Installation. Nvidia’s Compute Unified Device Architecture library (CUDA) allows developers to use CUDA-enabled GPUs for all sorts of computations (not just graphics acceleration). TensorFlow uses CUDA and cuDNN to control GPUs and boost DNNs</image:caption>
      <image:title>Installation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_3.webp</image:loc>
      <image:caption>Managing the GPU RAM. program #2 will only see GPU cards 2 and 3 (numbered 1 and 0, respectively). Everything will work fine (see. Each program gets two GPUs for itself</image:caption>
      <image:title>Managing the GPU RAM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_4.webp</image:loc>
      <image:caption>Managing the GPU RAM. Each program gets all four GPUs, but with only 40% of the RAM each. If you run the nvidia-smi command while both programs are running, you should see that each process holds roughly 40% of the total RAM of each card</image:caption>
      <image:title>Managing the GPU RAM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_5.webp</image:loc>
      <image:caption>Placing Operations on Devices. c = a * b. The &quot;/cpu:0&quot; device aggregates all CPUs on a multi-CPU system. There is currently no way to pin nodes on specific CPUs or to use just a subset of all CPUs</image:caption>
      <image:title>Placing Operations on Devices</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_6.webp</image:loc>
      <image:caption>Parallel Execution. TensorFlow manages a thread pool on each device to parallelize operations (see. Parallelized execution of a TensorFlow graph</image:caption>
      <image:title>Parallel Execution</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_7.webp</image:loc>
      <image:caption>Parallel Execution. As soon as operation C finishes, the dependency counters of operations D and E will be decremented and will both reach 0, so both operations will be sent to the inter-op thread pool to be executed. You can control the number of threads per inter-op pool by setting the inter_op_parallelism_threads option.</image:caption>
      <image:title>Parallel Execution</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_8.webp</image:loc>
      <image:caption>Multiple Devices Across Multiple Servers. computations (such a job is usually named &quot;worker&quot;). TensorFlow cluster</image:caption>
      <image:title>Multiple Devices Across Multiple Servers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_9.webp</image:loc>
      <image:caption>The Master and Worker Services. Protocol buffers are a lightweight binary data interchange format. All servers in a TensorFlow cluster may communicate with any other server in the cluster, so make sure to open the appropriate ports on your firewall</image:caption>
      <image:title>The Master and Worker Services</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_10.webp</image:loc>
      <image:caption>Sharding Variables Across Multiple Parameter Servers. p2 = 3 * s # pinned to /job:worker/task:1/gpu:1. This example assumes that the parameter servers are CPU-only, which is typically the case since they only need to store and com‐ municate parameters, not perform intensive computations</image:caption>
      <image:title>Sharding Variables Across Multiple Parameter Servers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_11.webp</image:loc>
      <image:caption>Sharding Variables Across Multiple Parameter Servers. Resource containers make it easy to share variables across sessions in flexible ways. Resource containers</image:caption>
      <image:title>Sharding Variables Across Multiple Parameter Servers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_12.webp</image:loc>
      <image:caption>Asynchronous Communication Using TensorFlow Queues. every step. Using queues to load the training data asynchronously</image:caption>
      <image:title>Asynchronous Communication Using TensorFlow Queues</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_13.webp</image:loc>
      <image:caption>Asynchronous Communication Using TensorFlow Queues. q = tf.FIFOQueue(capacity=10, dtypes=[tf.float32], shapes=[[2]], name=&quot;q&quot;, shared_name=&quot;shared_q&quot;). To share variables across sessions, all you had to do was to specify the same name and container on both ends.</image:caption>
      <image:title>Asynchronous Communication Using TensorFlow Queues</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_14.webp</image:loc>
      <image:caption>Asynchronous Communication Using TensorFlow Queues. print (b_val) # [[1., 2.], [3., 4.], [5., 6.]]. If you run dequeue_a on its own, it will dequeue a pair and return only the first element; the second element will be lost (and simi‐ larly, if you run dequeue_b on its own, the first element will be lost)</image:caption>
      <image:title>Asynchronous Communication Using TensorFlow Queues</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_15.webp</image:loc>
      <image:caption>Loading Data Directly from the Graph. You must set trainable=False so the optimizers don’t try to tweak this variable. This example assumes that all of your training set (including the labels) consists only of float32 values. If that’s not the case, you will need one variable per type</image:caption>
      <image:title>Loading Data Directly from the Graph</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_16.webp</image:loc>
      <image:caption>Loading Data Directly from the Graph. A graph dedicated to reading training instances from CSV files. In the training graph, you need to create the shared instance queue and simply dequeue mini-batches from it</image:caption>
      <image:title>Loading Data Directly from the Graph</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_17.webp</image:loc>
      <image:caption>Loading Data Directly from the Graph. In this example, the first mini-batch will contain the first two instances of the CSV file, and the second mini-batch will contain the last instance. TensorFlow queues don’t handle sparse tensors well, so if your training instances are sparse you should parse the records after the instance queue</image:caption>
      <image:title>Loading Data Directly from the Graph</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_18.webp</image:loc>
      <image:caption>Loading Data Directly from the Graph. Reading simultaneously from multiple files. For this we need to write a small function to create a reader and the nodes that will read and push one instance to the instance queue</image:caption>
      <image:title>Loading Data Directly from the Graph</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_19.webp</image:loc>
      <image:caption>One Neural Network per Device. By running several client sessions in parallel (in different threads or different processes), connecting them to different servers, and configuring them to use different devices, you can quite easily train or run many neural networks in parallel, across all devices and all machines in your cluster (see. Training one neural network per device</image:caption>
      <image:title>One Neural Network per Device</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_20.webp</image:loc>
      <image:caption>One Neural Network per Device. It also works perfectly if you host a web service that receives a large number of queries per second (QPS) and you need your neural network to make a prediction for each query. Another option is to serve your neural networks using TensorFlow Serving .</image:caption>
      <image:title>One Neural Network per Device</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_21.webp</image:loc>
      <image:caption>In-Graph Versus Between-Graph Replication. In-graph replication. Alternatively, you can create one separate graph for each neural network and handle synchronization between these graphs yourself.</image:caption>
      <image:title>In-Graph Versus Between-Graph Replication</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_22.webp</image:loc>
      <image:caption>In-Graph Versus Between-Graph Replication. Alternatively, you can create one separate graph for each neural network and handle synchronization between these graphs yourself. Between-graph replication</image:caption>
      <image:title>In-Graph Versus Between-Graph Replication</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_23.webp</image:loc>
      <image:caption>Model Parallelism. This cross-device communication is represented by the dashed arrows. It is likely to completely cancel out the benefit of the parallel computation, since cross-device communication is slow (especially if it is across separate machines). Splitting a fully connected neural network</image:caption>
      <image:title>Model Parallelism</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_24.webp</image:loc>
      <image:caption>Model Parallelism. However, as we will see in Chapter 13 , some neural network architectures, such as convolutional neural networks, contain layers that are only partially connected to the lower layers, so it is much easier to distribute chunks across devices in an efficient way. Splitting a partially connected neural network</image:caption>
      <image:title>Model Parallelism</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_25.webp</image:loc>
      <image:caption>Model Parallelism. Splitting a deep recurrent neural network. In short, model parallelism can speed up running or training some types of neural networks, but not all, and it requires special care and tuning, such as making sure that devices that need to communicate the most run on the same machine</image:caption>
      <image:title>Model Parallelism</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_26.webp</image:loc>
      <image:caption>Data Parallelism. Another way to parallelize the training of a neural network is to replicate it on each device, run a training step simultaneously on all replicas using a different mini-batch for each, and then aggregate the gradients to update the model parameters. Data parallelism</image:caption>
      <image:title>Data Parallelism</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_27.webp</image:loc>
      <image:caption>Data Parallelism. With synchronous updates , the aggregator waits for all gradients to be available before computing the average and applying the result (i.e., using the aggregated gradients to update the model parameters). To reduce the waiting time at each step, you could ignore the gradients from the slowest few replicas (typically ~10%).</image:caption>
      <image:title>Data Parallelism</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_28.webp</image:loc>
      <image:caption>Data Parallelism. Stale gradients can make the training algorithm diverge. Stale gradients when using asynchronous updates</image:caption>
      <image:title>Data Parallelism</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_29.webp</image:loc>
      <image:caption>Data Parallelism. time spent moving the data in and out of GPU RAM (and possibly across the network) will outweigh the speedup obtained by splitting the computation load. At that point, adding more GPUs will just increase saturation and slow down training. For some models, typically relatively small and trained on a very large training set, you are often better off training the model on a single machine with a single GPU</image:caption>
      <image:title>Data Parallelism</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter12_30.webp</image:loc>
      <image:caption>Data Parallelism. Chapter 12: Distributing TensorFlow Across Devices and Servers. Although 16-bit precision is the minimum for training neural networks, you can actually drop down to 8-bit precision after training to reduce the size of the model and speed up computations.</image:caption>
      <image:title>Data Parallelism</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter13</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_0.webp</image:loc>
      <image:caption>The Architecture of the Visual Cortex. David H. Hubel. Local receptive fields in the visual cortex</image:caption>
      <image:title>The Architecture of the Visual Cortex</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_1.webp</image:loc>
      <image:caption>The Architecture of the Visual Cortex. and Patrick Haffner, which introduced the famous LeNet-5 architecture, widely used to recognize handwritten check numbers. Why not simply use a regular deep neural network with fully connected layers for image recognition tasks?</image:caption>
      <image:title>The Architecture of the Visual Cortex</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_2.webp</image:loc>
      <image:caption>Convolutional Layer. CNN layers with rectangular local receptive fields</image:caption>
      <image:title>Convolutional Layer</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_3.webp</image:loc>
      <image:caption>Convolutional Layer. CNN layers with rectangular local receptive fields. Until now, all multilayer neural networks we looked at had layers composed of a long line of neurons, and we had to flatten input images to 1D before feeding them to the neural network.</image:caption>
      <image:title>Convolutional Layer</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_4.webp</image:loc>
      <image:caption>Convolutional Layer. A neuron located in row i, column j of a given layer is connected to the outputs of the neurons in the previous layer located in rows i to i + fh − 1, columns j to j + fw − 1, where fh and fw are the height and width of the receptive field. Connections between layers and zero padding</image:caption>
      <image:title>Convolutional Layer</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_5.webp</image:loc>
      <image:caption>Convolutional Layer. It is also possible to connect a large input layer to a much smaller layer by spacing out the receptive fields. Reducing dimensionality using a stride</image:caption>
      <image:title>Convolutional Layer</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_6.webp</image:loc>
      <image:caption>Filters. filter. During training, a CNN finds the most useful filters for its task, and it learns to combine them into more complex patterns (e.g., a cross is an area in an image where both the vertical filter and the horizontal filter are active). Applying two different filters to get two feature maps</image:caption>
      <image:title>Filters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_7.webp</image:loc>
      <image:caption>Stacking Multiple Feature Maps. Up to now, for simplicity, we have represented each convolutional layer as a thin 2D layer, but in reality it is composed of several feature maps of equal sizes, so it is more accurately represented in 3D. The fact that all neurons in a feature map share the same parameters dramatically reduces the number of parameters in the model, but most importantly it means that once the CNN has learned to recognize a pattern in one location, it can recognize it in any other location.</image:caption>
      <image:title>Stacking Multiple Feature Maps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_8.webp</image:loc>
      <image:caption>Stacking Multiple Feature Maps. Input images are also composed of multiple sublayers: one per color channel. There are typically three: red, green, and blue (RGB). Grayscale images have just one channel, but some images may have many more, for example satellite images that capture extra light frequencies (such as infrared). Convolution layers with multiple feature maps, and images with three channels</image:caption>
      <image:title>Stacking Multiple Feature Maps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_9.webp</image:loc>
      <image:caption>TensorFlow Implementation. Padding options—input width: 13, filter width: 6, stride: 5. Unfortunately, convolutional layers have quite a few hyperparameters: you must choose the number of filters, their height and width, the strides, and the padding type.</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_10.webp</image:loc>
      <image:caption>Memory Requirements. During inference (i.e., when making a prediction for a new instance) the RAM occupied by one layer can be released as soon as the next layer has been computed, so you only need as much RAM as required by two consecutive layers. If training crashes because of an out-of-memory error, you can try reducing the mini-batch size.</image:caption>
      <image:title>Memory Requirements</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_11.webp</image:loc>
      <image:caption>Pooling Layer. Max pooling layer (2 × 2 pooling kernel, stride 2, no padding). This is obviously a very destructive kind of layer: even with a tiny 2 × 2 kernel and a stride of 2, the output will be two times smaller in both directions (so its area will be four times smaller), simply dropping 75% of the input values</image:caption>
      <image:title>Pooling Layer</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_12.webp</image:loc>
      <image:caption>CNN Architectures. Typical CNN architectures stack a few convolutional layers (each one generally followed by a ReLU layer), then a pooling layer, then another few convolutional layers (+ReLU), then another pooling layer, and so on. Typical CNN architecture</image:caption>
      <image:title>CNN Architectures</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_13.webp</image:loc>
      <image:caption>CNN Architectures. Typical CNN architecture. A common mistake is to use convolution kernels that are too large. You can often get the same effect as a 9 × 9 kernel by stacking two 3 × 3 kernels, for much less compute.</image:caption>
      <image:title>CNN Architectures</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_14.webp</image:loc>
      <image:caption>AlexNet</image:caption>
      <image:title>AlexNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_15.webp</image:loc>
      <image:caption>AlexNet. Local response normalization: jhigh = min(i + r/2, fn − 1)</image:caption>
      <image:title>AlexNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_16.webp</image:loc>
      <image:caption>GoogLeNet. The figure shows the architecture of an inception module. Inception module</image:caption>
      <image:title>GoogLeNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_17.webp</image:loc>
      <image:caption>GoogLeNet. In short, you can think of the whole inception module as a convolutional layer on steroids, able to output feature maps that capture complex patterns at various scales. The number of convolutional kernels for each convolutional layer is a hyperparameter. Unfortunately, this means that you have six more hyperparameters to tweak for every inception layer you add</image:caption>
      <image:title>GoogLeNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_18.webp</image:loc>
      <image:caption>GoogLeNet. GoogLeNet architecture. Let’s go through this network</image:caption>
      <image:title>GoogLeNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_19.webp</image:loc>
      <image:caption>ResNet. Residual learning. When you initialize a regular neural network, its weights are close to zero, so the network just outputs values close to zero.</image:caption>
      <image:title>ResNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_20.webp</image:loc>
      <image:caption>ResNet. Moreover, if you add many skip connections, the network can start making progress even if several layers have not started learning yet. Regular deep neural network (left) and deep residual network (right)</image:caption>
      <image:title>ResNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_21.webp</image:loc>
      <image:caption>ResNet. Now let’s look at ResNet’s architecture. ResNet architecture</image:caption>
      <image:title>ResNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_22.webp</image:loc>
      <image:caption>ResNet. Note that the number of feature maps is doubled every few residual units, at the same time as their height and width are halved (using a convolutional layer with stride 2). Skip connection when changing feature map size and depth</image:caption>
      <image:title>ResNet</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter13_23.webp</image:loc>
      <image:caption>ResNet. There are a few other architectures that you may want to look at, in particular VGGNet (runner-up of the ILSVRC 2014 challenge) and Inception-v4 (which merges the ideas of GoogLeNet and ResNet and achieves close to a 3% top-5 error rate on ImageNet). There is really nothing special about implementing the various CNN architectures we just discussed.</image:caption>
      <image:title>ResNet</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter14</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_0.webp</image:loc>
      <image:caption>Recurrent Neurons. Up to now we have mostly looked at feedforward neural networks, where the activations flow only in one direction, from the input layer to the output layer (except for a few networks in Appendix E). A recurrent neuron (left), unrolled through time (right)</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_1.webp</image:loc>
      <image:caption>Recurrent Neurons. You can easily create a layer of recurrent neurons. A layer of recurrent neurons (left), unrolled through time (right)</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_2.webp</image:loc>
      <image:caption>Recurrent Neurons. Equation 14-1. Output of a single recurrent neuron for a single instance</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_3.webp</image:loc>
      <image:caption>Recurrent Neurons. Equation 14-1. Output of a single recurrent neuron for a single instance</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_4.webp</image:loc>
      <image:caption>Recurrent Neurons. Equation 14-1. Output of a single recurrent neuron for a single instance</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_5.webp</image:loc>
      <image:caption>Recurrent Neurons. y(t) = ϕ(x(t)ᵀ · wx + y(t−1)ᵀ · wy + b)</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_6.webp</image:loc>
      <image:caption>Recurrent Neurons. Equation 14-2. Outputs of a layer of recurrent neurons for all instances in a mini- batch</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_7.webp</image:loc>
      <image:caption>Recurrent Neurons. Equation 14-2. Outputs of a layer of recurrent neurons for all instances in a mini- batch</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_8.webp</image:loc>
      <image:caption>Recurrent Neurons. Equation 14-2. Outputs of a layer of recurrent neurons for all instances in a mini- batch</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_9.webp</image:loc>
      <image:caption>Recurrent Neurons. Y(t) = ϕ(X(t) · Wx + Y(t−1) · Wy + b)</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_10.webp</image:loc>
      <image:caption>Recurrent Neurons. Y(t) = ϕ(X(t) · Wx + Y(t−1) · Wy + b) = ϕ([X(t) Y(t−1)] · W + b)</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_11.webp</image:loc>
      <image:caption>Recurrent Neurons. Y(t) = ϕ([X(t) Y(t−1)] · W + b)</image:caption>
      <image:title>Recurrent Neurons</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_12.webp</image:loc>
      <image:caption>Memory Cells. In general a cell’s state at time step t , denoted h ( t ) (the “h” stands for “hidden”), is a function of some inputs at that time step and its state at the previous time step: h ( t ) = f ( h ( t –1) , x ( t ) ). A cell’s hidden state and its output may be different</image:caption>
      <image:title>Memory Cells</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_13.webp</image:loc>
      <image:caption>Input and Output Sequences. Lastly, you could have a sequence-to-vector network, called an encoder , followed by a vector-to-sequence network, called a decoder (see the bottom-right network). Seq to seq (top left), seq to vector (top right), vector to seq (bottom left), delayed seq to seq (bottom right)</image:caption>
      <image:title>Input and Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_14.webp</image:loc>
      <image:caption>Static Unrolling Through Time. basic_cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons) outputs, states = tf.nn.dynamic_rnn(basic_cell, X, dtype=tf.float32). During backpropagation, the while_loop() operation does the appropriate magic: it stores the tensor values for each iteration during the forward pass so it can use them to compute gradients during the reverse pass</image:caption>
      <image:title>Static Unrolling Through Time</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_15.webp</image:loc>
      <image:caption>Handling Variable-Length Output Sequences. To train an RNN, the trick is to unroll it through time (like we just did) and then simply use regular backpropagation. This strategy is called backpropagation through time (BPTT). Backpropagation through time</image:caption>
      <image:title>Handling Variable-Length Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_16.webp</image:loc>
      <image:caption>Handling Variable-Length Output Sequences. Just like in regular backpropagation, there is a first forward pass through the unrolled network (represented by the dashed arrows); then the output sequence is evaluated</image:caption>
      <image:title>Handling Variable-Length Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_17.webp</image:loc>
      <image:caption>Handling Variable-Length Output Sequences. Just like in regular backpropagation, there is a first forward pass through the unrolled network (represented by the dashed arrows); then the output sequence is evaluated</image:caption>
      <image:title>Handling Variable-Length Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_18.webp</image:loc>
      <image:caption>Handling Variable-Length Output Sequences. Just like in regular backpropagation, there is a first forward pass through the unrolled network (represented by the dashed arrows); then the output sequence is evaluated</image:caption>
      <image:title>Handling Variable-Length Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_19.webp</image:loc>
      <image:caption>Handling Variable-Length Output Sequences</image:caption>
      <image:title>Handling Variable-Length Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_20.webp</image:loc>
      <image:caption>Handling Variable-Length Output Sequences</image:caption>
      <image:title>Handling Variable-Length Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_21.webp</image:loc>
      <image:caption>Handling Variable-Length Output Sequences. using a cost function C(Y(tmin), Y(tmin+1), …, Y(tmax)) (where tmin and tmax are the first and last output time steps, not counting the ignored outputs), and the gradients of that cost function are propagated backward through the unrolled network (represented by the solid arrows); and finally the model parameters are updated using the gradients computed during BPTT.</image:caption>
      <image:title>Handling Variable-Length Output Sequences</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_22.webp</image:loc>
      <image:caption>Training a Sequence Classifier. A fully connected layer of neurons (one per class) connected to the output of the last time step, followed by a softmax layer. Sequence classifier</image:caption>
      <image:title>Training a Sequence Classifier</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_23.webp</image:loc>
      <image:caption>Training a Sequence Classifier. We get over 98% accuracy—not bad! Plus you would certainly get a better result by tuning the hyperparameters, initializing the RNN weights using He initialization, training longer, or adding a bit of regularization (e.g., dropout). You can specify an initializer for the RNN by wrapping its construction code in a variable scope (e.g., use variable_scope(&quot;rnn&quot;, initializer=variance_scaling_initializer()) to use He initialization)</image:caption>
      <image:title>Training a Sequence Classifier</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_24.webp</image:loc>
      <image:caption>Training to Predict Time Series. Now let’s take a look at how to handle time series, such as stock prices, air temperature, brain wave patterns, and so on. Time series (left), and a training instance from that series (right)</image:caption>
      <image:title>Training to Predict Time Series</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_25.webp</image:loc>
      <image:caption>Training to Predict Time Series. cell = tf.contrib.rnn.BasicRNNCell(num_units=n_neurons, activation=tf.nn.relu) outputs, states = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32). In general you would have more than just one input feature.</image:caption>
      <image:title>Training to Predict Time Series</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_26.webp</image:loc>
      <image:caption>Training to Predict Time Series. A wrapper that proxies every method call to an underlying cell, but also adds some functionality. RNN cells using output projections</image:caption>
      <image:title>Training to Predict Time Series</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_27.webp</image:loc>
      <image:caption>Training to Predict Time Series. The predicted sequence for the instance we looked at earlier, after just 1,000 training iterations. Time series prediction</image:caption>
      <image:title>Training to Predict Time Series</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_28.webp</image:loc>
      <image:caption>Training to Predict Time Series. to [batch_size, n_steps, n_outputs]. Stack all the outputs, apply the projection, then unstack the result</image:caption>
      <image:title>Training to Predict Time Series</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_29.webp</image:loc>
      <image:caption>Creative RNN. X_batch = np.array(sequence[-n_steps:]).reshape(1, n_steps, 1) y_pred = sess.run(outputs, feed_dict={X: X_batch}) sequence.append(y_pred[0, -1, 0]). Creative sequences, seeded with zeros (left) or with an instance (right)</image:caption>
      <image:title>Creative RNN</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_30.webp</image:loc>
      <image:caption>Deep RNNs. Deep RNN (left), unrolled through time (right). To implement a deep RNN in TensorFlow, you can create several cells and stack them into a MultiRNNCell. In the following code we stack three identical cells (but you could very well use various kinds of cells with a different number of neurons)</image:caption>
      <image:title>Deep RNNs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_31.webp</image:loc>
      <image:caption>Distributing a Deep RNN Across Multiple GPUs. outputs, states = tf.nn.dynamic_rnn(multi_layer_cell, X, dtype=tf.float32). Do not set state_is_tuple=False, or the MultiRNNCell will concatenate all the cell states into a single tensor, on a single GPU</image:caption>
      <image:title>Distributing a Deep RNN Across Multiple GPUs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_32.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. So how does an LSTM cell work? The architecture of a basic LSTM cell is shown in the figure. LSTM cell</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_33.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. Equation 14-3. LSTM computations</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_34.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. Equation 14-3. LSTM computations</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_35.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. Equation 14-3. LSTM computations</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_36.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_37.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. i(t) = σ(Wxiᵀ · x(t) + Whiᵀ · h(t−1) + bi)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_38.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. i(t) = σ(Wxiᵀ · x(t) + Whiᵀ · h(t−1) + bi)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_39.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. i(t) = σ(Wxiᵀ · x(t) + Whiᵀ · h(t−1) + bi)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_40.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. i(t) = σ(Wxiᵀ · x(t) + Whiᵀ · h(t−1) + bi)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_41.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. f(t) = σ(Wxfᵀ · x(t) + Whfᵀ · h(t−1) + bf)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_42.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. f(t) = σ(Wxfᵀ · x(t) + Whfᵀ · h(t−1) + bf)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_43.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. f(t) = σ(Wxfᵀ · x(t) + Whfᵀ · h(t−1) + bf)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_44.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. f(t) = σ(Wxfᵀ · x(t) + Whfᵀ · h(t−1) + bf)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_45.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_46.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. o(t) = σ(Wxoᵀ · x(t) + Whoᵀ · h(t−1) + bo)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_47.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. o(t) = σ(Wxoᵀ · x(t) + Whoᵀ · h(t−1) + bo)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_48.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. o(t) = σ(Wxoᵀ · x(t) + Whoᵀ · h(t−1) + bo)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_49.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. o(t) = σ(Wxoᵀ · x(t) + Whoᵀ · h(t−1) + bo)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_50.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. g(t) = tanh(Wxgᵀ · x(t) + Whgᵀ · h(t−1) + bg)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_51.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. g(t) = tanh(Wxgᵀ · x(t) + Whgᵀ · h(t−1) + bg)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_52.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. g(t) = tanh(Wxgᵀ · x(t) + Whgᵀ · h(t−1) + bg). c(t) = f(t) ⊗ c(t−1) + i(t) ⊗ g(t)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_53.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. c(t) = f(t) ⊗ c(t−1) + i(t) ⊗ g(t). y(t) = h(t) = o(t) ⊗ tanh(c(t))</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_54.webp</image:loc>
      <image:caption>The Difficulty of Training over Many Time Steps. y(t) = h(t) = o(t) ⊗ tanh(c(t)). Wxi, Wxf, Wxo, Wxg are the weight matrices of each of the four layers for their connection to the input vector x(t)</image:caption>
      <image:title>The Difficulty of Training over Many Time Steps</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_55.webp</image:loc>
      <image:caption>GRU Cell. The Gated Recurrent Unit (GRU) cell was proposed by Kyunghyun Cho et al. in a 2014 paper that also introduced the Encoder–Decoder network we mentioned earlier. GRU cell</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_56.webp</image:loc>
      <image:caption>GRU Cell. Equation 14-4. GRU computations</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_57.webp</image:loc>
      <image:caption>GRU Cell. Equation 14-4. GRU computations</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_58.webp</image:loc>
      <image:caption>GRU Cell. Equation 14-4. GRU computations. z(t) = σ(Wxz^T · x(t) + Whz^T · h(t−1))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_59.webp</image:loc>
      <image:caption>GRU Cell. z(t) = σ(Wxz^T · x(t) + Whz^T · h(t−1))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_60.webp</image:loc>
      <image:caption>GRU Cell. z(t) = σ(Wxz^T · x(t) + Whz^T · h(t−1))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_61.webp</image:loc>
      <image:caption>GRU Cell. z(t) = σ(Wxz^T · x(t) + Whz^T · h(t−1)). r(t) = σ(Wxr^T · x(t) + Whr^T · h(t−1))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_62.webp</image:loc>
      <image:caption>GRU Cell. r(t) = σ(Wxr^T · x(t) + Whr^T · h(t−1))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_63.webp</image:loc>
      <image:caption>GRU Cell. r(t) = σ(Wxr^T · x(t) + Whr^T · h(t−1))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_64.webp</image:loc>
      <image:caption>GRU Cell. r(t) = σ(Wxr^T · x(t) + Whr^T · h(t−1))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_65.webp</image:loc>
      <image:caption>GRU Cell. g(t) = tanh(Wxg^T · x(t) + Whg^T · (r(t) ⊗ h(t−1)))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_66.webp</image:loc>
      <image:caption>GRU Cell. g(t) = tanh(Wxg^T · x(t) + Whg^T · (r(t) ⊗ h(t−1)))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_67.webp</image:loc>
      <image:caption>GRU Cell. g(t) = tanh(Wxg^T · x(t) + Whg^T · (r(t) ⊗ h(t−1)))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_68.webp</image:loc>
      <image:caption>GRU Cell. g(t) = tanh(Wxg^T · x(t) + Whg^T · (r(t) ⊗ h(t−1)))</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_69.webp</image:loc>
      <image:caption>GRU Cell</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_70.webp</image:loc>
      <image:caption>GRU Cell. h(t) = (1 − z(t)) ⊗ h(t−1) + z(t) ⊗ g(t)</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_71.webp</image:loc>
      <image:caption>GRU Cell. h(t) = (1 − z(t)) ⊗ h(t−1) + z(t) ⊗ g(t). Creating a GRU cell in TensorFlow is trivial</image:caption>
      <image:title>GRU Cell</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_72.webp</image:loc>
      <image:caption>Word Embeddings. Chapter 14: Recurrent Neural Networks. Embeddings are also useful for representing categorical attributes that can take on a large number of different values, especially when there are complex similarities between values.</image:caption>
      <image:title>Word Embeddings</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_73.webp</image:loc>
      <image:caption>An Encoder–Decoder Network for Machine Translation. Let’s take a look at a simple machine translation model that will translate English sentences to French. A simple machine translation model</image:caption>
      <image:title>An Encoder–Decoder Network for Machine Translation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter14_74.webp</image:loc>
      <image:caption>An Encoder–Decoder Network for Machine Translation. Note that at inference time (after training), you will not have the target sentence to feed to the decoder. Feeding the previous output word as input at inference time</image:caption>
      <image:title>An Encoder–Decoder Network for Machine Translation</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter15</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_0.webp</image:loc>
      <image:caption>Efficient Data Representations. An autoencoder with one hidden layer composed of two neurons (the encoder) and one output layer composed of three neurons (the decoder). The chess memory experiment (left) and a simple autoencoder (right)</image:caption>
      <image:title>Efficient Data Representations</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_1.webp</image:loc>
      <image:caption>Performing PCA with an Undercomplete Linear Autoencoder. PCA performed by an undercomplete linear autoencoder</image:caption>
      <image:title>Performing PCA with an Undercomplete Linear Autoencoder</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_2.webp</image:loc>
      <image:caption>Stacked Autoencoders. The architecture of a stacked autoencoder is typically symmetrical with regard to the central hidden layer (the coding layer). Stacked autoencoder</image:caption>
      <image:title>Stacked Autoencoders</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_3.webp</image:loc>
      <image:caption>Training One Autoencoder at a Time. Rather than training the whole stacked autoencoder in one go like we just did, it is often much faster to train one shallow autoencoder at a time and then stack all of them into a single stacked autoencoder (hence the name), as shown in the figure. Training one autoencoder at a time</image:caption>
      <image:title>Training One Autoencoder at a Time</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_4.webp</image:loc>
      <image:caption>Training One Autoencoder at a Time. Another approach is to use a single graph containing the whole stacked autoencoder, plus some extra operations to perform each training phase, as shown in the figure. A single graph to train a stacked autoencoder</image:caption>
      <image:title>Training One Autoencoder at a Time</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_5.webp</image:loc>
      <image:caption>Training One Autoencoder at a Time. During the execution phase, all you need to do is run the phase 1 training op for a number of epochs, then the phase 2 training op for some more epochs. Since hidden layer 1 is frozen during phase 2, its output will always be the same for any given training instance.</image:caption>
      <image:title>Training One Autoencoder at a Time</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_6.webp</image:loc>
      <image:caption>Visualizing the Reconstructions. The figure shows the resulting images. Original digits (left) and their reconstructions (right)</image:caption>
      <image:title>Visualizing the Reconstructions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_7.webp</image:loc>
      <image:caption>Visualizing Features. You may get low-level features such as the ones shown in the figure. Features learned by five neurons from the first hidden layer</image:caption>
      <image:title>Visualizing Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_8.webp</image:loc>
      <image:caption>Unsupervised Pretraining Using Stacked Autoencoders. If you really don’t have much labeled training data, you may want to freeze the pretrained layers (at least the lower ones). Unsupervised pretraining using autoencoders</image:caption>
      <image:title>Unsupervised Pretraining Using Stacked Autoencoders</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_9.webp</image:loc>
      <image:caption>Unsupervised Pretraining Using Stacked Autoencoders. Unsupervised pretraining using autoencoders. This situation is actually quite common, because building a large unlabeled dataset is often cheap (e.g., a simple script can download millions of images off the internet), but labeling them can only be done reliably by humans (e.g., classifying images as cute or not).</image:caption>
      <image:title>Unsupervised Pretraining Using Stacked Autoencoders</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_10.webp</image:loc>
      <image:caption>Denoising Autoencoders. The noise can be pure Gaussian noise added to the inputs, or it can be randomly switched off inputs, just like in dropout (introduced in Chapter 11). The figure shows both options. Denoising autoencoders, with Gaussian noise (left) or dropout (right)</image:caption>
      <image:title>Denoising Autoencoders</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_11.webp</image:loc>
      <image:caption>TensorFlow Implementation. […]. Since the shape of X is only partially defined during the construction phase, we cannot know in advance the shape of the noise that we must add to X.</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_12.webp</image:loc>
      <image:caption>TensorFlow Implementation. Once we have the mean activation per neuron, we want to penalize the neurons that are too active by adding a sparsity loss to the cost function. Sparsity loss</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_13.webp</image:loc>
      <image:caption>TensorFlow Implementation. Chapter 15: Autoencoders. The KL divergence between these distributions, noted D_KL(P ∥ Q), can be computed using Equation 15-1</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_14.webp</image:loc>
      <image:caption>TensorFlow Implementation. Equation 15-1. Kullback–Leibler divergence</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_15.webp</image:loc>
      <image:caption>TensorFlow Implementation. Equation 15-1. Kullback–Leibler divergence</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_16.webp</image:loc>
      <image:caption>TensorFlow Implementation. Equation 15-1. Kullback–Leibler divergence</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_17.webp</image:loc>
      <image:caption>TensorFlow Implementation</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_18.webp</image:loc>
      <image:caption>TensorFlow Implementation. D_KL(P ∥ Q) = Σ_i P(i) · log(P(i) / Q(i))</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_19.webp</image:loc>
      <image:caption>TensorFlow Implementation. Equation 15-2. KL divergence between the target sparsity p and the actual sparsity q</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_20.webp</image:loc>
      <image:caption>TensorFlow Implementation. Equation 15-2. KL divergence between the target sparsity p and the actual sparsity q</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_21.webp</image:loc>
      <image:caption>TensorFlow Implementation. Equation 15-2. KL divergence between the target sparsity p and the actual sparsity q</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_22.webp</image:loc>
      <image:caption>TensorFlow Implementation</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_23.webp</image:loc>
      <image:caption>TensorFlow Implementation. D_KL(p ∥ q) = p · log(p / q) + (1 − p) · log((1 − p) / (1 − q))</image:caption>
      <image:title>TensorFlow Implementation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_24.webp</image:loc>
      <image:caption>Variational Autoencoders. Variational autoencoder (left), and an instance going through it (right)</image:caption>
      <image:title>Variational Autoencoders</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_25.webp</image:loc>
      <image:caption>Variational Autoencoders. 1 - tf.log(eps + tf.square(hidden3_sigma)))</image:caption>
      <image:title>Variational Autoencoders</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_26.webp</image:loc>
      <image:caption>Variational Autoencoders. 1 - tf.log(eps + tf.square(hidden3_sigma))). One common variant is to train the encoder to output γ = log(σ²) rather than σ.</image:caption>
      <image:title>Variational Autoencoders</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter15_27.webp</image:loc>
      <image:caption>Generating Digits. Images of handwritten digits generated by the variational autoencoder. A majority of these digits look pretty convincing, while a few are rather “creative.” But don’t be too harsh on the autoencoder—it only started learning less than an hour ago.</image:caption>
      <image:title>Generating Digits</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter16</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_0.webp</image:loc>
      <image:caption>Learning to Optimize Rewards. Reinforcement Learning examples: (a) walking robot, (b) Ms. Pac-Man, (c) Go player, (d) thermostat, (e) automatic trader. Note that there may not be any positive rewards at all; for example, the agent may move around in a maze, getting a negative reward at every time step, so it had better find the exit as quickly as possible!</image:caption>
      <image:title>Learning to Optimize Rewards</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_1.webp</image:loc>
      <image:caption>Policy Search. The algorithm used by the software agent to determine its actions is called its policy. For example, the policy could be a neural network taking observations as inputs and outputting the action to take. Reinforcement Learning using a neural network policy</image:caption>
      <image:title>Policy Search</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_2.webp</image:loc>
      <image:caption>Policy Search. plus their offspring together constitute the second generation. You can continue to iterate through generations this way, until you find a good policy. Four points in policy space and the agent’s corresponding behavior</image:caption>
      <image:title>Policy Search</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_3.webp</image:loc>
      <image:caption>Introduction to OpenAI Gym. The make() function creates an environment, in this case a CartPole environment. The CartPole environment</image:caption>
      <image:title>Introduction to OpenAI Gym</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_4.webp</image:loc>
      <image:caption>Introduction to OpenAI Gym. (400, 600, 3). Unfortunately, the CartPole (and a few other environments) renders the image to the screen even if you set the mode to &quot;rgb_array&quot;.</image:caption>
      <image:title>Introduction to OpenAI Gym</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_5.webp</image:loc>
      <image:caption>Neural Network Policies. For example, if it outputs 0.7, then we will pick action 0 with 70% probability, and action 1 with 30% probability. Neural network policy</image:caption>
      <image:title>Neural Network Policies</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_6.webp</image:loc>
      <image:caption>Evaluating Actions: The Credit Assignment Problem. Discounted rewards. Of course, a good action may be followed by several bad actions that cause the pole to fall quickly, resulting in the good action getting a low score (similarly, a good actor may sometimes star in a terrible movie).</image:caption>
      <image:title>Evaluating Actions: The Credit Assignment Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_7.webp</image:loc>
      <image:caption>Policy Gradients. Chapter 16: Reinforcement Learning. Researchers try to find algorithms that work well even when the agent initially knows nothing about the environment.</image:caption>
      <image:title>Policy Gradients</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_8.webp</image:loc>
      <image:caption>Markov Decision Processes. Example of a Markov chain. Markov decision processes were first described in the 1950s by Richard Bellman.</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_9.webp</image:loc>
      <image:caption>Markov Decision Processes. Example of a Markov decision process. Bellman found a way to estimate the optimal state value of any state s , noted V *( s ), which is the sum of all discounted future rewards the agent can expect on average after it reaches a state s , assuming it acts optimally.</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_10.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-1. Bellman Optimality Equation</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_11.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-1. Bellman Optimality Equation</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_12.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-1. Bellman Optimality Equation</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_13.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_14.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_15.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_16.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_17.webp</image:loc>
      <image:caption>Markov Decision Processes. V*(s) = max_a Σ_s′ T(s, a, s′) · [R(s, a, s′) + γ · V*(s′)], for all s</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_18.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-2. Value Iteration algorithm</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_19.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-2. Value Iteration algorithm</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_20.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-2. Value Iteration algorithm</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_21.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_22.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_23.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_24.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_25.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_26.webp</image:loc>
      <image:caption>Markov Decision Processes. V_{k+1}(s) ← max_a Σ_s′ T(s, a, s′) · [R(s, a, s′) + γ · V_k(s′)], for all s</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_27.webp</image:loc>
      <image:caption>Markov Decision Processes. V_k(s) is the estimated value of state s at the k-th iteration of the algorithm. This algorithm is an example of Dynamic Programming, which breaks down a complex problem (in this case estimating a potentially infinite sum of discounted future rewards) into tractable subproblems that can be tackled iteratively</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_28.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-3. Q-Value Iteration algorithm</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_29.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-3. Q-Value Iteration algorithm</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_30.webp</image:loc>
      <image:caption>Markov Decision Processes. Equation 16-3. Q-Value Iteration algorithm</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_31.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_32.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_33.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_34.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_35.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_36.webp</image:loc>
      <image:caption>Markov Decision Processes. Q_{k+1}(s, a) ← Σ_s′ T(s, a, s′) · [R(s, a, s′) + γ · max_a′ Q_k(s′, a′)], for all (s, a)</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_37.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_38.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_39.webp</image:loc>
      <image:caption>Markov Decision Processes</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_40.webp</image:loc>
      <image:caption>Markov Decision Processes. Once you have the optimal Q-Values, defining the optimal policy, noted π*(s), is trivial: when the agent is in state s, it should choose the action with the highest Q-Value for that state: π*(s) = argmax_a Q*(s, a)</image:caption>
      <image:title>Markov Decision Processes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_41.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. Equation 16-4. TD Learning algorithm</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_42.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. Equation 16-4. TD Learning algorithm</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_43.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. Equation 16-4. TD Learning algorithm</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_44.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_45.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_46.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_47.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_48.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_49.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. V_{k+1}(s) ← (1 − α) · V_k(s) + α · (r + γ · V_k(s′))</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_50.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. α is the learning rate (e.g., 0.01). TD Learning has many similarities with Stochastic Gradient Descent, in particular the fact that it handles one sample at a time.</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_51.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. Equation 16-5. Q-Learning algorithm</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_52.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. Equation 16-5. Q-Learning algorithm</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_53.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. Equation 16-5. Q-Learning algorithm</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_54.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_55.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_56.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_57.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_58.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_59.webp</image:loc>
      <image:caption>Temporal Difference Learning and Q-Learning. Q_{k+1}(s, a) ← (1 − α) Q_k(s, a) + α (r + γ max_{a′} Q_k(s′, a′))</image:caption>
      <image:title>Temporal Difference Learning and Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_60.webp</image:loc>
      <image:caption>Exploration Policies. Equation 16-6. Q-Learning using an exploration function: Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max_{a′} f(Q(s′, a′), N(s′, a′)))</image:caption>
      <image:title>Exploration Policies</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_61.webp</image:loc>
      <image:caption>Exploration Policies. Q(s, a) ← (1 − α) Q(s, a) + α (r + γ max_{a′} f(Q(s′, a′), N(s′, a′)))</image:caption>
      <image:title>Exploration Policies</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_62.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. Ms. Pac-Man observation, original (left) and after preprocessing (right). Next, let’s create the DQN.</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_63.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. Next, let’s create the DQN. Deep Q-network to play Ms. Pac-Man</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_64.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. Equation 16-7. Deep Q-Learning cost function</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_65.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. Equation 16-7. Deep Q-Learning cost function</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_66.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. Equation 16-7. Deep Q-Learning cost function</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_67.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_68.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. J(θ_critic) = (1/m) ∑_{i=1}^{m} (y^{(i)} − Q(s^{(i)}, a^{(i)}, θ_critic))²</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_69.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. critic</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_70.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. For the critic, with y^{(i)} = r^{(i)} + γ max_{a′} Q(s′^{(i)}, a′, θ_actor)</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_71.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. J(θ_critic) is the cost function used to train the critic DQN. As you can see, it is just the Mean Squared Error between the target Q-Values y^{(i)}, as estimated by the actor DQN, and the critic DQN’s predictions of these Q-Values. The replay memory is optional, but highly recommended.</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter16_72.webp</image:loc>
      <image:caption>Learning to Play Ms. Pac-Man Using Deep Q-Learning. Chapter 16: Reinforcement Learning</image:caption>
      <image:title>Learning to Play Ms. Pac-Man Using Deep Q-Learning</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter2</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_0.webp</image:loc>
      <image:caption>Working with Real Data. In this chapter we chose the California Housing Prices dataset from the StatLib repository. California housing prices</image:caption>
      <image:title>Working with Real Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_1.webp</image:loc>
      <image:caption>Look at the Big Picture. Your model should learn from this data and be able to predict the median housing price in any district, given all the other metrics. Since you are a well-organized data scientist, the first thing you do is to pull out your Machine Learning project checklist.</image:caption>
      <image:title>Look at the Big Picture</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_2.webp</image:loc>
      <image:caption>Frame the Problem. A Machine Learning pipeline for real estate investments. Pipelines</image:caption>
      <image:title>Frame the Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_3.webp</image:loc>
      <image:caption>Frame the Problem. Have you found the answers?. If the data was huge, you could either split your batch learning work across multiple servers (using the MapReduce technique, as we will see later), or you could use an online learning technique instead</image:caption>
      <image:title>Frame the Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_4.webp</image:loc>
      <image:caption>Select a Performance Measure. Equation 2-1. Root Mean Square Error (RMSE)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_5.webp</image:loc>
      <image:caption>Select a Performance Measure. Equation 2-1. Root Mean Square Error (RMSE): RMSE(X, h) = √((1/m) ∑_{i=1}^{m} (h(x^{(i)}) − y^{(i)})²)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_6.webp</image:loc>
      <image:caption>Select a Performance Measure. RMSE(X, h) = √((1/m) ∑_{i=1}^{m} (h(x^{(i)}) − y^{(i)})²)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_7.webp</image:loc>
      <image:caption>Select a Performance Measure. RMSE(X, h) = √((1/m) ∑_{i=1}^{m} (h(x^{(i)}) − y^{(i)})²)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_8.webp</image:loc>
      <image:caption>Select a Performance Measure. Recall that the transpose operator flips a column vector into a row vector (and vice versa)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_9.webp</image:loc>
      <image:caption>Select a Performance Measure. Recall that the transpose operator flips a column vector into a row vector (and vice versa)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_10.webp</image:loc>
      <image:caption>Select a Performance Measure. Recall that the transpose operator flips a column vector into a row vector (and vice versa)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_11.webp</image:loc>
      <image:caption>Select a Performance Measure</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_12.webp</image:loc>
      <image:caption>Select a Performance Measure. Equation 2-2. Mean Absolute Error</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_13.webp</image:loc>
      <image:caption>Select a Performance Measure. Equation 2-2. Mean Absolute Error</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_14.webp</image:loc>
      <image:caption>Select a Performance Measure. Equation 2-2. Mean Absolute Error: MAE(X, h) = (1/m) ∑_{i=1}^{m} |h(x^{(i)}) − y^{(i)}|</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_15.webp</image:loc>
      <image:caption>Select a Performance Measure. MAE(X, h) = (1/m) ∑_{i=1}^{m} |h(x^{(i)}) − y^{(i)}|</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_16.webp</image:loc>
      <image:caption>Select a Performance Measure. Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values. Various distance measures, or norms , are possible</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_17.webp</image:loc>
      <image:caption>Select a Performance Measure. Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values. Various distance measures, or norms , are possible</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_18.webp</image:loc>
      <image:caption>Select a Performance Measure. Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values. Various distance measures, or norms , are possible</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_19.webp</image:loc>
      <image:caption>Select a Performance Measure</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_20.webp</image:loc>
      <image:caption>Select a Performance Measure</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_21.webp</image:loc>
      <image:caption>Select a Performance Measure. Computing the root of a sum of squares (RMSE) corresponds to the Euclidean norm: it is the notion of distance you are familiar with. It is also called the ℓ2 norm, noted ∥ · ∥₂ (or just ∥ · ∥)</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_22.webp</image:loc>
      <image:caption>Select a Performance Measure</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_23.webp</image:loc>
      <image:caption>Select a Performance Measure</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_24.webp</image:loc>
      <image:caption>Select a Performance Measure</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_25.webp</image:loc>
      <image:caption>Select a Performance Measure. The ℓk norm: ∥v∥_k = (|v_0|^k + |v_1|^k + ⋯ + |v_n|^k)^{1/k}. ℓ0 just gives the cardinality of the vector (i.e., the number of elements), and ℓ∞ gives the maximum absolute value in the vector</image:caption>
      <image:title>Select a Performance Measure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_26.webp</image:loc>
      <image:caption>Creating an Isolated Environment. Your workspace in Jupyter. A notebook contains a list of cells.</image:caption>
      <image:title>Creating an Isolated Environment</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_27.webp</image:loc>
      <image:caption>Creating an Isolated Environment. A notebook contains a list of cells. Hello world Python notebook</image:caption>
      <image:title>Creating an Isolated Environment</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_28.webp</image:loc>
      <image:caption>Take a Quick Look at the Data Structure. Let’s take a look at the top five rows using the DataFrame’s head() method. Top five rows in the dataset</image:caption>
      <image:title>Take a Quick Look at the Data Structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_29.webp</image:loc>
      <image:caption>Take a Quick Look at the Data Structure. The info() method is useful to get a quick description of the data, in particular the total number of rows, and each attribute’s type and number of non-null values. Housing info</image:caption>
      <image:title>Take a Quick Look at the Data Structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_30.webp</image:loc>
      <image:caption>Take a Quick Look at the Data Structure. Let’s look at the other fields. The describe() method shows a summary of the numerical attributes. Summary of each numerical attribute</image:caption>
      <image:title>Take a Quick Look at the Data Structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_31.webp</image:loc>
      <image:caption>Take a Quick Look at the Data Structure. plt.show(). A histogram for each numerical attribute</image:caption>
      <image:title>Take a Quick Look at the Data Structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_32.webp</image:loc>
      <image:caption>Take a Quick Look at the Data Structure. Get the Data. Matplotlib relies on a user-specified graphical backend to draw on your screen.</image:caption>
      <image:title>Take a Quick Look at the Data Structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_33.webp</image:loc>
      <image:caption>Take a Quick Look at the Data Structure. Chapter 2: End-to-End Machine Learning Project. Wait! Before you look at the data any further, you need to create a test set, put it aside, and never look at it</image:caption>
      <image:title>Take a Quick Look at the Data Structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_34.webp</image:loc>
      <image:caption>Create a Test Set. Suppose you chatted with experts who told you that the median income is a very important attribute to predict median housing prices. Histogram of income categories</image:caption>
      <image:title>Create a Test Set</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_35.webp</image:loc>
      <image:caption>Create a Test Set. With similar code you can measure the income category proportions in the test set. This compares the income category proportions in the overall dataset, in the test set generated with stratified sampling, and in a test set generated using purely random sampling. Sampling bias comparison of stratified versus purely random sampling</image:caption>
      <image:title>Create a Test Set</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_36.webp</image:loc>
      <image:caption>Visualizing Geographical Data. housing.plot(kind=&quot;scatter&quot;, x=&quot;longitude&quot;, y=&quot;latitude&quot;). A geographical scatterplot of the data</image:caption>
      <image:title>Visualizing Geographical Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_37.webp</image:loc>
      <image:caption>Visualizing Geographical Data. housing.plot(kind=&quot;scatter&quot;, x=&quot;longitude&quot;, y=&quot;latitude&quot;, alpha=0.1). A better visualization highlighting high-density areas</image:caption>
      <image:title>Visualizing Geographical Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_38.webp</image:loc>
      <image:caption>Visualizing Geographical Data. California housing prices. This image tells you that the housing prices are very much related to the location (e.g., close to the ocean) and to the population density, as you probably knew already.</image:caption>
      <image:title>Visualizing Geographical Data</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_39.webp</image:loc>
      <image:caption>Looking for Correlations. The correlation coefficient ranges from –1 to 1. Standard correlation coefficient of various datasets (source: Wikipedia; public domain image)</image:caption>
      <image:title>Looking for Correlations</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_40.webp</image:loc>
      <image:caption>Looking for Correlations. Standard correlation coefficient of various datasets (source: Wikipedia; public domain image). The correlation coefficient only measures linear correlations (“if x goes up, then y generally goes up/down”).</image:caption>
      <image:title>Looking for Correlations</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_41.webp</image:loc>
      <image:caption>Looking for Correlations. scatter_matrix(housing[attributes], figsize=(12, 8)). Scatter matrix</image:caption>
      <image:title>Looking for Correlations</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_42.webp</image:loc>
      <image:caption>Looking for Correlations. Median income versus median house value. This plot reveals a few things.</image:caption>
      <image:title>Looking for Correlations</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_43.webp</image:loc>
      <image:caption>Feature Scaling. Prepare the Data for Machine Learning Algorithms. As with all the transformations, it is important to fit the scalers to the training data only, not to the full dataset (including the test set). Only then can you use them to transform the training set and the test set (and new data)</image:caption>
      <image:title>Feature Scaling</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_44.webp</image:loc>
      <image:caption>Better Evaluation Using Cross-Validation. Select and Train a Model. Scikit-Learn’s cross-validation features expect a utility function (greater is better) rather than a cost function (lower is better), so the scoring function is actually the opposite of the MSE (i.e., a negative value), which is why the preceding code computes -scores before calculating the square root.</image:caption>
      <image:title>Better Evaluation Using Cross-Validation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_45.webp</image:loc>
      <image:caption>Better Evaluation Using Cross-Validation. Wow, this is much better: Random Forests look very promising. You should save every model you experiment with, so you can come back easily to any model you want.</image:caption>
      <image:title>Better Evaluation Using Cross-Validation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_46.webp</image:loc>
      <image:caption>Fine-Tune Your Model. grid_search = GridSearchCV(forest_reg, param_grid, cv=5, scoring=&apos;neg_mean_squared_error&apos;) grid_search.fit(housing_prepared, housing_labels)</image:caption>
      <image:title>Fine-Tune Your Model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_47.webp</image:loc>
      <image:caption>Fine-Tune Your Model. Chapter 2: End-to-End Machine Learning Project. Since 30 is the maximum value of n_estimators that was evaluated, you should probably evaluate higher values as well, since the score may continue to improve</image:caption>
      <image:title>Fine-Tune Your Model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_48.webp</image:loc>
      <image:caption>Fine-Tune Your Model. RandomForestRegressor(bootstrap=True, criterion=&apos;mse&apos;, max_depth=None, max_features=6, max_leaf_nodes=None, min_samples_leaf=1, min_samples_split=2, min_weight_fraction_leaf=0.0, n_estimators=30, n_jobs=1, oob_score=False, random_state=None, verbose=. If GridSearchCV is initialized with refit=True (which is the default), then once it finds the best estimator using cross- validation, it retrains it on the whole training set.</image:caption>
      <image:title>Fine-Tune Your Model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter2_49.webp</image:loc>
      <image:caption>Fine-Tune Your Model. default hyperparameter values (which was 52,634). Congratulations, you have successfully fine-tuned your best model!. Don’t forget that you can treat some of the data preparation steps as hyperparameters.</image:caption>
      <image:title>Fine-Tune Your Model</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter3</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_0.webp</image:loc>
      <image:caption>MNIST. plt.axis(&quot;off&quot;) plt.show(). This looks like a 5, and indeed that’s what the label tells us</image:caption>
      <image:title>MNIST</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_1.webp</image:loc>
      <image:caption>MNIST. A few more images from the MNIST dataset give you a feel for the complexity of the classification task. A few digits from the MNIST dataset</image:caption>
      <image:title>MNIST</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_2.webp</image:loc>
      <image:caption>Training a Binary Classifier. sgd_clf = SGDClassifier(random_state=42) sgd_clf.fit(X_train, y_train_5). The SGDClassifier relies on randomness during training (hence the name “stochastic”). If you want reproducible results, you should set the random_state parameter</image:caption>
      <image:title>Training a Binary Classifier</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_3.webp</image:loc>
      <image:caption>Confusion Matrix. If you are confused about the confusion matrix, this illustration may help. An illustrated confusion matrix</image:caption>
      <image:title>Confusion Matrix</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_4.webp</image:loc>
      <image:caption>Precision/Recall Tradeoff. Decision threshold and precision/recall tradeoff. Scikit-Learn does not let you set the threshold directly, but it does give you access to the decision scores that it uses to make predictions.</image:caption>
      <image:title>Precision/Recall Tradeoff</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_5.webp</image:loc>
      <image:caption>Precision/Recall Tradeoff. plot_precision_recall_vs_threshold(precisions, recalls, thresholds) plt.show(). Precision and recall versus the decision threshold</image:caption>
      <image:title>Precision/Recall Tradeoff</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_6.webp</image:loc>
      <image:caption>Precision/Recall Tradeoff. Precision and recall versus the decision threshold. You may wonder why the precision curve is bumpier than the recall curve.</image:caption>
      <image:title>Precision/Recall Tradeoff</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_7.webp</image:loc>
      <image:caption>Precision/Recall Tradeoff. Precision versus recall. You can see that precision really starts to fall sharply around 80% recall. You will probably want to select a precision/recall tradeoff just before that drop—for example, at around 60% recall. But of course the choice depends on your project</image:caption>
      <image:title>Precision/Recall Tradeoff</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_8.webp</image:loc>
      <image:caption>Precision/Recall Tradeoff. Great, you have a 90% precision classifier (or close enough)!. If someone says “let’s reach 99% precision,” you should ask, “at what recall?”</image:caption>
      <image:title>Precision/Recall Tradeoff</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_9.webp</image:loc>
      <image:caption>The ROC Curve. plot_roc_curve(fpr, tpr) plt.show(). ROC curve</image:caption>
      <image:title>The ROC Curve</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_10.webp</image:loc>
      <image:caption>The ROC Curve. &gt;&gt;&gt; roc_auc_score(y_train_5, y_scores) 0.97061072797174941. Since the ROC curve is so similar to the precision/recall (or PR) curve, you may wonder how to decide which one to use.</image:caption>
      <image:title>The ROC Curve</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_11.webp</image:loc>
      <image:caption>The ROC Curve. plt.show(). Comparing ROC curves</image:caption>
      <image:title>The ROC Curve</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_12.webp</image:loc>
      <image:caption>Multiclass Classification. &gt;&gt;&gt; sgd_clf.classes_[5] 5.0. When a classifier is trained, it stores the list of target classes in its classes_ attribute, ordered by value.</image:caption>
      <image:title>Multiclass Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_13.webp</image:loc>
      <image:caption>Error Analysis. plt.matshow(conf_mx, cmap=plt.cm.gray) plt.show(). This confusion matrix looks fairly good, since most images are on the main diagonal, which means that they were classified correctly.</image:caption>
      <image:title>Error Analysis</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_14.webp</image:loc>
      <image:caption>Error Analysis. np.fill_diagonal(norm_conf_mx, 0) plt.matshow(norm_conf_mx, cmap=plt.cm.gray) plt.show(). Now you can clearly see the kinds of errors the classifier makes.</image:caption>
      <image:title>Error Analysis</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_15.webp</image:loc>
      <image:caption>Error Analysis. plt.subplot(224); plot_digits(X_bb[:25], images_per_row=5) plt.show(). The two 5×5 blocks on the left show digits classified as 3s, and the two 5×5 blocks on the right show images classified as 5s.</image:caption>
      <image:title>Error Analysis</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_16.webp</image:loc>
      <image:caption>Multioutput Classification. To illustrate this, let’s build a system that removes noise from images. The line between classification and regression is sometimes blurry, such as in this example.</image:caption>
      <image:title>Multioutput Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_17.webp</image:loc>
      <image:caption>Multioutput Classification. Scikit-Learn offers a few other averaging options and multilabel classifier metrics; see the documentation for more details. On the left is the noisy input image, and on the right is the clean target image. Now let’s train the classifier and make it clean this image</image:caption>
      <image:title>Multioutput Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter3_18.webp</image:loc>
      <image:caption>Multioutput Classification. clean_digit = knn_clf.predict([X_test_mod[some_index]]) plot_digit(clean_digit). Looks close enough to the target!</image:caption>
      <image:title>Multioutput Classification</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter4</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_0.webp</image:loc>
      <image:caption>Training Models. Finally, we will look at two more models that are commonly used for classification tasks: Logistic Regression and Softmax Regression. There will be quite a few math equations in this chapter, using basic notions of linear algebra and calculus.</image:caption>
      <image:title>Training Models</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_1.webp</image:loc>
      <image:caption>Linear Regression. Equation 4-2. Linear Regression model prediction (vectorized form)</image:caption>
      <image:title>Linear Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_2.webp</image:loc>
      <image:caption>Linear Regression. Equation 4-2. Linear Regression model prediction (vectorized form): ŷ = h_θ(x) = θᵀ·x</image:caption>
      <image:title>Linear Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_3.webp</image:loc>
      <image:caption>Linear Regression. Equation 4-3. MSE cost function for a Linear Regression model</image:caption>
      <image:title>Linear Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_4.webp</image:loc>
      <image:caption>Linear Regression. Equation 4-3. MSE cost function for a Linear Regression model</image:caption>
      <image:title>Linear Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_5.webp</image:loc>
      <image:caption>Linear Regression. Equation 4-3. MSE cost function for a Linear Regression model: MSE(X, h_θ) = (1/m) Σᵢ₌₁ᵐ (θᵀ·x⁽ⁱ⁾ − y⁽ⁱ⁾)²</image:caption>
      <image:title>Linear Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_6.webp</image:loc>
      <image:caption>Linear Regression. MSE(X, h_θ) = (1/m) Σᵢ₌₁ᵐ (θᵀ·x⁽ⁱ⁾ − y⁽ⁱ⁾)²</image:caption>
      <image:title>Linear Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_7.webp</image:loc>
      <image:caption>The Normal Equation. Equation 4-4. Normal Equation</image:caption>
      <image:title>The Normal Equation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_8.webp</image:loc>
      <image:caption>The Normal Equation. Equation 4-4. Normal Equation: θ̂ = (Xᵀ·X)⁻¹·Xᵀ·y</image:caption>
      <image:title>The Normal Equation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_9.webp</image:loc>
      <image:caption>The Normal Equation. θ̂ = (Xᵀ·X)⁻¹·Xᵀ·y. θ̂ is the value of θ that minimizes the cost function</image:caption>
      <image:title>The Normal Equation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_10.webp</image:loc>
      <image:caption>The Normal Equation. Now let’s compute θ using the Normal Equation. We will use the inv() function from NumPy’s Linear Algebra module (np.linalg) to compute the inverse of a matrix, and the dot() method for matrix multiplication. X_b = np.c_[np.ones((100, 1)), X] # add x0 = 1 to each instance</image:caption>
      <image:title>The Normal Equation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_11.webp</image:loc>
      <image:caption>Computational Complexity. The Normal Equation computes the inverse of X T · X , which is an n × n matrix (where n is the number of features). The Normal Equation gets very slow when the number of features grows large (e.g., 100,000)</image:caption>
      <image:title>Computational Complexity</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_12.webp</image:loc>
      <image:caption>Gradient Descent. Concretely, you start by filling θ with random values (this is called random initialization), and then you improve it gradually, taking one baby step at a time, each step attempting to decrease the cost function (e.g., the MSE), until the algorithm converges to a minimum</image:caption>
      <image:title>Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_13.webp</image:loc>
      <image:caption>Gradient Descent. Learning rate too small. On the other hand, if the learning rate is too high, you might jump across the valley and end up on the other side, possibly even higher up than you were before.</image:caption>
      <image:title>Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_14.webp</image:loc>
      <image:caption>Gradient Descent. On the other hand, if the learning rate is too high, you might jump across the valley and end up on the other side, possibly even higher up than you were before. Learning rate too large</image:caption>
      <image:title>Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_15.webp</image:loc>
      <image:caption>Gradient Descent. Gradient Descent pitfalls. Fortunately, the MSE cost function for a Linear Regression model happens to be a convex function , which means that if you pick any two points on the curve, the line segment joining them never crosses the curve.</image:caption>
      <image:title>Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_16.webp</image:loc>
      <image:caption>Gradient Descent. In fact, the cost function has the shape of a bowl, but it can be an elongated bowl if the features have very different scales. The figure shows Gradient Descent on a training set where features 1 and 2 have the same scale (on the left), and on a training set where feature 1 has much smaller values than feature 2 (on the right). Gradient Descent with and without feature scaling</image:caption>
      <image:title>Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_17.webp</image:loc>
      <image:caption>Gradient Descent. As you can see, on the left the Gradient Descent algorithm goes straight toward the minimum, thereby reaching it quickly, whereas on the right it first goes in a direction almost orthogonal to the direction of the global minimum, and it ends with a long march down an almost flat valley. When using Gradient Descent, you should ensure that all features have a similar scale (e.g., using Scikit-Learn’s StandardScaler class), or else it will take much longer to converge</image:caption>
      <image:title>Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_18.webp</image:loc>
      <image:caption>Batch Gradient Descent. To implement Gradient Descent, you need to compute the gradient of the cost function with regard to each model parameter θⱼ.</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_19.webp</image:loc>
      <image:caption>Batch Gradient Descent. To implement Gradient Descent, you need to compute the gradient of the cost function with regard to each model parameter θⱼ, noted ∂/∂θⱼ MSE(θ)</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_20.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-5. Partial derivatives of the cost function</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_21.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-5. Partial derivatives of the cost function</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_22.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-5. Partial derivatives of the cost function: ∂/∂θⱼ MSE(θ) = (2/m) Σᵢ₌₁ᵐ (θᵀ·x⁽ⁱ⁾ − y⁽ⁱ⁾)·xⱼ⁽ⁱ⁾</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_23.webp</image:loc>
      <image:caption>Batch Gradient Descent. ∂/∂θⱼ MSE(θ) = (2/m) Σᵢ₌₁ᵐ (θᵀ·x⁽ⁱ⁾ − y⁽ⁱ⁾)·xⱼ⁽ⁱ⁾</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_24.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-6. Gradient vector of the cost function: ∇θ MSE(θ) = (2/m)·Xᵀ·(X·θ − y)</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_25.webp</image:loc>
      <image:caption>Batch Gradient Descent. ∇θ MSE(θ) = (2/m)·Xᵀ·(X·θ − y)</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_26.webp</image:loc>
      <image:caption>Batch Gradient Descent. ∇θ MSE(θ) = (2/m)·Xᵀ·(X·θ − y)</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_27.webp</image:loc>
      <image:caption>Batch Gradient Descent. ∇θ MSE(θ) = (2/m)·Xᵀ·(X·θ − y)</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_28.webp</image:loc>
      <image:caption>Batch Gradient Descent. ∇θ MSE(θ) = (2/m)·Xᵀ·(X·θ − y)</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_29.webp</image:loc>
      <image:caption>Batch Gradient Descent. ∇θ MSE(θ) = (2/m)·Xᵀ·(X·θ − y). Notice that this formula involves calculations over the full training set X at each Gradient Descent step!</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_30.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-7. Gradient Descent step</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_31.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-7. Gradient Descent step</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_32.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-7. Gradient Descent step</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_33.webp</image:loc>
      <image:caption>Batch Gradient Descent. Equation 4-7. Gradient Descent step: θ(next step) = θ − η·∇θ MSE(θ)</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_34.webp</image:loc>
      <image:caption>Batch Gradient Descent. Hey, that’s exactly what the Normal Equation found!. Gradient Descent with various learning rates</image:caption>
      <image:title>Batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_35.webp</image:loc>
      <image:caption>Stochastic Gradient Descent. On the other hand, due to its stochastic (i.e., random) nature, this algorithm is much less regular than Batch Gradient Descent: instead of gently decreasing until it reaches the minimum, the cost function will bounce up and down, decreasing only on average.</image:caption>
      <image:title>Stochastic Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_36.webp</image:loc>
      <image:caption>Stochastic Gradient Descent. Stochastic Gradient Descent first 10 steps. Note that since instances are picked randomly, some instances may be picked several times per epoch while others may not be picked at all.</image:caption>
      <image:title>Stochastic Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_37.webp</image:loc>
      <image:caption>Mini-batch Gradient Descent. The algorithm’s progress in parameter space is less erratic than with SGD, especially with fairly large mini-batches. Gradient Descent paths in parameter space</image:caption>
      <image:title>Mini-batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_38.webp</image:loc>
      <image:caption>Mini-batch Gradient Descent. Table row comparing algorithms for Linear Regression — Mini-batch GD: fast with large m, out-of-core support, fast with large n, ≥2 hyperparameters, scaling required, Scikit-Learn: n/a. There is almost no difference after training: all these algorithms end up with very similar models and make predictions in exactly the same way</image:caption>
      <image:title>Mini-batch Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_39.webp</image:loc>
      <image:caption>Polynomial Regression. y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1). Generated nonlinear and noisy dataset</image:caption>
      <image:title>Polynomial Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_40.webp</image:loc>
      <image:caption>Polynomial Regression. Polynomial Regression model predictions. Not bad: the model estimates ŷ = 0.56x² + 0.93x + 1.78 when in fact the original function was y = 0.5x² + 1.0x + 2.0 plus Gaussian noise</image:caption>
      <image:title>Polynomial Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_41.webp</image:loc>
      <image:caption>Polynomial Regression. features a², a³, b², and b³, but also the combinations ab, a²b, and ab². PolynomialFeatures(degree=d) transforms an array containing n features into an array containing (n + d)! / (d!·n!) features</image:caption>
      <image:title>Polynomial Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_42.webp</image:loc>
      <image:caption>Learning Curves. If you perform high-degree Polynomial Regression, you will likely fit the training data much better than with plain Linear Regression. High-degree Polynomial Regression</image:caption>
      <image:title>Learning Curves</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_43.webp</image:loc>
      <image:caption>Learning Curves. lin_reg = LinearRegression() plot_learning_curves(lin_reg, X, y). Learning curves</image:caption>
      <image:title>Learning Curves</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_44.webp</image:loc>
      <image:caption>Learning Curves. These learning curves are typical of an underfitting model. Both curves have reached a plateau; they are close and fairly high. If your model is underfitting the training data, adding more training examples will not help. You need to use a more complex model or come up with better features</image:caption>
      <image:title>Learning Curves</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_45.webp</image:loc>
      <image:caption>Learning Curves. Learning curves for the polynomial model</image:caption>
      <image:title>Learning Curves</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_46.webp</image:loc>
      <image:caption>Learning Curves. Learning curves for the polynomial model. One way to improve an overfitting model is to feed it more training data until the validation error reaches the training error</image:caption>
      <image:title>Learning Curves</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_47.webp</image:loc>
      <image:caption>Ridge Regression. This forces the learning algorithm to not only fit the data but also keep the model weights as small as possible. It is quite common for the cost function used during training to be different from the performance measure used for testing.</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_48.webp</image:loc>
      <image:caption>Ridge Regression. Equation 4-8. Ridge Regression cost function</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_49.webp</image:loc>
      <image:caption>Ridge Regression. Equation 4-8. Ridge Regression cost function</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_50.webp</image:loc>
      <image:caption>Ridge Regression. Equation 4-8. Ridge Regression cost function</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_51.webp</image:loc>
      <image:caption>Ridge Regression. J(θ) = MSE(θ) + α·(1/2)·Σᵢ₌₁ⁿ θᵢ²</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_52.webp</image:loc>
      <image:caption>Ridge Regression. J(θ) = MSE(θ) + α·(1/2)·Σᵢ₌₁ⁿ θᵢ²</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_53.webp</image:loc>
      <image:caption>Ridge Regression. J(θ) = MSE(θ) + α·(1/2)·Σᵢ₌₁ⁿ θᵢ²</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_54.webp</image:loc>
      <image:caption>Ridge Regression. J(θ) = MSE(θ) + α·(1/2)·Σᵢ₌₁ⁿ θᵢ²</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_55.webp</image:loc>
      <image:caption>Ridge Regression. Note that the bias term θ 0 is not regularized (the sum starts at i = 1, not 0).</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_56.webp</image:loc>
      <image:caption>Ridge Regression. Note that the bias term θ 0 is not regularized (the sum starts at i = 1, not 0). It is important to scale the data (e.g., using a StandardScaler) before performing Ridge Regression, as it is sensitive to the scale of the input features. This is true of most regularized models</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_57.webp</image:loc>
      <image:caption>Ridge Regression. Equation 4-9. Ridge Regression closed-form solution</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_58.webp</image:loc>
      <image:caption>Ridge Regression. Equation 4-9. Ridge Regression closed-form solution</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_59.webp</image:loc>
      <image:caption>Ridge Regression. Equation 4-9. Ridge Regression closed-form solution: θ̂ = (Xᵀ·X + αA)⁻¹·Xᵀ·y, where A is the identity matrix except with a 0 in the top-left cell (the bias term is not regularized)</image:caption>
      <image:title>Ridge Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_60.webp</image:loc>
      <image:caption>Lasso Regression. Equation 4-10. Lasso Regression cost function: J(θ) = MSE(θ) + α·Σᵢ₌₁ⁿ |θᵢ|</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_61.webp</image:loc>
      <image:caption>Lasso Regression. Equation 4-10. Lasso Regression cost function: J(θ) = MSE(θ) + α·Σᵢ₌₁ⁿ |θᵢ|</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_62.webp</image:loc>
      <image:caption>Lasso Regression. Equation 4-10. Lasso Regression cost function: J(θ) = MSE(θ) + α·Σᵢ₌₁ⁿ |θᵢ|</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_63.webp</image:loc>
      <image:caption>Lasso Regression. J(θ) = MSE(θ) + α·Σᵢ₌₁ⁿ |θᵢ|</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_64.webp</image:loc>
      <image:caption>Lasso Regression. The figure shows the same thing as the previous one but replaces Ridge models with Lasso models and uses smaller α values</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_65.webp</image:loc>
      <image:caption>Lasso Regression. the contours represent the same cost function plus an ℓ 1 penalty with α = 0.5. Lasso versus Ridge regularization</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_66.webp</image:loc>
      <image:caption>Lasso Regression. Lasso versus Ridge regularization. On the Lasso cost function, the BGD path tends to bounce across the gutter toward the end. This is because the slope changes abruptly at θ 2 = 0. You need to gradually reduce the learning rate in order to actually converge to the global minimum</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_67.webp</image:loc>
      <image:caption>Lasso Regression. Equation 4-11. Lasso Regression subgradient vector: g(θ, J) = ∇θ MSE(θ) + α·(sign(θ₁), sign(θ₂), …, sign(θₙ))ᵀ</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_68.webp</image:loc>
      <image:caption>Lasso Regression. sign(θᵢ) = −1 if θᵢ &lt; 0, 0 if θᵢ = 0, +1 if θᵢ &gt; 0</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_69.webp</image:loc>
      <image:caption>Lasso Regression. sign(θᵢ) = −1 if θᵢ &lt; 0, 0 if θᵢ = 0, +1 if θᵢ &gt; 0</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_70.webp</image:loc>
      <image:caption>Lasso Regression. sign(θᵢ) = −1 if θᵢ &lt; 0, 0 if θᵢ = 0, +1 if θᵢ &gt; 0</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_71.webp</image:loc>
      <image:caption>Lasso Regression</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_72.webp</image:loc>
      <image:caption>Lasso Regression. g(θ, J) = ∇θ MSE(θ) + α·(sign(θ₁), sign(θ₂), …, sign(θₙ))ᵀ</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_73.webp</image:loc>
      <image:caption>Lasso Regression. g(θ, J) = ∇θ MSE(θ) + α·(sign(θ₁), …, sign(θₙ))ᵀ</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_74.webp</image:loc>
      <image:caption>Lasso Regression. where sign(θᵢ) = −1 if θᵢ &lt; 0, 0 if θᵢ = 0, +1 if θᵢ &gt; 0</image:caption>
      <image:title>Lasso Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_75.webp</image:loc>
      <image:caption>Elastic Net. Equation 4-12. Elastic Net cost function</image:caption>
      <image:title>Elastic Net</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_76.webp</image:loc>
      <image:caption>Elastic Net. Equation 4-12. Elastic Net cost function</image:caption>
      <image:title>Elastic Net</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_77.webp</image:loc>
      <image:caption>Elastic Net. Equation 4-12. Elastic Net cost function</image:caption>
      <image:title>Elastic Net</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_78.webp</image:loc>
      <image:caption>Elastic Net. J(θ) = MSE(θ) + r·α·Σᵢ₌₁ⁿ |θᵢ| + ((1 − r)/2)·α·Σᵢ₌₁ⁿ θᵢ²</image:caption>
      <image:title>Elastic Net</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_79.webp</image:loc>
      <image:caption>Early Stopping. A very different way to regularize iterative learning algorithms such as Gradient Descent is to stop training as soon as the validation error reaches a minimum. Early stopping regularization</image:caption>
      <image:title>Early Stopping</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_80.webp</image:loc>
      <image:caption>Early Stopping. Early stopping regularization. With Stochastic and Mini-batch Gradient Descent, the curves are not so smooth, and it may be hard to know whether you have reached the minimum or not.</image:caption>
      <image:title>Early Stopping</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_81.webp</image:loc>
      <image:caption>Estimating Probabilities. Equation 4-13. Logistic Regression model estimated probability (vectorized form)</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_82.webp</image:loc>
      <image:caption>Estimating Probabilities. Equation 4-13. Logistic Regression model estimated probability (vectorized form)</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_83.webp</image:loc>
      <image:caption>Estimating Probabilities. Equation 4-13. Logistic Regression model estimated probability (vectorized form)</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_84.webp</image:loc>
      <image:caption>Estimating Probabilities. p̂ = h_θ(x) = σ(θᵀ·x)</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_85.webp</image:loc>
      <image:caption>Estimating Probabilities. Equation 4-14. Logistic function</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_86.webp</image:loc>
      <image:caption>Estimating Probabilities. Equation 4-14. Logistic function: σ(t) = 1 / (1 + exp(−t))</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_87.webp</image:loc>
      <image:caption>Estimating Probabilities. σ(t) = 1 / (1 + exp(−t))</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_88.webp</image:loc>
      <image:caption>Estimating Probabilities. σ(t) = 1 / (1 + exp(−t))</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_89.webp</image:loc>
      <image:caption>Estimating Probabilities. Logistic function. Once the Logistic Regression model has estimated the probability p̂ = h_θ(x) that an instance x belongs to the positive class, it can make its prediction ŷ easily (see Equation 4-15)</image:caption>
      <image:title>Estimating Probabilities</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_90.webp</image:loc>
      <image:caption>Training and Cost Function. Equation 4-16. Cost function of a single training instance: c(θ) = −log(p̂) if y = 1, −log(1 − p̂) if y = 0</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_91.webp</image:loc>
      <image:caption>Training and Cost Function. c(θ) = −log(p̂) if y = 1, −log(1 − p̂) if y = 0</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_92.webp</image:loc>
      <image:caption>Training and Cost Function. c(θ) = −log(p̂) if y = 1, −log(1 − p̂) if y = 0</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_93.webp</image:loc>
      <image:caption>Training and Cost Function. c(θ) = −log(p̂) if y = 1, −log(1 − p̂) if y = 0</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_94.webp</image:loc>
      <image:caption>Training and Cost Function. c(θ) = −log(p̂) if y = 1, −log(1 − p̂) if y = 0</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_95.webp</image:loc>
      <image:caption>Training and Cost Function. Equation 4-17. Logistic Regression cost function (log loss)</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_96.webp</image:loc>
      <image:caption>Training and Cost Function. Equation 4-17. Logistic Regression cost function (log loss)</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_97.webp</image:loc>
      <image:caption>Training and Cost Function. Equation 4-17. Logistic Regression cost function (log loss)</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_98.webp</image:loc>
      <image:caption>Training and Cost Function</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_99.webp</image:loc>
      <image:caption>Training and Cost Function. J(θ) = −(1/m) Σᵢ₌₁ᵐ [y⁽ⁱ⁾ log(p̂⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − p̂⁽ⁱ⁾)]</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_100.webp</image:loc>
      <image:caption>Training and Cost Function. J(θ) = −(1/m) Σᵢ₌₁ᵐ [y⁽ⁱ⁾ log(p̂⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − p̂⁽ⁱ⁾)]</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_101.webp</image:loc>
      <image:caption>Training and Cost Function. J(θ) = −(1/m) Σᵢ₌₁ᵐ [y⁽ⁱ⁾ log(p̂⁽ⁱ⁾) + (1 − y⁽ⁱ⁾) log(1 − p̂⁽ⁱ⁾)]</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_102.webp</image:loc>
      <image:caption>Training and Cost Function</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_103.webp</image:loc>
      <image:caption>Training and Cost Function</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_104.webp</image:loc>
      <image:caption>Training and Cost Function</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_105.webp</image:loc>
      <image:caption>Training and Cost Function. Equation 4-18. Logistic cost function partial derivatives</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_106.webp</image:loc>
      <image:caption>Training and Cost Function. Equation 4-18. Logistic cost function partial derivatives</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_107.webp</image:loc>
      <image:caption>Training and Cost Function. Equation 4-18. Logistic cost function partial derivatives</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_108.webp</image:loc>
      <image:caption>Training and Cost Function. ∂J(θ)/∂θⱼ = (1/m) Σᵢ₌₁ᵐ (σ(θᵀ·x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_109.webp</image:loc>
      <image:caption>Training and Cost Function. ∂J(θ)/∂θⱼ = (1/m) Σᵢ₌₁ᵐ (σ(θᵀ·x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_110.webp</image:loc>
      <image:caption>Training and Cost Function. ∂J(θ)/∂θⱼ = (1/m) Σᵢ₌₁ᵐ (σ(θᵀ·x⁽ⁱ⁾) − y⁽ⁱ⁾) xⱼ⁽ⁱ⁾</image:caption>
      <image:title>Training and Cost Function</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_111.webp</image:loc>
      <image:caption>Decision Boundaries. Flowers of three iris plant species. Let’s try to build a classifier to detect the Iris-Virginica type based only on the petal width feature. First let’s load the data</image:caption>
      <image:title>Decision Boundaries</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_112.webp</image:loc>
      <image:caption>Decision Boundaries. Estimated probabilities and decision boundary. The petal width of Iris-Virginica flowers (represented by triangles) ranges from 1.4 cm to 2.5 cm, while the other iris flowers (represented by squares) generally have a smaller petal width, ranging from 0.1 cm to 1.8 cm.</image:caption>
      <image:title>Decision Boundaries</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_113.webp</image:loc>
      <image:caption>Decision Boundaries. Linear decision boundary. Just like the other linear models, Logistic Regression models can be regularized using ℓ 1 or ℓ 2 penalties. Scikit-Learn actually adds an ℓ 2 penalty by default</image:caption>
      <image:title>Decision Boundaries</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_114.webp</image:loc>
      <image:caption>Decision Boundaries. Just like the other linear models, Logistic Regression models can be regularized using ℓ 1 or ℓ 2 penalties. Scikit-Learn actually adds an ℓ 2 penalty by default. The hyperparameter controlling the regularization strength of a Scikit-Learn LogisticRegression model is not alpha (as in other linear models), but its inverse: C. The higher the value of C, the less the model is regularized</image:caption>
      <image:title>Decision Boundaries</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_115.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-19. Softmax score for class k</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_116.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-19. Softmax score for class k: sₖ(x) = (θ⁽ᵏ⁾)ᵀ·x</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_117.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-20. Softmax function</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_118.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-20. Softmax function: p̂ₖ = σ(s(x))ₖ = exp(sₖ(x)) / Σⱼ₌₁ᴷ exp(sⱼ(x))</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_119.webp</image:loc>
      <image:caption>Softmax Regression</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_120.webp</image:loc>
      <image:caption>Softmax Regression</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_121.webp</image:loc>
      <image:caption>Softmax Regression. p̂ₖ = σ(s(x))ₖ = exp(sₖ(x)) / Σⱼ₌₁ᴷ exp(sⱼ(x))</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_122.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-21. Softmax Regression classifier prediction</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_123.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-21. Softmax Regression classifier prediction</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_124.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-21. Softmax Regression classifier prediction</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_125.webp</image:loc>
      <image:caption>Softmax Regression</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_126.webp</image:loc>
      <image:caption>Softmax Regression</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_127.webp</image:loc>
      <image:caption>Softmax Regression</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_128.webp</image:loc>
      <image:caption>Softmax Regression. ŷ = argmaxₖ σ(s(x))ₖ = argmaxₖ sₖ(x) = argmaxₖ ((θ⁽ᵏ⁾)ᵀ·x)</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_129.webp</image:loc>
      <image:caption>Softmax Regression. The argmax operator returns the value of a variable that maximizes a function. In this equation, it returns the value of k that maximizes the estimated probability σ(s(x))ₖ. The Softmax Regression classifier predicts only one class at a time (i.e., it is multiclass, not multioutput) so it should be used only with mutually exclusive classes such as different types of plants.</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_130.webp</image:loc>
      <image:caption>Softmax Regression. Chapter 4: Training Models</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_131.webp</image:loc>
      <image:caption>Softmax Regression. Cross entropy cost function: J(Θ) = −(1/m) Σᵢ₌₁ᵐ Σₖ₌₁ᴷ yₖ⁽ⁱ⁾ log(p̂ₖ⁽ⁱ⁾)</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_132.webp</image:loc>
      <image:caption>Softmax Regression. J(Θ) = −(1/m) Σᵢ₌₁ᵐ Σₖ₌₁ᴷ yₖ⁽ⁱ⁾ log(p̂ₖ⁽ⁱ⁾)</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_133.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-23. Cross entropy gradient vector for class k</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_134.webp</image:loc>
      <image:caption>Softmax Regression. Equation 4-23. Cross entropy gradient vector for class k: ∇θ⁽ᵏ⁾ J(Θ) = (1/m) Σᵢ₌₁ᵐ (p̂ₖ⁽ⁱ⁾ − yₖ⁽ⁱ⁾) x⁽ⁱ⁾</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_135.webp</image:loc>
      <image:caption>Softmax Regression</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter4_136.webp</image:loc>
      <image:caption>Softmax Regression. The figure shows the resulting decision boundaries, represented by the background colors. Softmax Regression decision boundaries</image:caption>
      <image:title>Softmax Regression</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter5</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_0.webp</image:loc>
      <image:caption>Linear SVM Classification. Large margin classification. Notice that adding more training instances “off the street” will not affect the decision boundary at all: it is fully determined (or “supported”) by the instances located on the edge of the street.</image:caption>
      <image:title>Linear SVM Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_1.webp</image:loc>
      <image:caption>Linear SVM Classification. Notice that adding more training instances “off the street” will not affect the decision boundary at all: it is fully determined (or “supported”) by the instances located on the edge of the street. SVMs are sensitive to the feature scales, as you can see in the figure: on the left plot, the vertical scale is much larger than the horizontal scale, so the widest possible street is close to horizontal.</image:caption>
      <image:title>Linear SVM Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_2.webp</image:loc>
      <image:caption>Linear SVM Classification. SVMs are sensitive to the feature scales, as you can see in the figure: on the left plot, the vertical scale is much larger than the horizontal scale, so the widest possible street is close to horizontal. Sensitivity to feature scales</image:caption>
      <image:title>Linear SVM Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_3.webp</image:loc>
      <image:caption>Soft Margin Classification. Hard margin sensitivity to outliers. To avoid these issues it is preferable to use a more flexible model.</image:caption>
      <image:title>Soft Margin Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_4.webp</image:loc>
      <image:caption>Soft Margin Classification. In Scikit-Learn’s SVM classes, you can control this balance using the C hyperparameter: a smaller C value leads to a wider street but more margin violations. The figure shows the decision boundaries and margins of two soft margin SVM classifiers on a nonlinearly separable dataset. Fewer margin violations versus large margin</image:caption>
      <image:title>Soft Margin Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_5.webp</image:loc>
      <image:caption>Soft Margin Classification. Fewer margin violations versus large margin. If your SVM model is overfitting, you can try regularizing it by reducing C</image:caption>
      <image:title>Soft Margin Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_6.webp</image:loc>
      <image:caption>Soft Margin Classification. array([ 1.]). Unlike Logistic Regression classifiers, SVM classifiers do not output probabilities for each class</image:caption>
      <image:title>Soft Margin Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_7.webp</image:loc>
      <image:caption>Soft Margin Classification. Alternatively, you could use the SVC class, using SVC(kernel=&quot;linear&quot;, C=1), but it is much slower, especially with large training sets, so it is not recommended. The LinearSVC class regularizes the bias term, so you should center the training set first by subtracting its mean.</image:caption>
      <image:title>Soft Margin Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_8.webp</image:loc>
      <image:caption>Nonlinear SVM Classification. Although linear SVM classifiers are efficient and work surprisingly well in many cases, many datasets are not even close to being linearly separable. Adding features to make a dataset linearly separable</image:caption>
      <image:title>Nonlinear SVM Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_9.webp</image:loc>
      <image:caption>Nonlinear SVM Classification. Linear SVM classifier using polynomial features</image:caption>
      <image:title>Nonlinear SVM Classification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_10.webp</image:loc>
      <image:caption>Polynomial Kernel. If your model is overfitting, you can try to reduce the polynomial degree. Conversely, if it is underfitting, you can try increasing it. The hyperparameter coef0 controls how much the model is influenced by high-degree polynomials versus low-degree polynomials. SVM classifiers with a polynomial kernel</image:caption>
      <image:title>Polynomial Kernel</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_11.webp</image:loc>
      <image:caption>Polynomial Kernel. SVM classifiers with a polynomial kernel. A common approach to find the right hyperparameter values is to use grid search (see Chapter 2 ).</image:caption>
      <image:title>Polynomial Kernel</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_12.webp</image:loc>
      <image:caption>Adding Similarity Features. Equation 5-1. Gaussian RBF</image:caption>
      <image:title>Adding Similarity Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_13.webp</image:loc>
      <image:caption>Adding Similarity Features. Equation 5-1. Gaussian RBF</image:caption>
      <image:title>Adding Similarity Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_14.webp</image:loc>
      <image:caption>Adding Similarity Features. Equation 5-1. Gaussian RBF</image:caption>
      <image:title>Adding Similarity Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_15.webp</image:loc>
      <image:caption>Adding Similarity Features</image:caption>
      <image:title>Adding Similarity Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_16.webp</image:loc>
      <image:caption>Adding Similarity Features</image:caption>
      <image:title>Adding Similarity Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_17.webp</image:loc>
      <image:caption>Adding Similarity Features. ϕγ(x, ℓ) = exp(−γ ‖x − ℓ‖²)</image:caption>
      <image:title>Adding Similarity Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_18.webp</image:loc>
      <image:caption>Adding Similarity Features. Similarity features using the Gaussian RBF. You may wonder how to select the landmarks.</image:caption>
      <image:title>Adding Similarity Features</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_19.webp</image:loc>
      <image:caption>Gaussian RBF Kernel. SVM classifiers using an RBF kernel. Other kernels exist but are used much more rarely.</image:caption>
      <image:title>Gaussian RBF Kernel</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_20.webp</image:loc>
      <image:caption>Gaussian RBF Kernel. Other kernels exist but are used much more rarely. With so many kernels to choose from, how can you decide which one to use?</image:caption>
      <image:title>Gaussian RBF Kernel</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_21.webp</image:loc>
      <image:caption>SVM Regression. Chapter 5: Support Vector Machines. Adding more training instances within the margin does not affect the model’s predictions; thus, the model is said to be ϵ-insensitive</image:caption>
      <image:title>SVM Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_22.webp</image:loc>
      <image:caption>SVM Regression. To tackle nonlinear regression tasks, you can use a kernelized SVM model. SVM regression using a 2nd-degree polynomial kernel</image:caption>
      <image:title>SVM Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_23.webp</image:loc>
      <image:caption>SVM Regression. svm_poly_reg = SVR(kernel=&quot;poly&quot;, degree=2, C=100, epsilon=0.1) svm_poly_reg.fit(X, y). SVMs can also be used for outlier detection; see Scikit-Learn’s documentation for more details</image:caption>
      <image:title>SVM Regression</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_24.webp</image:loc>
      <image:caption>Decision Function and Predictions. The figure shows the decision function that corresponds to the model on the right of the previous figure: it is a two-dimensional plane since this dataset has two features (petal width and petal length). Decision function for the iris dataset</image:caption>
      <image:title>Decision Function and Predictions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_25.webp</image:loc>
      <image:caption>Training Objective. The dashed lines represent the points where the decision function is equal to 1 or –1: they are parallel and at equal distance to the decision boundary, forming a margin around it.</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_26.webp</image:loc>
      <image:caption>Training Objective. The dashed lines represent the points where the decision function is equal to 1 or –1: they are parallel and at equal distance to the decision boundary, forming a margin around it. Consider the slope of the decision function: it is equal to the norm of the weight vector, w.</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_27.webp</image:loc>
      <image:caption>Training Objective. A smaller weight vector results in a larger margin</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_28.webp</image:loc>
      <image:caption>Training Objective. A smaller weight vector results in a larger margin</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_29.webp</image:loc>
      <image:caption>Training Objective. A smaller weight vector results in a larger margin. So we want to minimize w to get a large margin.</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_30.webp</image:loc>
      <image:caption>Training Objective. subject to t⁽ⁱ⁾(wᵀ·x⁽ⁱ⁾ + b) ≥ 1 for i = 1, 2, ⋯, m</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_31.webp</image:loc>
      <image:caption>Training Objective</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_32.webp</image:loc>
      <image:caption>Training Objective</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_33.webp</image:loc>
      <image:caption>Training Objective. We are minimizing ½ wᵀ·w, which is equal to ½‖w‖², rather than minimizing ‖w‖.</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_34.webp</image:loc>
      <image:caption>Training Objective</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_35.webp</image:loc>
      <image:caption>Training Objective. Minimizing ½‖w‖² rather than ‖w‖ gives the same result (since the values of w and b that minimize a value also minimize half of its square).</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_36.webp</image:loc>
      <image:caption>Training Objective. Minimizing ½‖w‖² rather than ‖w‖ gives the same result (since the values of w and b that minimize a value also minimize half of its square).</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_37.webp</image:loc>
      <image:caption>Training Objective. The values of w and b that minimize a value also minimize half of its square, and ½‖w‖² has a nice and simple derivative (it is just w).</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_38.webp</image:loc>
      <image:caption>Training Objective</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_39.webp</image:loc>
      <image:caption>Training Objective. ½‖w‖² has a simple derivative (it is just w), while ‖w‖ is not differentiable at w = 0. Optimization algorithms work much better on differentiable functions</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_40.webp</image:loc>
      <image:caption>Training Objective. subject to t⁽ⁱ⁾(wᵀ·x⁽ⁱ⁾ + b) ≥ 1 − ζ⁽ⁱ⁾ and ζ⁽ⁱ⁾ ≥ 0 for i = 1, 2, ⋯, m</image:caption>
      <image:title>Training Objective</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_41.webp</image:loc>
      <image:caption>The Dual Problem. Equation 5-6. Dual form of the linear SVM objective</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_42.webp</image:loc>
      <image:caption>The Dual Problem. Equation 5-6. Dual form of the linear SVM objective</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_43.webp</image:loc>
      <image:caption>The Dual Problem. Equation 5-6. Dual form of the linear SVM objective</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_44.webp</image:loc>
      <image:caption>The Dual Problem</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_45.webp</image:loc>
      <image:caption>The Dual Problem</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_46.webp</image:loc>
      <image:caption>The Dual Problem. minimize over α: ½ Σᵢ₌₁ᵐ Σⱼ₌₁ᵐ αᵢ αⱼ t⁽ⁱ⁾ t⁽ʲ⁾ (x⁽ⁱ⁾)ᵀ·x⁽ʲ⁾ − Σᵢ₌₁ᵐ αᵢ</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_47.webp</image:loc>
      <image:caption>The Dual Problem. ŵ = Σᵢ₌₁ᵐ α̂⁽ⁱ⁾ t⁽ⁱ⁾ x⁽ⁱ⁾</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_48.webp</image:loc>
      <image:caption>The Dual Problem. α⁽ⁱ⁾ &gt; 0</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_49.webp</image:loc>
      <image:caption>The Dual Problem. α⁽ⁱ⁾ &gt; 0</image:caption>
      <image:title>The Dual Problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_50.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_51.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-9. Kernel trick for a 2nd-degree polynomial mapping</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_52.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-9. Kernel trick for a 2nd-degree polynomial mapping</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_53.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-9. Kernel trick for a 2nd-degree polynomial mapping</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_54.webp</image:loc>
      <image:caption>Kernelized SVM. ϕ(a)ᵀ·ϕ(b) = (aᵀ·b)²</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_55.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_56.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_57.webp</image:loc>
      <image:caption>Kernelized SVM. How about that? The dot product of the transformed vectors is equal to the square of the dot product of the original vectors: ϕ(a)ᵀ·ϕ(b) = (aᵀ·b)²</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_58.webp</image:loc>
      <image:caption>Kernelized SVM. How about that? The dot product of the transformed vectors is equal to the square of the dot product of the original vectors: ϕ(a)ᵀ·ϕ(b) = (aᵀ·b)²</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_59.webp</image:loc>
      <image:caption>Kernelized SVM. How about that? The dot product of the transformed vectors is equal to the square of the dot product of the original vectors: ϕ(a)ᵀ · ϕ(b) = (aᵀ · b)². Now here is the key insight: if you apply the transformation ϕ to all training instances, then the dual problem (see Equation 5-6) will contain the dot product ϕ(x⁽ⁱ⁾)ᵀ · ϕ(x⁽ʲ⁾).</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_60.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-10. Common kernels</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_61.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-10. Common kernels. Linear: K(a, b) = aᵀ · b</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_62.webp</image:loc>
      <image:caption>Kernelized SVM. Linear: K(a, b) = aᵀ · b</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_63.webp</image:loc>
      <image:caption>Kernelized SVM. Linear: K(a, b) = aᵀ · b</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_64.webp</image:loc>
      <image:caption>Kernelized SVM. Linear: K(a, b) = aᵀ · b</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_65.webp</image:loc>
      <image:caption>Kernelized SVM. Polynomial: K(a, b) = (γ aᵀ · b + r)ᵈ</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_66.webp</image:loc>
      <image:caption>Kernelized SVM. Polynomial: K(a, b) = (γ aᵀ · b + r)ᵈ</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_67.webp</image:loc>
      <image:caption>Kernelized SVM. Polynomial: K(a, b) = (γ aᵀ · b + r)ᵈ</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_68.webp</image:loc>
      <image:caption>Kernelized SVM. Polynomial: K(a, b) = (γ aᵀ · b + r)ᵈ</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_69.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_70.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_71.webp</image:loc>
      <image:caption>Kernelized SVM. Gaussian RBF: K(a, b) = exp(−γ ∥a − b∥²)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_72.webp</image:loc>
      <image:caption>Kernelized SVM. Gaussian RBF: K(a, b) = exp(−γ ∥a − b∥²)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_73.webp</image:loc>
      <image:caption>Kernelized SVM. Gaussian RBF: K(a, b) = exp(−γ ∥a − b∥²). Sigmoid: K(a, b) = tanh(γ aᵀ · b + r)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_74.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-11. Making predictions with a kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_75.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-11. Making predictions with a kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_76.webp</image:loc>
      <image:caption>Kernelized SVM. Equation 5-11. Making predictions with a kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_77.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_78.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_79.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_80.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_81.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_82.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_83.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_84.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_85.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_86.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_87.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_88.webp</image:loc>
      <image:caption>Kernelized SVM. h(x⁽ⁿ⁾) = Σᵢ α̂⁽ⁱ⁾ t⁽ⁱ⁾ K(x⁽ⁱ⁾, x⁽ⁿ⁾) + b̂, summing over the support vectors (α̂⁽ⁱ⁾ > 0)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_89.webp</image:loc>
      <image:caption>Kernelized SVM. h(x⁽ⁿ⁾) = Σᵢ α̂⁽ⁱ⁾ t⁽ⁱ⁾ K(x⁽ⁱ⁾, x⁽ⁿ⁾) + b̂, summing over the support vectors (α̂⁽ⁱ⁾ > 0)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_90.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_91.webp</image:loc>
      <image:caption>Kernelized SVM. α̂⁽ⁱ⁾ &gt; 0. Note that since α̂⁽ⁱ⁾ ≠ 0 only for support vectors, making predictions involves computing the dot product of the new input vector x⁽ⁿ⁾ with only the support vectors, not all the training instances.</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_92.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_93.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_94.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_95.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_96.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_97.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_98.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_99.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_100.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_101.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_102.webp</image:loc>
      <image:caption>Kernelized SVM. b̂ = (1/nₛ) Σᵢ (1 − t⁽ⁱ⁾ Σⱼ α̂⁽ʲ⁾ t⁽ʲ⁾ K(x⁽ⁱ⁾, x⁽ʲ⁾)), averaging over the nₛ support vectors (α̂⁽ⁱ⁾ &gt; 0)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_103.webp</image:loc>
      <image:caption>Kernelized SVM. b̂ = (1/nₛ) Σᵢ (1 − t⁽ⁱ⁾ Σⱼ α̂⁽ʲ⁾ t⁽ʲ⁾ K(x⁽ⁱ⁾, x⁽ʲ⁾)), averaging over the nₛ support vectors (α̂⁽ⁱ⁾ &gt; 0)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_104.webp</image:loc>
      <image:caption>Kernelized SVM. b̂ = (1/nₛ) Σᵢ (1 − t⁽ⁱ⁾ Σⱼ α̂⁽ʲ⁾ t⁽ʲ⁾ K(x⁽ⁱ⁾, x⁽ʲ⁾)), averaging over the nₛ support vectors (α̂⁽ⁱ⁾ &gt; 0)</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_105.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_106.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_107.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_108.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_109.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_110.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_111.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_112.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_113.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_114.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_115.webp</image:loc>
      <image:caption>Kernelized SVM</image:caption>
      <image:title>Kernelized SVM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_116.webp</image:loc>
      <image:caption>Online SVMs. Equation 5-13. Linear SVM classifier cost function</image:caption>
      <image:title>Online SVMs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_117.webp</image:loc>
      <image:caption>Online SVMs. Equation 5-13. Linear SVM classifier cost function</image:caption>
      <image:title>Online SVMs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_118.webp</image:loc>
      <image:caption>Online SVMs. Equation 5-13. Linear SVM classifier cost function</image:caption>
      <image:title>Online SVMs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_119.webp</image:loc>
      <image:caption>Online SVMs. J(w, b) = (1/2) wᵀ · w + C Σᵢ₌₁ᵐ max(0, 1 − t⁽ⁱ⁾(wᵀ · x⁽ⁱ⁾ + b))</image:caption>
      <image:title>Online SVMs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_120.webp</image:loc>
      <image:caption>Online SVMs. J(w, b) = (1/2) wᵀ · w + C Σᵢ₌₁ᵐ max(0, 1 − t⁽ⁱ⁾(wᵀ · x⁽ⁱ⁾ + b))</image:caption>
      <image:title>Online SVMs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter5_121.webp</image:loc>
      <image:caption>Online SVMs. It is also possible to implement online kernelized SVMs, for example using "Incremental and Decremental SVM Learning" or "Fast Kernel Classifiers with Online and Active Learning." However, these are implemented in Matlab and C++.</image:caption>
      <image:title>Online SVMs</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter6</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_0.webp</image:loc>
      <image:caption>Training and Visualizing a Decision Tree. Your first decision tree is shown in this figure. Iris Decision Tree</image:caption>
      <image:title>Training and Visualizing a Decision Tree</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_1.webp</image:loc>
      <image:caption>Making Predictions. 2.45 cm. One of the many qualities of Decision Trees is that they require very little data preparation. In particular, they don’t require feature scaling or centering at all</image:caption>
      <image:title>Making Predictions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_2.webp</image:loc>
      <image:caption>Making Predictions. pᵢ,ₖ is the ratio of class k instances among the training instances in the iᵗʰ node. Scikit-Learn uses the CART algorithm, which produces only binary trees: nonleaf nodes always have two children (i.e., questions only have yes/no answers).</image:caption>
      <image:title>Making Predictions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_3.webp</image:loc>
      <image:caption>Making Predictions. This figure shows the Decision Tree's decision boundaries. Decision Tree decision boundaries</image:caption>
      <image:title>Making Predictions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_4.webp</image:loc>
      <image:caption>The CART Training Algorithm. min_samples_leaf, min_weight_fraction_leaf, and max_leaf_nodes. As you can see, the CART algorithm is a greedy algorithm: it greedily searches for an optimum split at the top level, then repeats the process at each level.</image:caption>
      <image:title>The CART Training Algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_5.webp</image:loc>
      <image:caption>Gini Impurity or Entropy?. pᵢ,ₖ ≠ 0</image:caption>
      <image:title>Gini Impurity or Entropy?</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_6.webp</image:loc>
      <image:caption>Gini Impurity or Entropy?. Hᵢ = − Σₖ pᵢ,ₖ log(pᵢ,ₖ), summed over classes with pᵢ,ₖ ≠ 0</image:caption>
      <image:title>Gini Impurity or Entropy?</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_7.webp</image:loc>
      <image:caption>Regularization Hyperparameters. min_samples_split (the minimum number of samples a node must have before it can be split), min_samples_leaf (the minimum number of samples a leaf node must have), min_weight_fraction_leaf (same as min_samples_leaf but expressed as a fraction of the total number of weighted instances), max_leaf_nodes (maximum number of leaf nodes), and max_features (maximum number of features that are evaluated for splitting at each node). Other algorithms work by first training the Decision Tree without restrictions, then pruning (deleting) unnecessary nodes.</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_8.webp</image:loc>
      <image:caption>Regularization Hyperparameters. shows two Decision Trees trained on the moons dataset (introduced in Chapter 5 ). Regularization using min_samples_leaf</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_9.webp</image:loc>
      <image:caption>Regularization Hyperparameters. The resulting tree is represented in this figure. A Decision Tree for regression</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_10.webp</image:loc>
      <image:caption>Regularization Hyperparameters. Predictions of two Decision Tree regression models. The CART algorithm works mostly the same way as earlier, except that instead of trying to split the training set in a way that minimizes impurity, it now tries to split the training set in a way that minimizes the MSE.</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_11.webp</image:loc>
      <image:caption>Regularization Hyperparameters. J(k, tₖ) = (m_left/m) · MSE_left + (m_right/m) · MSE_right</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_12.webp</image:loc>
      <image:caption>Regularization Hyperparameters. Just like for classification tasks, Decision Trees are prone to overfitting when dealing with regression tasks. Regularizing a Decision Tree regressor</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_13.webp</image:loc>
      <image:caption>Regularization Hyperparameters. Hopefully by now you are convinced that Decision Trees have a lot going for them: they are simple to understand and interpret, easy to use, versatile, and powerful. Sensitivity to training set rotation</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter6_14.webp</image:loc>
      <image:caption>Regularization Hyperparameters. Instability. Sensitivity to training set details</image:caption>
      <image:title>Regularization Hyperparameters</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter7</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_0.webp</image:loc>
      <image:caption>Voting Classifiers. Training diverse classifiers. A very simple way to create an even better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes. This majority-vote classifier is called a hard voting classifier.</image:caption>
      <image:title>Voting Classifiers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_1.webp</image:loc>
      <image:caption>Voting Classifiers. A very simple way to create an even better classifier is to aggregate the predictions of each classifier and predict the class that gets the most votes. This majority-vote classifier is called a hard voting classifier. Hard voting classifier predictions</image:caption>
      <image:title>Voting Classifiers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_2.webp</image:loc>
      <image:caption>Voting Classifiers. How is this possible?. The law of large numbers</image:caption>
      <image:title>Voting Classifiers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_3.webp</image:loc>
      <image:caption>Voting Classifiers. Similarly, suppose you build an ensemble containing 1,000 classifiers that are individually correct only 51% of the time (barely better than random guessing). Ensemble methods work best when the predictors are as independent from one another as possible.</image:caption>
      <image:title>Voting Classifiers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_4.webp</image:loc>
      <image:caption>Bagging and Pasting. In other words, both bagging and pasting allow training instances to be sampled several times across multiple predictors, but only bagging allows training instances to be sampled several times for the same predictor. Pasting/bagging training set sampling and training</image:caption>
      <image:title>Bagging and Pasting</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_5.webp</image:loc>
      <image:caption>Bagging and Pasting in Scikit-Learn. bag_clf.fit(X_train, y_train) y_pred = bag_clf.predict(X_test). The BaggingClassifier automatically performs soft voting instead of hard voting if the base classifier can estimate class probabilities (i.e., if it has a predict_proba() method), which is the case with Decision Tree classifiers</image:caption>
      <image:title>Bagging and Pasting in Scikit-Learn</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_6.webp</image:loc>
      <image:caption>Bagging and Pasting in Scikit-Learn. A single Decision Tree versus a bagging ensemble of 500 trees. Bootstrapping introduces a bit more diversity in the subsets that each predictor is trained on, so bagging ends up with a slightly higher bias than pasting, but this also means that predictors end up being less correlated so the ensemble’s variance is reduced.</image:caption>
      <image:title>Bagging and Pasting in Scikit-Learn</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_7.webp</image:loc>
      <image:caption>Random Forests. You can create an Extra-Trees classifier using Scikit-Learn's ExtraTreesClassifier class. Its API is identical to the RandomForestClassifier class. Similarly, the ExtraTreesRegressor class has the same API as the RandomForestRegressor class. It is hard to tell in advance whether a RandomForestClassifier will perform better or worse than an ExtraTreesClassifier.</image:caption>
      <image:title>Random Forests</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_8.webp</image:loc>
      <image:caption>Feature Importance. Similarly, if you train a Random Forest classifier on the MNIST dataset (introduced in Chapter 3 ) and plot each pixel’s importance, you get the image represented in. MNIST pixel importance (according to a Random Forest classifier)</image:caption>
      <image:title>Feature Importance</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_9.webp</image:loc>
      <image:caption>AdaBoost. For example, to build an AdaBoost classifier, a first base classifier (such as a Decision Tree) is trained and used to make predictions on the training set. AdaBoost sequential training with instance weight updates</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_10.webp</image:loc>
      <image:caption>AdaBoost. parameters to minimize a cost function, AdaBoost adds predictors to the ensemble, gradually making it better. Decision boundaries of consecutive predictors</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_11.webp</image:loc>
      <image:caption>AdaBoost. Once all predictors are trained, the ensemble makes predictions very much like bagging or pasting, except that predictors have different weights depending on their overall accuracy on the weighted training set. There is one important drawback to this sequential learning technique: it cannot be parallelized (or only partially), since each predictor can only be trained after the previous predictor has been trained and evaluated.</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_12.webp</image:loc>
      <image:caption>AdaBoost. Weight update rule: w⁽ⁱ⁾ ← w⁽ⁱ⁾ if ŷⱼ⁽ⁱ⁾ = y⁽ⁱ⁾, and w⁽ⁱ⁾ ← w⁽ⁱ⁾ exp(αⱼ) if ŷⱼ⁽ⁱ⁾ ≠ y⁽ⁱ⁾</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_13.webp</image:loc>
      <image:caption>AdaBoost</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_14.webp</image:loc>
      <image:caption>AdaBoost</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_15.webp</image:loc>
      <image:caption>AdaBoost. w⁽ⁱ⁾ ← w⁽ⁱ⁾ exp(αⱼ) if ŷⱼ⁽ⁱ⁾ ≠ y⁽ⁱ⁾</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_16.webp</image:loc>
      <image:caption>AdaBoost. w⁽ⁱ⁾ ← w⁽ⁱ⁾ exp(αⱼ) if ŷⱼ⁽ⁱ⁾ ≠ y⁽ⁱ⁾. Then all the instance weights are normalized (i.e., divided by Σᵢ₌₁ᵐ w⁽ⁱ⁾)</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_17.webp</image:loc>
      <image:caption>AdaBoost</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_18.webp</image:loc>
      <image:caption>AdaBoost. ŷ(x) = argmaxₖ Σⱼ αⱼ, summing over the predictors j for which ŷⱼ(x) = k</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_19.webp</image:loc>
      <image:caption>AdaBoost. ada_clf.fit(X_train, y_train). If your AdaBoost ensemble is overfitting the training set, you can try reducing the number of estimators or more strongly regularizing the base estimator</image:caption>
      <image:title>AdaBoost</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_20.webp</image:loc>
      <image:caption>Gradient Boosting. Chapter 7: Ensemble Learning and Random Forests. The learning_rate hyperparameter scales the contribution of each tree.</image:caption>
      <image:title>Gradient Boosting</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_21.webp</image:loc>
      <image:caption>Gradient Boosting. GBRT ensembles with not enough predictors (left) and too many (right). In order to find the optimal number of trees, you can use early stopping (see Chap‐ ter 4 ).</image:caption>
      <image:title>Gradient Boosting</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_22.webp</image:loc>
      <image:caption>Gradient Boosting. Tuning the number of trees using early stopping. It is also possible to implement early stopping by actually stopping training early (instead of training a large number of trees first and then looking back to find the optimal number).</image:caption>
      <image:title>Gradient Boosting</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_23.webp</image:loc>
      <image:caption>Gradient Boosting. Boosting. It is possible to use Gradient Boosting with other cost functions. This is controlled by the loss hyperparameter (see Scikit-Learn’s documentation for more details)</image:caption>
      <image:title>Gradient Boosting</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_24.webp</image:loc>
      <image:caption>Stacking. The last Ensemble method we will discuss in this chapter is called stacking (short for stacked generalization ). Aggregating predictions using a blending predictor</image:caption>
      <image:title>Stacking</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_25.webp</image:loc>
      <image:caption>Stacking. Training the first layer. Next, the first layer predictors are used to make predictions on the second (held-out) set (see.</image:caption>
      <image:title>Stacking</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_26.webp</image:loc>
      <image:caption>Stacking. Next, the first layer predictors are used to make predictions on the second (held-out) set (see. Training the blender</image:caption>
      <image:title>Stacking</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter7_27.webp</image:loc>
      <image:caption>Stacking. It is actually possible to train several different blenders this way (e.g., one using Linear Regression, another using Random Forest Regression, and so on): we get a whole layer of blenders. Predictions in a multilayer stacking ensemble</image:caption>
      <image:title>Stacking</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter8</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_0.webp</image:loc>
      <image:caption>Dimensionality Reduction. Fortunately, in real-world problems, it is often possible to reduce the number of features considerably, turning an intractable problem into a tractable one. Reducing dimensionality does lose some information (just like compressing an image to JPEG can degrade its quality), so even though it will speed up training, it may also make your system perform slightly worse.</image:caption>
      <image:title>Dimensionality Reduction</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_1.webp</image:loc>
      <image:caption>The Curse of Dimensionality. We are so used to living in three dimensions that our intuition fails us when we try to imagine a high-dimensional space. Point, segment, square, cube, and tesseract (0D to 4D hypercubes)</image:caption>
      <image:title>The Curse of Dimensionality</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_2.webp</image:loc>
      <image:caption>Projection. A 3D dataset lying close to a 2D subspace. Notice that all training instances lie close to a plane: this is a lower-dimensional (2D) subspace of the high-dimensional (3D) space.</image:caption>
      <image:title>Projection</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_3.webp</image:loc>
      <image:caption>Projection. Notice that all training instances lie close to a plane: this is a lower-dimensional (2D) subspace of the high-dimensional (3D) space. The new 2D dataset after projection</image:caption>
      <image:title>Projection</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_4.webp</image:loc>
      <image:caption>Projection. However, projection is not always the best approach to dimensionality reduction. In many cases the subspace may twist and turn, such as in the famous Swiss roll toy dataset represented in. Swiss roll dataset</image:caption>
      <image:title>Projection</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_5.webp</image:loc>
      <image:caption>Projection. Simply projecting onto a plane (e.g., by dropping x3) would squash different layers of the Swiss roll together, as shown on the left. However, what you really want is to unroll the Swiss roll to obtain the 2D dataset on the right. Squashing by projecting onto a plane (left) versus unrolling the Swiss roll (right)</image:caption>
      <image:title>Projection</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_6.webp</image:loc>
      <image:caption>Manifold Learning. The decision boundary may not always be simpler with lower dimensions</image:caption>
      <image:title>Manifold Learning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_7.webp</image:loc>
      <image:caption>Preserving the Variance. Selecting the subspace onto which to project. It seems reasonable to select the axis that preserves the maximum amount of variance, as it will most likely lose less information than the other projections.</image:caption>
      <image:title>Preserving the Variance</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_8.webp</image:loc>
      <image:caption>Principal Components. Chapter 8: Dimensionality Reduction. The direction of the principal components is not stable: if you perturb the training set slightly and run PCA again, some of the new PCs may point in the opposite direction of the original PCs.</image:caption>
      <image:title>Principal Components</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_9.webp</image:loc>
      <image:caption>Principal Components. c2 = V.T[:, 1]. PCA assumes that the dataset is centered around the origin.</image:caption>
      <image:title>Principal Components</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_10.webp</image:loc>
      <image:caption>Choosing the Right Number of Dimensions. Yet another option is to plot the explained variance as a function of the number of dimensions (simply plot cumsum). Explained variance as a function of the number of dimensions</image:caption>
      <image:title>Choosing the Right Number of Dimensions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_11.webp</image:loc>
      <image:caption>Choosing the Right Number of Dimensions. X_mnist_recovered = pca.inverse_transform(X_mnist_reduced). MNIST compression preserving 95% of the variance</image:caption>
      <image:title>Choosing the Right Number of Dimensions</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_12.webp</image:loc>
      <image:caption>Kernel PCA. Swiss roll reduced to 2D using kPCA with various kernels</image:caption>
      <image:title>Kernel PCA</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_13.webp</image:loc>
      <image:caption>Selecting a Kernel and Tuning Hyperparameters. Another approach, this time entirely unsupervised, is to select the kernel and hyperparameters that yield the lowest reconstruction error. Kernel PCA and the reconstruction pre-image error</image:caption>
      <image:title>Selecting a Kernel and Tuning Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_14.webp</image:loc>
      <image:caption>Selecting a Kernel and Tuning Hyperparameters. X_preimage = rbf_pca.inverse_transform(X_reduced). By default, fit_inverse_transform=False and KernelPCA has no inverse_transform() method. This method only gets created when you set fit_inverse_transform=True</image:caption>
      <image:title>Selecting a Kernel and Tuning Hyperparameters</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_15.webp</image:loc>
      <image:caption>LLE. lle = LocallyLinearEmbedding(n_components=2, n_neighbors=10) X_reduced = lle.fit_transform(X). Unrolled Swiss roll using LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_16.webp</image:loc>
      <image:caption>LLE. Equation 8-4. LLE step 1: linearly modeling local relationships</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_17.webp</image:loc>
      <image:caption>LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_18.webp</image:loc>
      <image:caption>LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_19.webp</image:loc>
      <image:caption>LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_20.webp</image:loc>
      <image:caption>LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_21.webp</image:loc>
      <image:caption>LLE. Constraints: wi,j = 0 if x(j) is not one of the k closest neighbors of x(i), and the weights sum to 1: ∑j wi,j = 1 for i = 1, 2, ⋯, m</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_22.webp</image:loc>
      <image:caption>LLE. After this step, the weight matrix (containing the weights wi,j) encodes the local linear relationships between the training instances.</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_23.webp</image:loc>
      <image:caption>LLE. If z(i) is the image of x(i) in the lower-dimensional space, then we want the squared distance between z(i) and ∑j wi,j z(j) to be as small as possible.</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_24.webp</image:loc>
      <image:caption>LLE. Equation 8-5. LLE step 2: reducing dimensionality while preserving relationships</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_25.webp</image:loc>
      <image:caption>LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_26.webp</image:loc>
      <image:caption>LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_27.webp</image:loc>
      <image:caption>LLE</image:caption>
      <image:title>LLE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter8_28.webp</image:loc>
      <image:caption>Other Dimensionality Reduction Techniques. Linear Discriminant Analysis (LDA) is actually a classification algorithm, but during training it learns the most discriminative axes between the classes, and these axes can then be used to define a hyperplane onto which to project the data. Reducing the Swiss roll to 2D using various techniques</image:caption>
      <image:title>Other Dimensionality Reduction Techniques</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/ml/mlchapter9</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_0.webp</image:loc>
      <image:caption>Up and Running with TensorFlow. TensorFlow is a powerful open source software library for numerical computation, particularly well suited and fine-tuned for large-scale Machine Learning. A simple computation graph</image:caption>
      <image:title>Up and Running with TensorFlow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_1.webp</image:loc>
      <image:caption>Up and Running with TensorFlow. Developed by the Google Brain team, it powers many of Google’s large-scale services, such as Google Cloud Speech, Google Photos, and Google Search. Parallel computation on multiple CPUs/GPUs/servers</image:caption>
      <image:title>Up and Running with TensorFlow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_2.webp</image:loc>
      <image:caption>Installation. $ pip3 install --upgrade tensorflow. For GPU support, you need to install tensorflow-gpu instead of tensorflow</image:caption>
      <image:title>Installation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_3.webp</image:loc>
      <image:caption>Managing Graphs. &gt;&gt;&gt; x2.graph is tf.get_default_graph() False. In Jupyter (or in a Python shell), it is common to run the same commands more than once while you are experimenting.</image:caption>
      <image:title>Managing Graphs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_4.webp</image:loc>
      <image:caption>Lifecycle of a Node Value. print (z_val) # 15. In single-process TensorFlow, multiple sessions do not share any state, even if they reuse the same graph (each session would have its own copy of every variable).</image:caption>
      <image:title>Lifecycle of a Node Value</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_5.webp</image:loc>
      <image:caption>Implementing Gradient Descent. Let’s try using Batch Gradient Descent (introduced in Chapter 4) instead of the Normal Equation. When using Gradient Descent, remember that it is important to first normalize the input feature vectors, or else training may be much slower.</image:caption>
      <image:title>Implementing Gradient Descent</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_6.webp</image:loc>
      <image:caption>Feeding Data to the Training Algorithm. [ 12. 13. 14.]]. You can actually feed the output of any operations, not just placeholders. In this case TensorFlow does not try to evaluate these operations; it uses the values you feed it</image:caption>
      <image:title>Feeding Data to the Training Algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_7.webp</image:loc>
      <image:caption>Feeding Data to the Training Algorithm. best_theta = theta.eval(). We don’t need to pass the value of X and y when evaluating theta</image:caption>
      <image:title>Feeding Data to the Training Algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_8.webp</image:loc>
      <image:caption>Visualizing the Graph and Training Curves Using TensorBoard. sess.run(training_op, feed_dict={X: X_batch, y: y_batch}) [..]. Avoid logging training stats at every single training step, as this would significantly slow down training</image:caption>
      <image:title>Visualizing the Graph and Training Curves Using TensorBoard</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_9.webp</image:loc>
      <image:caption>Visualizing the Graph and Training Curves Using TensorBoard. Next open a browser and go to http://0.0.0.0:6006/ (or http://localhost:6006/ ). Visualizing training stats using TensorBoard</image:caption>
      <image:title>Visualizing the Graph and Training Curves Using TensorBoard</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_10.webp</image:loc>
      <image:caption>Visualizing the Graph and Training Curves Using TensorBoard. Visualizing the graph using TensorBoard</image:caption>
      <image:title>Visualizing the Graph and Training Curves Using TensorBoard</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_11.webp</image:loc>
      <image:caption>Visualizing the Graph and Training Curves Using TensorBoard. Visualizing the graph using TensorBoard. If you want to take a peek at the graph directly within Jupyter, you can use the show_graph() function available in the notebook for this chapter.</image:caption>
      <image:title>Visualizing the Graph and Training Curves Using TensorBoard</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_12.webp</image:loc>
      <image:caption>Name Scopes. A collapsed namescope in TensorBoard</image:caption>
      <image:title>Name Scopes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_13.webp</image:loc>
      <image:caption>Modularity. Equation 9-1. Rectified linear unit</image:caption>
      <image:title>Modularity</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_14.webp</image:loc>
      <image:caption>Modularity. Equation 9-1. Rectified linear unit</image:caption>
      <image:title>Modularity</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_15.webp</image:loc>
      <image:caption>Modularity. Equation 9-1. Rectified linear unit</image:caption>
      <image:title>Modularity</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_16.webp</image:loc>
      <image:caption>Modularity. h , b = max · + b , 0</image:caption>
      <image:title>Modularity</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_17.webp</image:loc>
      <image:caption>Modularity. Note that when you create a node, TensorFlow checks whether its name already exists, and if it does it appends an underscore followed by an index to make the name unique. Collapsed node series</image:caption>
      <image:title>Modularity</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_18.webp</image:loc>
      <image:caption>Modularity. with tf.name_scope(&quot;relu&quot;): [..]. A clearer graph using name-scoped units</image:caption>
      <image:title>Modularity</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_19.webp</image:loc>
      <image:caption>Sharing Variables. threshold = tf.get_variable(&quot;threshold&quot;). Once reuse is set to True, it cannot be set back to False within the block.</image:caption>
      <image:title>Sharing Variables</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_20.webp</image:loc>
      <image:caption>Sharing Variables. This code first defines the relu() function, then creates the relu/threshold variable (as a scalar that will later be initialized to 0.0) and builds five ReLUs by calling the relu() function. Variables created using get_variable() are always named using the name of their variable_scope as a prefix (e.g., &quot;relu/threshold&quot;), but for all other nodes (including variables created with tf.Variable()) the variable scope acts like a new name scope.</image:caption>
      <image:title>Sharing Variables</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_21.webp</image:loc>
      <image:caption>Sharing Variables. Variables created using get_variable() are always named using the name of their variable_scope as a prefix (e.g., &quot;relu/threshold&quot;), but for all other nodes (including variables created with tf.Variable()) the variable scope acts like a new name scope. Five ReLUs sharing the threshold variable</image:caption>
      <image:title>Sharing Variables</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/ml/images/mlchapter9_22.webp</image:loc>
      <image:caption>Sharing Variables. The resulting graph is slightly different from before, since the shared variable lives within the first ReLU. Five ReLUs sharing the threshold variable</image:caption>
      <image:title>Sharing Variables</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter1</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter1_0.webp</image:loc>
      <image:caption>What about Kubernetes. The Kubernetes Book . This is the ultimate book for mastering Kubernetes. I update both books annually to ensure they’re up-to-date with the latest and greatest developments in the cloud native ecosystem</image:caption>
      <image:title>What about Kubernetes</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter10</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter10_0.webp</image:loc>
      <image:caption>Docker Model Runner Architecture. 1 shows the high-level architecture and major components. 1 - Docker Model Runner Architecture</image:caption>
      <image:title>Docker Model Runner Architecture</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter10_1.webp</image:loc>
      <image:caption>Pull models from Docker Hub. architecture, training cut-off date, model variants, and even benchmark info. However, benchmark info is from the original model publisher, and you should always perform your own testing to see how well a model performs for your specific requirements. 2 - Model info card</image:caption>
      <image:title>Pull models from Docker Hub</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter10_2.webp</image:loc>
      <image:caption>Test your model. Open the Docker Desktop UI and click the Models tab in the left navigation bar. Click the model you want to test to open a chat session and then ask it the same questions. 3 - Docker Desktop’s Model Chat Window</image:caption>
      <image:title>Test your model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter10_3.webp</image:loc>
      <image:caption>Use Docker Model Runner with Compose. and streams responses to the frontend. 4 - Chatbot architecture</image:caption>
      <image:title>Use Docker Model Runner with Compose</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter10_4.webp</image:loc>
      <image:caption>Use Docker Model Runner with Compose. Open your browser to http://localhost:3000 and ask your chatbot some questions. 5 - Working chatbot</image:caption>
      <image:title>Use Docker Model Runner with Compose</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter10_5.webp</image:loc>
      <image:caption>Connect to Open WebUI and use it. Once you’ve created your account, you’ll be automatically logged in and will see the Open WebUI interface as shown in 6. 6 - Open WebUI interface</image:caption>
      <image:title>Connect to Open WebUI and use it</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter10_6.webp</image:loc>
      <image:caption>Connect to Open WebUI and use it. 7 shows a very brief conversation asking how far away the moon is and then the sun. 7 - Conversational history</image:caption>
      <image:title>Connect to Open WebUI and use it</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter11</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter11_0.webp</image:loc>
      <image:caption>11: Docker and Wasm. We built the first wave on virtual machines (VMs), the second on containers, and we’re building the third on Wasm. Each wave drives smaller, faster, and more secure workloads, and all three are working together to drive the future of cloud computing. In this chapter, you’ll write a simple Wasm application and use Docker to containerize it and run it as a container. The goal is to introduce you to Wasm and show you how easy it is to work with Docker and Wasm together</image:caption>
      <image:title>11: Docker and Wasm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter11_1.webp</image:loc>
      <image:caption>Configure Docker Desktop for Wasm. 2 shows some of the settings. 2 - Docker Desktop Wasm settings</image:caption>
      <image:title>Configure Docker Desktop for Wasm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter11_2.webp</image:loc>
      <image:caption>Write a Wasm app. Point your browser to http://127.0.0.1:3000/hello and make sure the app works. 3 - Wasm app running locally</image:caption>
      <image:title>Write a Wasm app</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter11_3.webp</image:loc>
      <image:caption>Containerize a Wasm app. If you look at Docker Hub, you can see it’s recognized it as a wasi/wasm image. You’ll also see there’s no vulnerability analysis data. This is because image scanning tools can’t analyze Wasm images yet. 4 - Wasm image on Docker Hub</image:caption>
      <image:title>Containerize a Wasm app</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter11_4.webp</image:loc>
      <image:caption>Run a Wasm container. Connect your browser to http://localhost:5556/hello to see the app. 5 - Wasm app running in container</image:caption>
      <image:title>Run a Wasm container</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter2</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter2_0.webp</image:loc>
      <image:caption>The Docker technology. 1 shows the high-level architecture. The client and engine can be on the same host or connected over the network. 1 - Docker client and engine</image:caption>
      <image:title>The Docker technology</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter2_1.webp</image:loc>
      <image:caption>The Docker technology. the engine. 2 - Docker CLI and daemon hiding complexity</image:caption>
      <image:title>The Docker technology</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter3</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter3_0.webp</image:loc>
      <image:caption>Installing Docker Desktop on Mac. 1 shows the high-level architecture for Docker Desktop on Mac. 1</image:caption>
      <image:title>Installing Docker Desktop on Mac</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter4</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter4_0.webp</image:loc>
      <image:caption>Run the app as a container. You will see the following web page. 1</image:caption>
      <image:title>Run the app as a container</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter5</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter5_0.webp</image:loc>
      <image:caption>Docker Engine – The TLDR. 1 shows the components of the Docker Engine that create and run containers. Other components exist, but this simplified diagram focuses on the components that start and run containers. 1</image:caption>
      <image:title>Docker Engine – The TLDR</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter5_1.webp</image:loc>
      <image:caption>Breaking up the monolithic Docker daemon. 2 shows another view of the Docker Engine components that are used to run containers and lists the primary responsibilities of each component. 2 - Engine components and responsibilities</image:caption>
      <image:title>Breaking up the monolithic Docker daemon</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter5_2.webp</image:loc>
      <image:caption>Starting a new container (example). 3 summarizes the process. 3</image:caption>
      <image:title>Starting a new container (example)</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter6</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_0.webp</image:loc>
      <image:caption>Intro to images. Images are build-time constructs, whereas containers are run-time constructs. 1 shows the build and run nature of each and that you can start multiple containers from a single image. 1</image:caption>
      <image:title>Intro to images</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_1.webp</image:loc>
      <image:caption>Image registries. 2 shows the central nature of registries in the build &gt; share &gt; run pipeline. 2</image:caption>
      <image:title>Image registries</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_2.webp</image:loc>
      <image:caption>Image registries. Image registries contain one or more image repositories, and image repositories contain one or more images. 3 shows an image registry with three repositories, each with one or more images. 3 - Registry architecture</image:caption>
      <image:title>Image registries</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_3.webp</image:loc>
      <image:caption>Official repositories. 4 shows the official Alpine and NGINX repositories on Docker Hub. Both have the green Docker Official Image badge and have over a billion pulls each. Also, notice how both are available for a wide range of CPU architectures. 4 - Official repos on Docker Hub</image:caption>
      <image:title>Official repositories</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_4.webp</image:loc>
      <image:caption>Image naming and tagging. including the registry name, user/organization name, repository name, and tag. Docker automatically populates the registry and tag values if you don’t specify them. 5 - Fully qualified image name</image:caption>
      <image:title>Image naming and tagging</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_5.webp</image:loc>
      <image:caption>Images and layers. 6 shows an image with four layers. Docker takes care of stacking them and representing them as a single unified image. 6 - Image and stacked layers</image:caption>
      <image:title>Images and layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_6.webp</image:loc>
      <image:caption>Images and layers. Each line ending with Pull complete represents a layer that Docker pulled. This image has five layers and is shown in 7 with layer IDs. 7 - Image layers and IDs</image:caption>
      <image:title>Images and layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_7.webp</image:loc>
      <image:caption>Base layers. 24:04 image. 8</image:caption>
      <image:title>Base layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_8.webp</image:loc>
      <image:caption>Base layers. It also shows that the layers are stored as independent objects, and the image is just metadata identifying the required layers and explaining how to stack them. Figure 9</image:caption>
      <image:title>Base layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_9.webp</image:loc>
      <image:caption>Base layers. The file in the top layer is an updated version of File 5 directly below it. In this situation, the file in the higher layer obscures the file directly below it. This means you update files and make other changes to images by adding new layers containing the changes. Figure 10 - Stacking layers</image:caption>
      <image:title>Base layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_10.webp</image:loc>
      <image:caption>Base layers. All three layers are stacked and merged into a single unified view. Figure 11 - Unified view of multi-layer image</image:caption>
      <image:title>Base layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_11.webp</image:loc>
      <image:caption>Sharing image layers. Figure 12 - Two images sharing a layer</image:caption>
      <image:title>Sharing image layers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_12.webp</image:loc>
      <image:caption>Multi-architecture images. Figure 13 shows how manifest lists and manifests are related. Figure 13 - Manifest lists and manifests</image:caption>
      <image:title>Multi-architecture images</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_13.webp</image:loc>
      <image:caption>Vulnerability scanning with Docker Scout. SBOM of image already cached, 66 packages indexed. Detected 1 vulnerable package with 2 vulnerabilities</image:caption>
      <image:title>Vulnerability scanning with Docker Scout</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_14.webp</image:loc>
      <image:caption>Vulnerability scanning with Docker Scout. pkg:apk/alpine/expat@2.5.0-r2?os_name=alpine&amp;os_version=3.19. HIGH CVE-2023-52425</image:caption>
      <image:title>Vulnerability scanning with Docker Scout</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter6_15.webp</image:loc>
      <image:caption>Vulnerability scanning with Docker Scout. Figure 14 shows how this looks in Docker Desktop, and you get similar integrations and views in Docker Hub. Figure 14 - Docker Scout integration with Docker Desktop</image:caption>
      <image:title>Vulnerability scanning with Docker Scout</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter7</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter7_0.webp</image:loc>
      <image:caption>Containers – The TLDR. Figure 1 shows multiple containers started from a single image. The shared image is read-only, but you can write to the containers. Figure 1</image:caption>
      <image:title>Containers – The TLDR</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter7_1.webp</image:loc>
      <image:caption>Containers vs VMs. Figure 2 shows the two models side by side and attempts to demonstrate the more efficient nature of containers, with the same server running 3x more containers than VMs. Figure 2</image:caption>
      <image:title>Containers vs VMs</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter7_2.webp</image:loc>
      <image:caption>Images and Containers. As shown in Figure 3, Docker accomplishes this by creating a thin read-write layer for each container and placing it on top of the shared image. Figure 3 - Container R/W layers</image:caption>
      <image:title>Images and Containers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter7_3.webp</image:loc>
      <image:caption>Starting a container. If you’re not running Docker Desktop, you may need to substitute localhost with the name or IP of the host Docker is running on. Figure 4 - Web app running in container</image:caption>
      <image:title>Starting a container</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter7_4.webp</image:loc>
      <image:caption>Debugging slim images and containers with Docker Debug. $ docker debug ddd-ctr. This is an attach shell.</image:caption>
      <image:title>Debugging slim images and containers with Docker Debug</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter7_5.webp</image:loc>
      <image:caption>Debugging slim images and containers with Docker Debug. $ docker debug nigelpoulton/ddd-book:web0.1. Note: This is a sandbox shell. Changes will not affect the actual image</image:caption>
      <image:title>Debugging slim images and containers with Docker Debug</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter8</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_0.webp</image:loc>
      <image:caption>Containerizing an app – The TLDR. Run a container from the image. You can see these five steps in Figure 1. Figure 1 - Basic flow of containerizing an app</image:caption>
      <image:title>Containerizing an app – The TLDR</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_1.webp</image:loc>
      <image:caption>Create the Dockerfile. CREATED: .dockerignore CREATED: Dockerfile CREATED: compose.yaml CREATED: README.Docker.md. Your Docker files are ready!</image:caption>
      <image:title>Create the Dockerfile</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_2.webp</image:loc>
      <image:caption>Containerize the app. Then the WORKDIR, RUN, and COPY instructions added three more layers. You can see this in Figure 2. Figure 2 - Dockerfile and image layers</image:caption>
      <image:title>Containerize the app</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_3.webp</image:loc>
      <image:caption>Push the image to Docker Hub. Figure 3 shows how Docker figured out where to push the image. Figure 3</image:caption>
      <image:title>Push the image to Docker Hub</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_4.webp</image:loc>
      <image:caption>Test the app. You should see the app as shown in Figure 4. Figure 4</image:caption>
      <image:title>Test the app</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_5.webp</image:loc>
      <image:caption>Looking a bit closer. Figure 5 maps the Dockerfile instructions to image layers. The bold instructions with arrows create layers; the others create metadata. The layer IDs will be different in your environment. Figure 5</image:caption>
      <image:title>Looking a bit closer</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_6.webp</image:loc>
      <image:caption>Moving to production with multi-stage builds. Figure 6 shows a high-level workflow. Figure 6</image:caption>
      <image:title>Moving to production with multi-stage builds</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_7.webp</image:loc>
      <image:caption>Buildx, BuildKit, drivers, and Build Cloud. Figure 7 shows a Docker environment configured to talk to a local and a remote builder. Figure 7 - Docker build architecture</image:caption>
      <image:title>Buildx, BuildKit, drivers, and Build Cloud</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter8_8.webp</image:loc>
      <image:caption>Multi-architecture builds. Figure 8 shows how the images for both architectures appear on Docker Hub under the same repository and tag. Figure 8 - Multi-platform image</image:caption>
      <image:title>Multi-architecture builds</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/docker/chapter9</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter9_0.webp</image:loc>
      <image:caption>The sample app. We’ll use the sample app shown in Figure 1, with two services, a network, and a volume. Figure 1 - Sample app</image:caption>
      <image:title>The sample app</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter9_1.webp</image:loc>
      <image:caption>The sample app. The compose.yaml file tells Docker how to deploy the app. Figure 2 shows the app in more detail. Figure 2 - Detailed view of sample app</image:caption>
      <image:title>The sample app</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/docker/images/chapter9_2.webp</image:loc>
      <image:caption>Deploying apps with Compose. Connect to the Docker host on port 5001 to view it. You can connect to localhost:5001 if you’re running Docker Desktop. Refresh the page a few times and watch the counter increment. This is the app counting page refreshes and storing the value on the volume in the Redis service</image:caption>
      <image:title>Deploying apps with Compose</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter1</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image2.webp</image:loc>
      <image:caption>Single server setup. A journey of a thousand miles begins with a single step, and building a complex system is no different. To understand this setup, it is helpful to investigate the request flow and traffic source. Let us first look at the request flow</image:caption>
      <image:title>Single server setup</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image3.webp</image:loc>
      <image:caption>Single server setup. To understand this setup, it is helpful to investigate the request flow and traffic source. Let us first look at the request flow. Users access websites through domain names, such as api.mysite.com. Usually, the Domain Name System (DNS) is a paid service provided by 3rd parties and not hosted by our servers</image:caption>
      <image:title>Single server setup</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image4.webp</image:loc>
      <image:caption>Single server setup. GET /users/12 – Retrieve user object for id = 12</image:caption>
      <image:title>Single server setup</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image5.webp</image:loc>
      <image:caption>Database. With the growth of the user base, one server is not enough, and we need multiple servers: one for web/mobile traffic, the other for the database. Which databases to use?</image:caption>
      <image:title>Database</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image6.webp</image:loc>
      <image:caption>Load balancer. A load balancer evenly distributes incoming traffic among web servers that are defined in a load-balanced set. The figure shows how a load balancer works. As shown, users connect to the public IP of the load balancer directly.</image:caption>
      <image:title>Load balancer</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image7.webp</image:loc>
      <image:caption>Database replication. A master database generally only supports write operations. Advantages of database replication</image:caption>
      <image:title>Database replication</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image8.webp</image:loc>
      <image:caption>Database replication. The figure shows the system design after adding the load balancer and database replication. Let us take a look at the design</image:caption>
      <image:title>Database replication</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image9.webp</image:loc>
      <image:caption>Cache tier. The cache tier is a temporary data store layer, much faster than the database. After receiving a request, a web server first checks if the cache has the available response.</image:caption>
      <image:title>Cache tier</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image10.webp</image:loc>
      <image:caption>Cache tier. Interacting with cache servers is simple because most cache servers provide APIs for common programming languages. The following code snippet shows typical Memcached APIs</image:caption>
      <image:title>Cache tier</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image11.webp</image:loc>
      <image:caption>Considerations for using cache. Mitigating failures: A single cache server represents a potential single point of failure (SPOF), defined in Wikipedia as follows: “A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working” [8]. Eviction Policy: Once the cache is full, any requests to add items to the cache might cause existing items to be removed.</image:caption>
      <image:title>Considerations for using cache</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image12.webp</image:loc>
      <image:caption>Considerations for using cache. Here is how a CDN works at a high level: when a user visits a website, a CDN server closest to the user delivers static content. The figure demonstrates the CDN workflow</image:caption>
      <image:title>Considerations for using cache</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image13.webp</image:loc>
      <image:caption>Considerations for using cache. The figure demonstrates the CDN workflow. User A tries to get image.webp by using an image URL. The URL’s domain is provided by the CDN provider. The following two image URLs are samples used to demonstrate what image URLs look like on Amazon and Akamai CDNs</image:caption>
      <image:title>Considerations for using cache</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image14.webp</image:loc>
      <image:caption>Considerations of using a CDN. The figure shows the design after the CDN and cache are added. Static assets (JS, CSS, images, etc.) are no longer served by web servers; they are fetched from the CDN for better performance</image:caption>
      <image:title>Considerations of using a CDN</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image15.webp</image:loc>
      <image:caption>Stateful architecture. The figure shows an example of a stateful architecture, in which user A’s session data and profile image are stored in Server 1.</image:caption>
      <image:title>Stateful architecture</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image16.webp</image:loc>
      <image:caption>Stateless architecture. The figure shows the stateless architecture, in which HTTP requests from users can be sent to any web server, which fetches state data from a shared data store.</image:caption>
      <image:title>Stateless architecture</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image17.webp</image:loc>
      <image:caption>Stateless architecture. The figure shows the updated design with a stateless web tier. Here, we move the session data out of the web tier and store it in the persistent data store.</image:caption>
      <image:title>Stateless architecture</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image18.webp</image:loc>
      <image:caption>Data centers. The figure shows an example setup with two data centers. In the event of any significant data center outage, we direct all traffic to a healthy data center. Here, data center 2 (US-West) is offline, and 100% of the traffic is routed to data center 1 (US-East)</image:caption>
      <image:title>Data centers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image19.webp</image:loc>
      <image:caption>Data centers. In the event of any significant data center outage, we direct all traffic to a healthy data center. Here, data center 2 (US-West) is offline, and 100% of the traffic is routed to data center 1 (US-East). Several technical challenges must be resolved to achieve a multi-data-center setup</image:caption>
      <image:title>Data centers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image20.webp</image:loc>
      <image:caption>Message queue. A message queue is a durable component, stored in memory, that supports asynchronous communication. Decoupling makes the message queue a preferred architecture for building a scalable and reliable application.</image:caption>
      <image:title>Message queue</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image21.webp</image:loc>
      <image:caption>Message queue. However, if the queue is empty most of the time, the number of workers can be reduced</image:caption>
      <image:title>Message queue</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image22.webp</image:loc>
      <image:caption>Adding message queues and different tools. Logging, monitoring, metrics, and automation tools are included. As the data grows every day, your database gets more overloaded. It is time to scale the data tier</image:caption>
      <image:title>Adding message queues and different tools</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image23.webp</image:loc>
      <image:caption>Horizontal scaling. Horizontal scaling, also known as sharding, is the practice of adding more servers. Figure 20 compares vertical scaling with horizontal scaling. Sharding separates large databases into smaller, more easily managed parts called shards. Each shard shares the same schema, though the actual data on each shard is unique to the shard</image:caption>
      <image:title>Horizontal scaling</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image24.webp</image:loc>
      <image:caption>Horizontal scaling. If the result equals 0, shard 0 is used to store and fetch data. If the result equals 1, shard 1 is used. The same logic applies to other shards. The figure shows the user table in sharded databases</image:caption>
      <image:title>Horizontal scaling</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image25.webp</image:loc>
      <image:caption>Horizontal scaling. The figure shows the user table in sharded databases. The most important factor to consider when implementing a sharding strategy is the choice of the sharding key.</image:caption>
      <image:title>Horizontal scaling</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter1/image26.webp</image:loc>
      <image:caption>Horizontal scaling. Here, we shard databases to support rapidly increasing data traffic. At the same time, some of the non-relational functionalities are moved to a NoSQL data store to reduce the database load. Here is an article that covers many use cases of NoSQL [14]</image:caption>
      <image:title>Horizontal scaling</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter10</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image122.webp</image:loc>
      <image:caption>CHAPTER 10: DESIGN A NOTIFICATION SYSTEM. A notification is more than just a mobile push notification. The three notification formats are: mobile push notification, SMS message, and email. The figure shows an example of each of these notifications</image:caption>
      <image:title>CHAPTER 10: DESIGN A NOTIFICATION SYSTEM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image123.webp</image:loc>
      <image:caption>iOS push notification. We start by looking at how each notification type works at a high level. We primarily need three components to send an iOS push notification</image:caption>
      <image:title>iOS push notification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image124.webp</image:loc>
      <image:caption>iOS push notification. Payload: This is a JSON dictionary that contains a notification’s payload. Here is an example. APNS: This is a remote service provided by Apple to propagate push notifications to iOS devices</image:caption>
      <image:title>iOS push notification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image125.webp</image:loc>
      <image:caption>Android push notification. Android adopts a similar notification flow. Instead of using APNs, Firebase Cloud Messaging (FCM) is commonly used to send push notifications to Android devices</image:caption>
      <image:title>Android push notification</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image126.webp</image:loc>
      <image:caption>SMS message. For SMS messages, third party SMS services like Twilio [1], Nexmo [2], and many others are commonly used. Most of them are commercial services</image:caption>
      <image:title>SMS message</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image127.webp</image:loc>
      <image:caption>Email. Although companies can set up their own email servers, many of them opt for commercial email services. Sendgrid [3] and Mailchimp [4] are among the most popular email services, offering a better delivery rate and data analytics. The figure shows the design after including all the third-party services</image:caption>
      <image:title>Email</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image128.webp</image:loc>
      <image:caption>Email. The figure shows the design after including all the third-party services</image:caption>
      <image:title>Email</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image129.webp</image:loc>
      <image:caption>Contact info gathering flow. To send notifications, we need to gather mobile device tokens, phone numbers, or email addresses. As shown in the figure, when a user installs our app or signs up for the first time, API servers collect user contact info and store it in the database. The next figure shows simplified database tables to store contact info. Email addresses and phone numbers are stored in the user table, whereas device tokens are stored in the device table.</image:caption>
      <image:title>Contact info gathering flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image130.webp</image:loc>
      <image:caption>Contact info gathering flow. A user can have multiple devices, meaning a push notification can be sent to all of the user’s devices</image:caption>
      <image:title>Contact info gathering flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image131.webp</image:loc>
      <image:caption>High-level design. The figure shows the design, and each system component is explained below. Service 1 to N: a service can be a micro-service, a cron job, or a distributed system that triggers notification-sending events.</image:caption>
      <image:title>High-level design</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image132.webp</image:loc>
      <image:caption>High-level design. Introduce message queues to decouple the system components. The figure shows the improved high-level design. The best way to go through the diagram is from left to right</image:caption>
      <image:title>High-level design</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image133.webp</image:loc>
      <image:caption>Request body. Put notification data into message queues for parallel processing. Here is an example of the API to send an email. Cache: user info, device info, and notification templates are cached</image:caption>
      <image:title>Request body</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image134.webp</image:loc>
      <image:caption>Reliability. One of the most important requirements in a notification system is that it cannot lose data. Will recipients receive a notification exactly once?</image:caption>
      <image:title>Reliability</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image135.webp</image:loc>
      <image:caption>Monitor queued notifications. A key metric to monitor is the total number of queued notifications.</image:caption>
      <image:title>Monitor queued notifications</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image136.webp</image:loc>
      <image:caption>Events tracking. Notification metrics, such as open rate, click rate, and engagement are important in understanding customer behaviors.</image:caption>
      <image:title>Events tracking</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter10/image137.webp</image:loc>
      <image:caption>Updated design. Putting everything together, the figure shows the updated notification system design. In this design, many new components are added in comparison with the previous design</image:caption>
      <image:title>Updated design</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter11</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image138.webp</image:loc>
      <image:caption>CHAPTER 11: DESIGN A NEWS FEED SYSTEM. Instagram feed, Twitter timeline, etc</image:caption>
      <image:title>CHAPTER 11: DESIGN A NEWS FEED SYSTEM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image139.webp</image:loc>
      <image:caption>Feed publishing. The figure shows the high-level design of the feed publishing flow. User: a user can view news feeds on a browser or mobile app. A user makes a post with content “Hello” through the API</image:caption>
      <image:title>Feed publishing</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image140.webp</image:loc>
      <image:caption>Newsfeed building. In this section, we discuss how the news feed is built behind the scenes. The figure shows the high-level design. User: a user sends a request to retrieve her news feed. The request looks like this</image:caption>
      <image:title>Newsfeed building</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image141.webp</image:loc>
      <image:caption>Feed publishing deep dive. The figure outlines the detailed design for feed publishing. We have discussed most of the components in the high-level design; here we focus on two components: web servers and the fanout service</image:caption>
      <image:title>Feed publishing deep dive</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image142.webp</image:loc>
      <image:caption>Fanout service. Let us take a close look at the fanout service, as shown in the figure. The fanout service works as follows</image:caption>
      <image:title>Fanout service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image143.webp</image:loc>
      <image:caption>Fanout service. Store &lt;post_id, user_id&gt; in the news feed cache. The figure shows an example of what the news feed looks like in the cache</image:caption>
      <image:title>Fanout service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image144.webp</image:loc>
      <image:caption>Newsfeed retrieval deep dive. The figure illustrates the detailed design for news feed retrieval. As shown, media content (images, videos, etc.) is stored in a CDN for fast retrieval. Let us look at how a client retrieves the news feed</image:caption>
      <image:title>Newsfeed retrieval deep dive</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter11/image145.webp</image:loc>
      <image:caption>Cache architecture. Cache is extremely important for a news feed system. We divide the cache tier into five layers, as shown in the figure. News Feed: it stores IDs of news feeds</image:caption>
      <image:title>Cache architecture</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter12</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image146.webp</image:loc>
      <image:caption>CHAPTER 12: DESIGN A CHAT SYSTEM. In this chapter, we explore the design of a chat system. Almost everyone uses a chat app. The figure shows some of the most popular apps in the marketplace. A chat app performs different functions for different people.</image:caption>
      <image:title>CHAPTER 12: DESIGN A CHAT SYSTEM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image147.webp</image:loc>
      <image:caption>Step 2 - Propose high-level design and get buy-in. The figure shows the relationships between clients (sender and receiver) and the chat service. When a client intends to start a chat, it connects to the chat service using one or more network protocols. For a chat service, the choice of network protocols is important. Let us discuss this with the interviewer</image:caption>
      <image:title>Step 2 - Propose high-level design and get buy-in</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image148.webp</image:loc>
      <image:caption>Polling. As shown in the figure, polling is a technique in which the client periodically asks the server if there are messages available.</image:caption>
      <image:title>Polling</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image149.webp</image:loc>
      <image:caption>Long polling. Because polling could be inefficient, the next progression is long polling. In long polling, a client holds the connection open until there are actually new messages available or a timeout threshold has been reached.</image:caption>
      <image:title>Long polling</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image150.webp</image:loc>
      <image:caption>WebSocket. WebSocket is the most common solution for sending asynchronous updates from server to client. The figure shows how it works: the WebSocket connection is initiated by the client.</image:caption>
      <image:title>WebSocket</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image151.webp</image:loc>
      <image:caption>WebSocket. Earlier we said that on the sender side HTTP is a fine protocol to use, but since WebSocket is bidirectional, there is no strong technical reason not to use it for sending as well. The figure shows how WebSocket (ws) is used on both the sender and receiver sides. Using WebSocket for both sending and receiving simplifies the design and makes implementation on both client and server more straightforward.</image:caption>
      <image:title>WebSocket</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image152.webp</image:loc>
      <image:caption>High-level design. As shown in the figure, the chat system is broken down into three major categories: stateless services, stateful services, and third-party integration.</image:caption>
      <image:title>High-level design</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image153.webp</image:loc>
      <image:caption>Scalability. However, it is perfectly fine to start with a single-server design; just make sure the interviewer knows this is a starting point. Putting everything we mentioned together, the figure shows the adjusted high-level design, in which the client maintains a persistent WebSocket connection to a chat server for real-time messaging.</image:caption>
      <image:title>Scalability</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image154.webp</image:loc>
      <image:caption>Message table for 1 on 1 chat. The figure shows the message table for 1 on 1 chat. The primary key is message_id, which helps to decide the message sequence. We cannot rely on created_at to decide the message sequence because two messages can be created at the same time.</image:caption>
      <image:title>Message table for 1 on 1 chat</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image155.webp</image:loc>
      <image:caption>Message table for group chat. The figure shows the message table for group chat. The composite primary key is (channel_id, message_id). Channel and group have the same meaning here. channel_id is the partition key because all queries in a group chat operate within a channel.</image:caption>
      <image:title>Message table for group chat</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image156.webp</image:loc>
      <image:caption>Service discovery. The figure shows how service discovery (Zookeeper) works: User A tries to log in to the app.</image:caption>
      <image:title>Service discovery</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image157.webp</image:loc>
      <image:caption>1 on 1 chat flow. The figure explains what happens when User A sends a message to User B: User A sends a chat message to Chat server 1.</image:caption>
      <image:title>1 on 1 chat flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image158.webp</image:loc>
      <image:caption>Message synchronization across multiple devices. Many users have multiple devices. We will explain how to sync messages across multiple devices. The figure shows an example of message synchronization, in which User A has two devices: a phone and a laptop. When User A logs in to the chat app with her phone, it establishes a WebSocket connection with Chat server 1. Similarly, there is a connection between the laptop and Chat server 1.</image:caption>
      <image:title>Message synchronization across multiple devices</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image159.webp</image:loc>
      <image:caption>Small group chat flow. In comparison to one-on-one chat, the logic of group chat is more complicated. Figures 12-14 and 12-15 explain the flow. The figure explains what happens when User A sends a message in a group chat.</image:caption>
      <image:title>Small group chat flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image160.webp</image:loc>
      <image:caption>Small group chat flow. On the recipient side, a recipient can receive messages from multiple users. Each recipient has an inbox (message sync queue) which contains messages from different senders. The figure illustrates the design.</image:caption>
      <image:title>Small group chat flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image161.webp</image:loc>
      <image:caption>User login. The user login flow is explained in the “Service Discovery” section.</image:caption>
      <image:title>User login</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image162.webp</image:loc>
      <image:caption>User logout. When a user logs out, it goes through the user logout flow as shown in the figure. The online status is changed to offline in the KV store, and the presence indicator shows the user is offline.</image:caption>
      <image:title>User logout</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image163.webp</image:loc>
      <image:caption>User disconnection. In the figure, the client sends a heartbeat event to the server every 5 seconds.</image:caption>
      <image:title>User disconnection</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter12/image164.webp</image:loc>
      <image:caption>Online status fanout. How do User A’s friends know about the status changes? The figure explains how it works. This design is effective for a small user group.</image:caption>
      <image:title>Online status fanout</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter13</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image165.webp</image:loc>
      <image:caption>CHAPTER 13: DESIGN A SEARCH AUTOCOMPLETE SYSTEM. When searching on Google or shopping at Amazon, as you type in the search box, one or more matches for the search term are presented to you.</image:caption>
      <image:title>CHAPTER 13: DESIGN A SEARCH AUTOCOMPLETE SYSTEM</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image166.webp</image:loc>
      <image:caption>Data gathering service. Let us use a simplified example to see how the data gathering service works.</image:caption>
      <image:title>Data gathering service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image167.webp</image:loc>
      <image:caption>Query service. Frequency: it represents the number of times a query has been searched. When a user types “tw” in the search box, the top 5 searched queries are displayed, assuming the frequency table is based on Table 13-1.</image:caption>
      <image:title>Query service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image168.webp</image:loc>
      <image:caption>Query service. When a user types “tw” in the search box, the top 5 searched queries are displayed, assuming the frequency table is based on Table 13-1. To get the top 5 frequently searched queries, execute the following SQL query.</image:caption>
      <image:title>Query service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image169.webp</image:loc>
      <image:caption>Query service. To get the top 5 frequently searched queries, execute the following SQL query. This is an acceptable solution when the data set is small. When it is large, accessing the database becomes a bottleneck. We will explore optimizations in the deep dive.</image:caption>
      <image:title>Query service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image170.webp</image:loc>
      <image:caption>Trie data structure. The figure shows a trie with search queries “tree”, “try”, “true”, “toy”, “wish”, and “win”. Search queries are highlighted with a thicker border. A basic trie stores characters in its nodes. To support sorting by frequency, frequency info needs to be included in the nodes. Assume we have the following frequency table.</image:caption>
      <image:title>Trie data structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image171.webp</image:loc>
      <image:caption>Trie data structure. A basic trie stores characters in its nodes. To support sorting by frequency, frequency info needs to be included in the nodes. Assume we have the following frequency table. After adding frequency info to the nodes, the updated trie is shown in the figure.</image:caption>
      <image:title>Trie data structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image172.webp</image:loc>
      <image:caption>Trie data structure. After adding frequency info to the nodes, the updated trie is shown in the figure. How does autocomplete work with a trie? Before diving into the algorithm, let us define some terms.</image:caption>
      <image:title>Trie data structure</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image173.webp</image:loc>
      <image:caption>c: number of children of a given node. Step 3: Sort the children and get the top 2. [true: 35] and [try: 29] are the top 2 queries with prefix “tr”. The time complexity of this algorithm is the sum of the time spent on each step above.</image:caption>
      <image:title>c: number of children of a given node</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image174.webp</image:loc>
      <image:caption>Cache top search queries at each node. The figure shows the updated trie data structure: the top 5 queries are stored on each node. For example, the node with prefix “be” stores the following: [best: 35, bet: 29, bee: 20, be: 15, beer: 10]. Let us revisit the time complexity of the algorithm after applying these two optimizations.</image:caption>
      <image:title>Cache top search queries at each node</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image175.webp</image:loc>
      <image:caption>Data gathering service. The figure shows the redesigned data gathering service. Each component is examined one by one. Analytics Logs: it stores raw data about search queries. Logs are append-only and are not indexed. Table 13-3 shows an example of the log file.</image:caption>
      <image:title>Data gathering service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image176.webp</image:loc>
      <image:caption>Data gathering service. Analytics Logs. It stores raw data about search queries. Logs are append-only and are not indexed. Table 13-3 shows an example of the log file. Aggregators. The size of analytics logs is usually very large, and data is not in the right format. We need to aggregate data so it can be easily processed by our system</image:caption>
      <image:title>Data gathering service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image177.webp</image:loc>
      <image:caption>Data gathering service. Table 13-4 shows an example of aggregated weekly data. The “time” field represents the start time of a week. The “frequency” field is the sum of the occurrences of the corresponding query in that week. Workers: workers are a set of servers that perform asynchronous jobs at regular intervals. They build the trie data structure and store it in the Trie DB.</image:caption>
      <image:title>Data gathering service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image178.webp</image:loc>
      <image:caption>Data gathering service. Data on each trie node is mapped to a value in a hash table. The figure shows the mapping between the trie and the hash table, in which each trie node on the left is mapped to the &lt;key, value&gt; pair on the right. If you are unclear how key-value stores work, refer to Chapter 6: Design a key-value store.</image:caption>
      <image:title>Data gathering service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image179.webp</image:loc>
      <image:caption>Query service. In the high-level design, the query service calls the database directly to fetch the top 5 results. Because that design is inefficient, the figure shows the improved design. A search query is sent to the load balancer.</image:caption>
      <image:title>Query service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image180.webp</image:loc>
      <image:caption>Query service. The results are cached in the browser for 1 hour. Please note: “private” in cache-control means the results are intended for a single user and must not be cached by a shared cache. “max-age=3600” means the cache is valid for 3600 seconds, i.e., an hour. Data sampling: for a large-scale system, logging every search query requires a lot of processing power and storage, so data sampling is important. For instance, only 1 out of every N requests is logged by the system.</image:caption>
      <image:title>Query service</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image181.webp</image:loc>
      <image:caption>Update. Option 2: Update individual trie node directly.</image:caption>
      <image:title>Update</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image182.webp</image:loc>
      <image:caption>Delete. We have to remove hateful, violent, sexually explicit, or dangerous autocomplete suggestions.</image:caption>
      <image:title>Delete</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter13/image183.webp</image:loc>
      <image:caption>Scale the storage. To mitigate the data imbalance problem, we analyze historical data distribution patterns and apply smarter sharding logic, as shown in the figure.</image:caption>
      <image:title>Scale the storage</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter14</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image184.webp</image:loc>
      <image:caption>CHAPTER 14: DESIGN YOUTUBE. In this chapter, you are asked to design YouTube. The solution to this question can be applied to other interview questions, such as designing a video sharing platform like Netflix or Hulu. The figure shows the YouTube homepage. YouTube looks simple: content creators upload videos and viewers click play.</image:caption>
      <image:title>CHAPTER 14: DESIGN YOUTUBE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image185.webp</image:loc>
      <image:caption>Total daily storage space needed: 5 million * 10% * 300 MB = 150 TB. From the rough cost estimation, we know serving videos from the CDN costs a lot of money.</image:caption>
      <image:title>Total daily storage space needed: 5 million * 10% * 300 MB = 150 TB</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image186.webp</image:loc>
      <image:caption>Step 2 - Propose high-level design and get buy-in. At a high level, the system comprises three components. Client: you can watch YouTube on your computer, mobile phone, or smart TV.</image:caption>
      <image:title>Step 2 - Propose high-level design and get buy-in</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image187.webp</image:loc>
      <image:caption>Video uploading flow. The figure shows the high-level design for video uploading. It consists of the following components.</image:caption>
      <image:title>Video uploading flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image188.webp</image:loc>
      <image:caption>Flow a: upload the actual video. Update video metadata: metadata contains information about the video URL, size, resolution, format, user info, etc. The figure shows how to upload the actual video; the explanation is shown below.</image:caption>
      <image:title>Flow a: upload the actual video</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image189.webp</image:loc>
      <image:caption>Flow b: update the metadata. While a file is being uploaded to the original storage, the client in parallel sends a request to update the video metadata, as shown in the figure.</image:caption>
      <image:title>Flow b: update the metadata</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image190.webp</image:loc>
      <image:caption>Video streaming flow. Videos are streamed from the CDN directly. The edge server closest to you delivers the video, so there is very little latency. The figure shows the high-level design for video streaming.</image:caption>
      <image:title>Video streaming flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image191.webp</image:loc>
      <image:caption>Directed acyclic graph (DAG) model. To support different video processing pipelines and maintain high parallelism, it is important to add some level of abstraction and let client programmers define which tasks to execute. In the figure, the original video is split into video, audio, and metadata. Here are some of the tasks that can be applied to a video file.</image:caption>
      <image:title>Directed acyclic graph (DAG) model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image192.webp</image:loc>
      <image:caption>Directed acyclic graph (DAG) model. Watermark: an image overlay on top of your video that contains identifying information about your video.</image:caption>
      <image:title>Directed acyclic graph (DAG) model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image193.webp</image:loc>
      <image:caption>Video transcoding architecture. The proposed video transcoding architecture, which leverages cloud services, is shown in the figure. The architecture has six main components: preprocessor, DAG scheduler, resource manager, task workers, temporary storage, and encoded video as the output. Let us take a close look at each component.</image:caption>
      <image:title>Video transcoding architecture</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image194.webp</image:loc>
      <image:caption>Preprocessor. The architecture has six main components: preprocessor, DAG scheduler, resource manager, task workers, temporary storage, and encoded video as the output. Let us take a close look at each component. The preprocessor has 4 responsibilities</image:caption>
      <image:title>Preprocessor</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image195.webp</image:loc>
      <image:caption>Preprocessor. DAG generation: the preprocessor generates the DAG based on configuration files that client programmers write. The figure is a simplified DAG representation with 2 nodes and 1 edge, generated from the two configuration files below.</image:caption>
      <image:title>Preprocessor</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image196.webp</image:loc>
      <image:caption>Preprocessor. This DAG representation is generated from the two configuration files below. Cache data. The preprocessor is a cache for segmented videos. For better reliability, the preprocessor stores GOPs and metadata in temporary storage. If video encoding fails, the system could use persisted data for retry operations</image:caption>
      <image:title>Preprocessor</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image197.webp</image:loc>
      <image:caption>DAG scheduler. Cache data: the preprocessor is a cache for segmented videos. For better reliability, the preprocessor stores GOPs and metadata in temporary storage. If video encoding fails, the system can use the persisted data for retry operations. The DAG scheduler splits a DAG graph into stages of tasks and puts them in the task queue in the resource manager. The figure shows an example of how the DAG scheduler works.</image:caption>
      <image:title>DAG scheduler</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image198.webp</image:loc>
      <image:caption>DAG scheduler. The DAG scheduler splits a DAG graph into stages of tasks and puts them in the task queue in the resource manager. The figure shows an example of how the DAG scheduler works: the original video is split into three stages. Stage 1: video, audio, and metadata.</image:caption>
      <image:title>DAG scheduler</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image199.webp</image:loc>
      <image:caption>Resource manager. As shown in the figure, the original video is split into three stages. Stage 1: video, audio, and metadata. The resource manager is responsible for managing the efficiency of resource allocation. It contains 3 queues and a task scheduler, as shown in the figure.</image:caption>
      <image:title>Resource manager</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image200.webp</image:loc>
      <image:caption>Resource manager. Task scheduler: It picks the optimal task/worker, and instructs the chosen task worker to execute the job. The resource manager works as follows</image:caption>
      <image:title>Resource manager</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image201.webp</image:loc>
      <image:caption>Task workers. The task scheduler removes the job from the running queue once the job is done. Task workers run the tasks which are defined in the DAG. Different task workers may run different tasks as shown in</image:caption>
      <image:title>Task workers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image202.webp</image:loc>
      <image:caption>Task workers. Task workers run the tasks which are defined in the DAG. Different task workers may run different tasks as shown in</image:caption>
      <image:title>Task workers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image203.webp</image:loc>
      <image:caption>Temporary storage. Task workers run the tasks which are defined in the DAG. Different task workers may run different tasks, as shown in the figure. Multiple storage systems are used here. The choice of storage system depends on factors like data type, data size, access frequency, data life span, etc. For instance, metadata is frequently</image:caption>
      <image:title>Temporary storage</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image204.webp</image:loc>
      <image:caption>Encoded video. accessed by workers, and the data size is usually small. Thus, caching metadata in memory is a good idea. For video or audio data, we put them in blob storage. Data in temporary storage is freed up once the corresponding video processing is complete. Encoded video is the final output of the encoding pipeline. Here is an example of the output: funny_720p.mp4</image:caption>
      <image:title>Encoded video</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image205.webp</image:loc>
      <image:caption>Speed optimization: parallelize video uploading. Uploading a video as a whole unit is inefficient. We can split a video into smaller chunks by GOP alignment, as shown in the figure. This allows fast, resumable uploads when a previous upload fails. The job of splitting a video file by GOP can be implemented by the client to improve upload speed.</image:caption>
      <image:title>Speed optimization: parallelize video uploading</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image206.webp</image:loc>
      <image:caption>Speed optimization: parallelize video uploading. This allows fast, resumable uploads when a previous upload fails. The job of splitting a video file by GOP can be implemented by the client to improve upload speed, as shown in the figure.</image:caption>
      <image:title>Speed optimization: parallelize video uploading</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image207.webp</image:loc>
      <image:caption>Speed optimization: place upload centers close to users. Upload centers are set up across the globe: people in the United States can upload videos to the North America upload center, and people in China can upload videos to the Asian upload center. To achieve this, we use CDNs as upload centers.</image:caption>
      <image:title>Speed optimization: place upload centers close to users</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image208.webp</image:loc>
      <image:caption>Speed optimization: parallelism everywhere. Our design needs some modifications to achieve high parallelism. To make the system more loosely coupled, we introduce message queues, as shown in the figure.</image:caption>
      <image:title>Speed optimization: parallelism everywhere</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image209.webp</image:loc>
      <image:caption>Speed optimization: parallelism everywhere. After the message queue is introduced, the encoding module does not need to wait for the output of the download module anymore. If there are events in the message queue, the encoding module can execute those jobs in parallel</image:caption>
      <image:title>Speed optimization: parallelism everywhere</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image210.webp</image:loc>
      <image:caption>Safety optimization: pre-signed upload URL. Safety is one of the most important aspects of any product. To ensure only authorized users upload videos to the right location, we introduce pre-signed URLs, as shown in the figure. The upload flow is updated as follows.</image:caption>
      <image:title>Safety optimization: pre-signed upload URL</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter14/image211.webp</image:loc>
      <image:caption>Cost-saving optimization. For less popular content, we may not need to store many encoded video versions; short videos can be encoded on demand.</image:caption>
      <image:title>Cost-saving optimization</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter15</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image212.webp</image:loc>
      <image:caption>CHAPTER 15: DESIGN GOOGLE DRIVE. Let us take a moment to understand Google Drive before jumping into the design.</image:caption>
      <image:title>CHAPTER 15: DESIGN GOOGLE DRIVE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image213.webp</image:loc>
      <image:caption>CHAPTER 15: DESIGN GOOGLE DRIVE. Let us take a moment to understand Google Drive before jumping into the design.</image:caption>
      <image:title>CHAPTER 15: DESIGN GOOGLE DRIVE</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image214.webp</image:loc>
      <image:caption>Step 2 - Propose high-level design and get buy-in. The figure shows an example of how the /drive directory looks on the left side and its expanded view on the right side.</image:caption>
      <image:title>Step 2 - Propose high-level design and get buy-in</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image215.webp</image:loc>
      <image:caption>Move away from single server. As more files are uploaded, you eventually get the space-full alert shown in the figure. Only 10 MB of storage space is left! This is an emergency, as users cannot upload files anymore. The first solution that comes to mind is to shard the data so it is stored on multiple storage servers. The figure shows an example of sharding based on user_id.</image:caption>
      <image:title>Move away from single server</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image216.webp</image:loc>
      <image:caption>Move away from single server. Only 10 MB of storage space is left! This is an emergency, as users cannot upload files anymore. The first solution that comes to mind is to shard the data so it is stored on multiple storage servers. The figure shows an example of sharding based on user_id. You pull an all-nighter to set up database sharding and monitor it closely.</image:caption>
      <image:title>Move away from single server</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image217.webp</image:loc>
      <image:caption>Move away from single server. Redundant files are stored in multiple regions to guard against data loss and ensure availability. A bucket is like a folder in file systems. After putting files in S3, you can finally have a good night&apos;s sleep without worrying about data loss. To stop similar problems from happening in the future, you decide to research areas you can improve. Here are a few areas you find.</image:caption>
      <image:title>Move away from single server</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image218.webp</image:loc>
      <image:caption>Move away from single server. After applying the above improvements, you have successfully decoupled web servers, the metadata database, and file storage from a single server. The updated design is shown in the figure.</image:caption>
      <image:title>Move away from single server</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image219.webp</image:loc>
      <image:caption>Sync conflicts. For a large storage system like Google Drive, sync conflicts happen from time to time. In the figure, user 1 and user 2 try to update the same file at the same time, but user 1’s file is processed by our system first.</image:caption>
      <image:title>Sync conflicts</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image220.webp</image:loc>
      <image:caption>Sync conflicts. In the figure, user 1 and user 2 try to update the same file at the same time, but user 1’s file is processed by our system first. When multiple users are editing the same document at the same time, it is challenging to keep the document synchronized. Interested readers should refer to the reference materials [4] [5].</image:caption>
      <image:title>Sync conflicts</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image221.webp</image:loc>
      <image:caption>High-level design. The figure illustrates the proposed high-level design. Let us examine each component of the system. User: A user uses the application either through a browser or mobile app.</image:caption>
      <image:title>High-level design</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image222.webp</image:loc>
      <image:caption>Block servers. The figure shows how a block server works when a new file is added. A file is split into smaller blocks.</image:caption>
      <image:title>Block servers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image223.webp</image:loc>
      <image:caption>Block servers. The figure illustrates delta sync, meaning only modified blocks are transferred to cloud storage. The highlighted blocks “block 2” and “block 5” represent changed blocks. Using delta sync, only those two blocks are uploaded to cloud storage. Block servers allow us to save network traffic by providing delta sync and compression.</image:caption>
      <image:title>Block servers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image224.webp</image:loc>
      <image:caption>Metadata database. The figure shows the database schema design. Please note this is a highly simplified version, as it only includes the most important tables and interesting fields. User: The user table contains basic information about the user such as username, email, profile photo, etc.</image:caption>
      <image:title>Metadata database</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image225.webp</image:loc>
      <image:caption>Upload flow. Let us discuss what happens when a client uploads a file. To better understand the flow, we draw the sequence diagram shown in the figure. In it, two requests are sent in parallel: add file metadata and upload the file to cloud storage. Both requests originate from client 1.</image:caption>
      <image:title>Upload flow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter15/image226.webp</image:loc>
      <image:caption>Download flow. Once a client knows a file has changed, it first requests metadata via API servers, then downloads blocks to reconstruct the file. The figure shows the detailed flow. Note that only the most important components are shown in the diagram due to space constraints. The notification service informs client 2 that a file has changed somewhere else.</image:caption>
      <image:title>Download flow</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter2</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter2/image27.webp</image:loc>
      <image:caption>Power of two. Although data volume can become enormous when dealing with distributed systems, the calculations all boil down to the basics.</image:caption>
      <image:title>Power of two</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter2/image28.webp</image:loc>
      <image:caption>Latency numbers every programmer should know.</image:caption>
      <image:title>Latency numbers every programmer should know</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter2/image29.webp</image:loc>
      <image:caption>1 µs = 10^-6 seconds = 1,000 ns. A Google software engineer built a tool to visualize Dr. Dean’s numbers. The tool also takes the time factor into consideration. Figure 2-1 shows the visualized latency numbers as of 2020 (source of figures: reference material [3]). By analyzing the numbers in the figure, we get the following conclusions.</image:caption>
      <image:title>1 µs = 10^-6 seconds = 1,000 ns</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter2/image30.webp</image:loc>
      <image:caption>Availability numbers. A service level agreement (SLA) is a commonly used term for service providers. Example: Estimate Twitter QPS and storage requirements</image:caption>
      <image:title>Availability numbers</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter3</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter3/image31.webp</image:loc>
      <image:caption>Example. The figures present high-level designs for the feed publishing and news feed building flows, respectively.</image:caption>
      <image:title>Example</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter3/image32.webp</image:loc>
      <image:caption>Example. The figures present high-level designs for the feed publishing and news feed building flows, respectively.</image:caption>
      <image:title>Example</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter3/image33.webp</image:loc>
      <image:caption>News feed retrieval. The figures show the detailed design for the two use cases, which will be explained in detail in Chapter 11.</image:caption>
      <image:title>News feed retrieval</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter3/image34.webp</image:loc>
      <image:caption>News feed retrieval. The figures show the detailed design for the two use cases, which will be explained in detail in Chapter 11.</image:caption>
      <image:title>News feed retrieval</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter4</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image35.webp</image:loc>
      <image:caption>Step 2 - Propose high-level design and get buy-in. Server-side implementation: the figure shows a rate limiter placed on the server side. Besides the client-side and server-side implementations, there is an alternative way. Instead of putting a rate limiter at the API servers, we create a rate limiter middleware, which throttles requests to your APIs as shown in the figure.</image:caption>
      <image:title>Step 2 - Propose high-level design and get buy-in</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image36.webp</image:loc>
      <image:caption>Step 2 - Propose high-level design and get buy-in. Besides the client-side and server-side implementations, there is an alternative way. Instead of putting a rate limiter at the API servers, we create a rate limiter middleware, which throttles requests to your APIs as shown in the figure. Let us use an example to illustrate how rate limiting works in this design.</image:caption>
      <image:title>Step 2 - Propose high-level design and get buy-in</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image37.webp</image:loc>
      <image:caption>Step 2 - Propose high-level design and get buy-in. Let us use an example to illustrate how rate limiting works in this design. Cloud microservices [4] have become widely popular, and rate limiting is usually implemented within a component called an API gateway.</image:caption>
      <image:title>Step 2 - Propose high-level design and get buy-in</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image38.webp</image:loc>
      <image:caption>Token bucket algorithm. A token bucket is a container with a pre-defined capacity. Each request consumes one token. When a request arrives, we check if there are enough tokens in the bucket. The figure explains how it works.</image:caption>
      <image:title>Token bucket algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image39.webp</image:loc>
      <image:caption>Token bucket algorithm. If there are not enough tokens, the request is dropped. The figure illustrates how token consumption, refill, and rate limiting logic work. In this example, the token bucket size is 4, and the refill rate is 4 per minute.</image:caption>
      <image:title>Token bucket algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image40.webp</image:loc>
      <image:caption>Token bucket algorithm. The figure illustrates how token consumption, refill, and rate limiting logic work. In this example, the token bucket size is 4, and the refill rate is 4 per minute. The token bucket algorithm takes two parameters.</image:caption>
      <image:title>Token bucket algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image41.webp</image:loc>
      <image:caption>Leaking bucket algorithm. Requests are pulled from the queue and processed at regular intervals. The figure explains how the algorithm works. The leaking bucket algorithm takes the following two parameters.</image:caption>
      <image:title>Leaking bucket algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image42.webp</image:loc>
      <image:caption>Fixed window counter algorithm. Let us use a concrete example to see how it works. In the figure, the time unit is 1 second, and the system allows a maximum of 3 requests per second. In each one-second window, if more than 3 requests are received, the extra requests are dropped as shown in the figure. A major problem with this algorithm is that a burst of traffic at the edges of time windows could allow more requests than the allowed quota to go through. Consider the following case.</image:caption>
      <image:title>Fixed window counter algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image43.webp</image:loc>
      <image:caption>Fixed window counter algorithm. A major problem with this algorithm is that a burst of traffic at the edges of time windows could allow more requests than the allowed quota to go through. Consider the following case: in the figure, the system allows a maximum of 5 requests per minute, and the available quota resets at the human-friendly round minute.</image:caption>
      <image:title>Fixed window counter algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image44.webp</image:loc>
      <image:caption>Sliding window log algorithm. We explain the algorithm with an example as shown in the figure. In this example, the rate limiter allows 2 requests per minute. Usually, Unix timestamps are stored in the log; however, a human-readable representation of time is used in our example for better readability.</image:caption>
      <image:title>Sliding window log algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image45.webp</image:loc>
      <image:caption>Sliding window counter algorithm. The sliding window counter algorithm is a hybrid approach that combines the fixed window counter and sliding window log. Assume the rate limiter allows a maximum of 7 requests per minute, and there are 5 requests in the previous minute and 3 in the current minute.</image:caption>
      <image:title>Sliding window counter algorithm</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image46.webp</image:loc>
      <image:caption>High-level architecture. The figure shows the high-level architecture for rate limiting, which works as follows. The client sends a request to the rate limiting middleware.</image:caption>
      <image:title>High-level architecture</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image47.webp</image:loc>
      <image:caption>Detailed design. The figure presents a detailed design of the system. Rules are stored on the disk. Workers frequently pull rules from the disk and store them in the cache.</image:caption>
      <image:title>Detailed design</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image48.webp</image:loc>
      <image:caption>Race condition. Race conditions can happen in a highly concurrent environment, as shown in the figure. Assume the counter value in Redis is 3.</image:caption>
      <image:title>Race condition</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image49.webp</image:loc>
      <image:caption>Synchronization issue. One possible solution is to use sticky sessions that allow a client to send traffic to the same rate limiter.</image:caption>
      <image:title>Synchronization issue</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image50.webp</image:loc>
      <image:caption>Synchronization issue. One possible solution is to use sticky sessions that allow a client to send traffic to the same rate limiter.</image:caption>
      <image:title>Synchronization issue</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter4/image51.webp</image:loc>
      <image:caption>Performance optimization. First, a multi-data-center setup is crucial for a rate limiter because latency is high for users located far away from the data center. Second, synchronize data with an eventual consistency model. If you are unclear about the eventual consistency model, refer to the “Consistency” section in “Chapter 6: Design a Key-value Store.”</image:caption>
      <image:title>Performance optimization</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter5</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image52.webp</image:loc>
      <image:caption>The rehashing problem. Let us use an example to illustrate how it works. As shown in Table 5-1, we have 4 servers and 8 string keys with their hashes. To fetch the server where a key is stored, we perform the modular operation f(key) % 4. For instance, hash(key0) % 4 = 1 means a client must contact server 1 to fetch the cached data. The figure shows the distribution of keys based on Table 5-1.</image:caption>
      <image:title>The rehashing problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image53.webp</image:loc>
      <image:caption>The rehashing problem. To fetch the server where a key is stored, we perform the modular operation f(key) % 4. For instance, hash(key0) % 4 = 1 means a client must contact server 1 to fetch the cached data. The figure shows the distribution of keys based on Table 5-1. This approach works well when the size of the server pool is fixed and the data distribution is even.</image:caption>
      <image:title>The rehashing problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image54.webp</image:loc>
      <image:caption>The rehashing problem. This approach works well when the size of the server pool is fixed and the data distribution is even. The figure shows the new distribution of keys based on Table 5-2.</image:caption>
      <image:title>The rehashing problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image55.webp</image:loc>
      <image:caption>The rehashing problem. The figure shows the new distribution of keys based on Table 5-2. As shown there, most keys are redistributed, not just the ones originally stored in the offline server (server 1).</image:caption>
      <image:title>The rehashing problem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image56.webp</image:loc>
      <image:caption>Hash space and hash ring. Now that we understand the definition of consistent hashing, let us find out how it works. By connecting both ends, we get a hash ring as shown in the figure.</image:caption>
      <image:title>Hash space and hash ring</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image57.webp</image:loc>
      <image:caption>Hash space and hash ring. By connecting both ends, we get a hash ring as shown in the figure.</image:caption>
      <image:title>Hash space and hash ring</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image58.webp</image:loc>
      <image:caption>Hash servers. Using the same hash function f, we map servers onto the ring based on server IP or name. The figure shows 4 servers mapped onto the hash ring.</image:caption>
      <image:title>Hash servers</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image59.webp</image:loc>
      <image:caption>Hash keys. One thing worth mentioning is that the hash function used here is different from the one in “the rehashing problem,” and there is no modular operation. As shown in the figure, 4 cache keys (key0, key1, key2, and key3) are hashed onto the hash ring.</image:caption>
      <image:title>Hash keys</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image60.webp</image:loc>
      <image:caption>Server lookup. To determine which server a key is stored on, we go clockwise from the key position on the ring until a server is found. The figure explains this process.</image:caption>
      <image:title>Server lookup</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image61.webp</image:loc>
      <image:caption>Add a server. In the figure, after a new server 4 is added, only key0 needs to be redistributed.</image:caption>
      <image:title>Add a server</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image62.webp</image:loc>
      <image:caption>Remove a server. The rest of the keys are unaffected</image:caption>
      <image:title>Remove a server</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image63.webp</image:loc>
      <image:caption>Two issues in the basic approach. Two problems are identified with this approach. Second, it is possible to have a non-uniform key distribution on the ring. For instance, if servers are mapped to the positions listed in the figure, most of the keys are stored on server 2, while server 1 and server 3 have no data.</image:caption>
      <image:title>Two issues in the basic approach</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image64.webp</image:loc>
      <image:caption>Two issues in the basic approach. Second, it is possible to have a non-uniform key distribution on the ring. For instance, if servers are mapped to the positions listed in the figure, most of the keys are stored on server 2, while server 1 and server 3 have no data. A technique called virtual nodes or replicas is used to solve these problems.</image:caption>
      <image:title>Two issues in the basic approach</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image65.webp</image:loc>
      <image:caption>Virtual nodes. The number of virtual nodes shown is arbitrarily chosen; in real-world systems, the number of virtual nodes is much larger. To find which server a key is stored on, we go clockwise from the key’s location and find the first virtual node encountered on the ring.</image:caption>
      <image:title>Virtual nodes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image66.webp</image:loc>
      <image:caption>Virtual nodes. To find which server a key is stored on, we go clockwise from the key’s location and find the first virtual node encountered on the ring. As the number of virtual nodes increases, the distribution of keys becomes more balanced.</image:caption>
      <image:title>Virtual nodes</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image67.webp</image:loc>
      <image:caption>Find affected keys. In the figure, server 4 is added onto the ring. The affected range starts from s4 (the newly added node) and moves anticlockwise around the ring until a server is found (s3). Thus, keys located between s3 and s4 need to be redistributed to s4. When a server (s1) is removed as shown in the figure, the affected range starts from s1 (the removed node) and moves anticlockwise around the ring until a server is found (s0). Thus, keys located between s0 and s1 must be redistributed to s2.</image:caption>
      <image:title>Find affected keys</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter5/image68.webp</image:loc>
      <image:caption>Find affected keys. When a server (s1) is removed as shown in the figure, the affected range starts from s1 (the removed node) and moves anticlockwise around the ring until a server is found (s0). Thus, keys located between s0 and s1 must be redistributed to s2.</image:caption>
      <image:title>Find affected keys</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter6</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image69.webp</image:loc>
      <image:caption>Hashed key: 253DDEC4. Here is a data snippet in a key-value store. In this chapter, you are asked to design a key-value store that supports the following operations</image:caption>
      <image:title>Hashed key: 253DDEC4</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image70.webp</image:loc>
      <image:caption>CAP theorem. The CAP theorem states that one of the three properties must be sacrificed to support two of the three properties, as shown in the figure. Nowadays, key-value stores are classified based on the two CAP characteristics they support.</image:caption>
      <image:title>CAP theorem</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image71.webp</image:loc>
      <image:caption>Ideal situation. In the ideal world, network partition never occurs. Data written to n1 is automatically replicated to n2 and n3. Both consistency and availability are achieved</image:caption>
      <image:title>Ideal situation</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image72.webp</image:loc>
      <image:caption>Real-world distributed systems. In a distributed system, partitions cannot be avoided, and when a partition occurs, we must choose between consistency and availability. If we choose consistency over availability (CP system), we must block all write operations to n1 and n2 to avoid data inconsistency among these three servers, which makes the system unavailable.</image:caption>
      <image:title>Real-world distributed systems</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image73.webp</image:loc>
      <image:caption>Data partition. Next, a key is hashed onto the same ring, and it is stored on the first server encountered while moving in the clockwise direction. For instance, key0 is stored on s1 using this logic. Using consistent hashing to partition data has the following advantages.</image:caption>
      <image:title>Data partition</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image74.webp</image:loc>
      <image:caption>Data replication. To achieve high availability and reliability, data must be replicated asynchronously over N servers, where N is a configurable parameter. With virtual nodes, the first N nodes on the ring may be owned by fewer than N physical servers. To avoid this issue, we only choose unique servers while performing the clockwise walk logic</image:caption>
      <image:title>Data replication</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image75.webp</image:loc>
      <image:caption>N = The number of replicas. Consider the following example, shown in the figure with N = 3. W = 1 does not mean data is written on one server.</image:caption>
      <image:title>N = The number of replicas</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image76.webp</image:loc>
      <image:caption>Inconsistency resolution: versioning. As shown in, both replica nodes n1 and n2 have the same value. Let us call this value the original value. Server 1 and server 2 get the same value for get(“name”) operation. Next, server 1 changes the name to “johnSanFrancisco”, and server 2 changes the name to “johnNewYork” as shown in. These two changes are performed simultaneously. Now, we have conflicting values, called versions v1 and v2</image:caption>
      <image:title>Inconsistency resolution: versioning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image77.webp</image:loc>
      <image:caption>Inconsistency resolution: versioning. Next, server 1 changes the name to “johnSanFrancisco”, and server 2 changes the name to “johnNewYork” as shown in. These two changes are performed simultaneously. Now, we have conflicting values, called versions v1 and v2. In this example, the original value could be ignored because the modifications were based on it.</image:caption>
      <image:title>Inconsistency resolution: versioning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image78.webp</image:loc>
      <image:caption>Inconsistency resolution: versioning. The above abstract logic is explained with a concrete example as shown in. A client writes a data item D1 to the system, and the write is handled by server Sx, which now has the vector clock D1[(Sx, 1)]</image:caption>
      <image:title>Inconsistency resolution: versioning</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image79.webp</image:loc>
      <image:caption>Failure detection. As shown in, all-to-all multicasting is a straightforward solution. However, this is inefficient when many servers are in the system. A better solution is to use decentralized failure detection methods like gossip protocol. Gossip protocol works as follows</image:caption>
      <image:title>Failure detection</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image80.webp</image:loc>
      <image:caption>Failure detection. If the heartbeat has not increased for more than predefined periods, the member is considered offline, as shown in the figure.</image:caption>
      <image:title>Failure detection</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image81.webp</image:loc>
      <image:caption>Handling temporary failures. If a server is unavailable due to network or server failures, another server will process requests temporarily.</image:caption>
      <image:title>Handling temporary failures</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image82.webp</image:loc>
      <image:caption>Handling permanent failures. Step 1: Divide the key space into buckets (4 in our example) as shown in the figure. A bucket is used as the root-level node to maintain a limited depth of the tree. Step 2: Once the buckets are created, hash each key in a bucket using a uniform hashing method.</image:caption>
      <image:title>Handling permanent failures</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image83.webp</image:loc>
      <image:caption>Handling permanent failures. Step 2: Once the buckets are created, hash each key in a bucket using a uniform hashing method. Step 3: Create a single hash node per bucket</image:caption>
      <image:title>Handling permanent failures</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image84.webp</image:loc>
      <image:caption>Handling permanent failures. Step 3: Create a single hash node per bucket. Step 4: Build the tree upward to the root by calculating the hashes of children.</image:caption>
      <image:title>Handling permanent failures</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image85.webp</image:loc>
      <image:caption>Handling permanent failures. Step 4: Build the tree upward to the root by calculating the hashes of children. To compare two Merkle trees, start by comparing the root hashes.</image:caption>
      <image:title>Handling permanent failures</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image86.webp</image:loc>
      <image:caption>System architecture diagram. Now that we have discussed different technical considerations in designing a key-value store, we can shift our focus to the architecture diagram shown in the figure. The main features of the architecture are listed as follows.</image:caption>
      <image:title>System architecture diagram</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image87.webp</image:loc>
      <image:caption>System architecture diagram. There is no single point of failure, as every node has the same set of responsibilities. As the design is decentralized, each node performs many tasks, as presented in the figure.</image:caption>
      <image:title>System architecture diagram</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image88.webp</image:loc>
      <image:caption>Write path. The figure explains what happens after a write request is directed to a specific node. Please note the proposed designs for the write/read paths are primarily based on the architecture of Cassandra [8]. The write request is persisted in a commit log file.</image:caption>
      <image:title>Write path</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image89.webp</image:loc>
      <image:caption>Read path. After a read request is directed to a specific node, it first checks if the data is in the memory cache. If so, the data is returned to the client, as shown in the figure. If the data is not in memory, it is retrieved from disk instead. We need an efficient way to find out which SSTable contains the key. A Bloom filter [10] is commonly used to solve this problem</image:caption>
      <image:title>Read path</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image90.webp</image:loc>
      <image:caption>Read path. The figure shows the read path when the data is not in memory. The system first checks if the data is in memory. If not, go to step 2</image:caption>
      <image:title>Read path</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter6/image91.webp</image:loc>
      <image:caption>Summary. This chapter covers many concepts and techniques. To refresh your memory, the following table summarizes features and corresponding techniques used for a distributed key-value store</image:caption>
      <image:title>Summary</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter7</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter7/image92.webp</image:loc>
      <image:caption>CHAPTER 7: DESIGN A UNIQUE ID GENERATOR IN DISTRIBUTED SYSTEMS. Here are a few examples of unique IDs. Step 1 - Understand the problem and establish design scope</image:caption>
      <image:title>CHAPTER 7: DESIGN A UNIQUE ID GENERATOR IN DISTRIBUTED SYSTEMS</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter7/image93.webp</image:loc>
      <image:caption>Multi-master replication. As shown in the figure, the first approach is multi-master replication. This approach uses the databases’ auto_increment feature.</image:caption>
      <image:title>Multi-master replication</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter7/image94.webp</image:loc>
      <image:caption>UUID. Here is an example of a UUID: 09c93e62-50b4-468d-bf8a-c07e1040bfb2. UUIDs can be generated independently without coordination between servers. The figure presents the UUID design. In this design, each web server contains an ID generator, and each web server is responsible for generating IDs independently</image:caption>
      <image:title>UUID</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter7/image95.webp</image:loc>
      <image:caption>Ticket Server. Ticket servers are another interesting way to generate unique IDs. Flickr developed ticket servers to generate distributed primary keys [2]. It is worth mentioning how the system works. The idea is to use a centralized auto_increment feature in a single database server (the Ticket Server). To learn more, refer to Flickr’s engineering blog article [2]</image:caption>
      <image:title>Ticket Server</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter7/image96.webp</image:loc>
      <image:caption>Twitter snowflake approach. Divide and conquer is our friend. Instead of generating an ID directly, we divide an ID into different sections. The figure shows the layout of a 64-bit ID. Each section is explained below</image:caption>
      <image:title>Twitter snowflake approach</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter7/image97.webp</image:loc>
      <image:caption>Step 3 - Design deep dive. In the high-level design, we discussed various options for designing a unique ID generator in distributed systems. Datacenter IDs and machine IDs are chosen at startup time and are generally fixed once the system is up and running.</image:caption>
      <image:title>Step 3 - Design deep dive</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter7/image98.webp</image:loc>
      <image:caption>Timestamp. The most important 41 bits make up the timestamp section. The maximum timestamp that can be represented in 41 bits is</image:caption>
      <image:title>Timestamp</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter8</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image99.webp</image:loc>
      <image:caption>URL redirecting. The figure shows what happens when you enter a tinyurl into the browser. Once the server receives a tinyurl request, it converts the short URL to the long URL with a 301 redirect. The detailed communication between clients and servers is shown in the next figure</image:caption>
      <image:title>URL redirecting</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image100.webp</image:loc>
      <image:caption>URL redirecting. The detailed communication between clients and servers is shown in the figure. One thing worth discussing here is 301 redirect vs 302 redirect</image:caption>
      <image:title>URL redirecting</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image101.webp</image:loc>
      <image:caption>URL shortening. Let us assume the short URL looks like this: www.tinyurl.com/{hashValue}. To support the URL shortening use case, we must find a hash function f(x) that maps a long URL to the hashValue, as shown in the figure. The hash function must satisfy the following requirements</image:caption>
      <image:title>URL shortening</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image102.webp</image:loc>
      <image:caption>Data model. In the high-level design, everything is stored in a hash table.</image:caption>
      <image:title>Data model</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image103.webp</image:loc>
      <image:caption>Hash value length. The system must support up to 365 billion URLs based on the back-of-the-envelope estimation. Table 8-1 shows the length of hashValue and the corresponding maximum number of URLs it can support. When n = 7, 62 ^ n = ~3.5 trillion; 3.5 trillion is more than enough to hold 365 billion URLs, so the length of hashValue is 7</image:caption>
      <image:title>Hash value length</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image104.webp</image:loc>
      <image:caption>Hash + collision resolution. To shorten a long URL, we should implement a hash function that hashes a long URL to a 7-character string. As shown in Table 8-2, even the shortest hash value (from CRC32) is too long (more than 7 characters). How can we make it shorter?</image:caption>
      <image:title>Hash + collision resolution</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image105.webp</image:loc>
      <image:caption>Hash + collision resolution. This method can eliminate collisions; however, it is expensive to query the database to check if a shortURL exists for every request.</image:caption>
      <image:title>Hash + collision resolution</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image106.webp</image:loc>
      <image:caption>Base 62 conversion. The figure shows the conversion process. Thus, the short URL is https://tinyurl.com/2TX. Comparison of the two approaches</image:caption>
      <image:title>Base 62 conversion</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image107.webp</image:loc>
      <image:caption>Base 62 conversion. Table 8-3 shows the differences between the two approaches</image:caption>
      <image:title>Base 62 conversion</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image108.webp</image:loc>
      <image:caption>URL shortening deep dive. As one of the core pieces of the system, we want the URL shortening flow to be logically simple and functional. Base 62 conversion is used in our design. We build the following diagram to demonstrate the flow. longURL is the input</image:caption>
      <image:title>URL shortening deep dive</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image109.webp</image:loc>
      <image:caption>URL shortening deep dive. Save ID, shortURL, and longURL to the database as shown in Table 8-4. The distributed unique ID generator is worth mentioning. Its primary function is to generate globally unique IDs, which are used for creating shortURLs. In a highly distributed</image:caption>
      <image:title>URL shortening deep dive</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter8/image110.webp</image:loc>
      <image:caption>URL redirecting deep dive. The figure shows the detailed design of URL redirecting. As there are more reads than writes, the &lt;shortURL, longURL&gt; mapping is stored in a cache to improve performance. The flow of URL redirecting is summarized as follows</image:caption>
      <image:title>URL redirecting deep dive</image:title>
    </image:image>
  </url>
  <url>
    <loc>https://whizan.xyz/books/systemdesign/chapter9</loc>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image111.webp</image:loc>
      <image:caption>CHAPTER 9: DESIGN A WEB CRAWLER. A web crawler is also known as a robot or spider. A crawler is used for many purposes</image:caption>
      <image:title>CHAPTER 9: DESIGN A WEB CRAWLER</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image112.webp</image:loc>
      <image:caption>Step 2 - Propose high-level design and get buy-in. Once the requirements are clear, we move on to the high-level design. Inspired by previous studies on web crawling [4] [5], we propose a high-level design as shown in the figure. First, we explore each design component to understand its functionality. Then, we examine the crawler workflow step by step</image:caption>
      <image:title>Step 2 - Propose high-level design and get buy-in</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image113.webp</image:loc>
      <image:caption>URL Extractor. The URL Extractor parses and extracts links from HTML pages. The figure shows an example of the link extraction process. Relative paths are converted to absolute URLs by adding the “https://en.wikipedia.org” prefix</image:caption>
      <image:title>URL Extractor</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image114.webp</image:loc>
      <image:caption>Web crawler workflow. To better explain the workflow step by step, sequence numbers are added to the design diagram, as shown in the figure</image:caption>
      <image:title>Web crawler workflow</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image115.webp</image:loc>
      <image:caption>DFS vs BFS. Most links from the same web page are linked back to the same host. Standard BFS does not take the priority of a URL into consideration.</image:caption>
      <image:title>DFS vs BFS</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image116.webp</image:loc>
      <image:caption>Politeness. The general idea of enforcing politeness is to download one page at a time from the same host. Queue router: It ensures that each queue (b1, b2, … bn) only contains URLs from the same host</image:caption>
      <image:title>Politeness</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image117.webp</image:loc>
      <image:caption>Politeness. Mapping table: It maps each host to a queue. FIFO queues b1, b2 to bn: Each queue contains URLs from the same host</image:caption>
      <image:title>Politeness</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image118.webp</image:loc>
      <image:caption>Priority. The figure shows the design that manages URL priority. Prioritizer: It takes URLs as input and computes their priorities</image:caption>
      <image:title>Priority</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image119.webp</image:loc>
      <image:caption>Back queues: manage politeness. The figure presents the URL frontier design, which contains two modules</image:caption>
      <image:title>Back queues: manage politeness</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image120.webp</image:loc>
      <image:caption>Distributed crawl. To achieve high performance, crawl jobs are distributed into multiple servers, and each server runs multiple threads.</image:caption>
      <image:title>Distributed crawl</image:title>
    </image:image>
    <image:image>
      <image:loc>https://whizan.xyz/books/systemdesign/chapter9/image121.webp</image:loc>
      <image:caption>Extensibility. As almost every system evolves, one of the design goals is to make the system flexible enough to support new content types. The crawler can be extended by plugging in new modules. The figure shows how to add new modules. The PNG Downloader module is plugged in to download PNG files</image:caption>
      <image:title>Extensibility</image:title>
    </image:image>
  </url>
</urlset>