🔗 Chain Rule, Implicit Differentiation, and Partial Derivatives (Calculus for ML)


Before diving into neural networks or complex optimization, it’s critical to master three powerful tools in calculus: the Chain Rule, Implicit Differentiation, and Partial Derivatives.

In this post, you’ll learn:

  • How to apply the chain rule to composite functions
  • How implicit differentiation builds on the chain rule
  • How to differentiate multivariable functions with respect to one variable (partial differentiation)


🔗 What is the Chain Rule?

The chain rule allows us to differentiate composite functions, i.e., functions inside other functions.

Let’s say:

\[ f(x) = h(p(x)) = \sin(x^2) \]

We treat the outer function \( h(u) = \sin(u) \), and the inner \( p(x) = x^2 \).

Using the chain rule:

\[ f'(x) = h'(p(x)) \cdot p'(x) = \cos(x^2) \cdot 2x \]

✅ So, the derivative of \( \sin(x^2) \) is \( 2x \cdot \cos(x^2) \).
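
If you want to sanity-check a result like this, here is a minimal sketch using SymPy (assuming the sympy package is installed), which applies the chain rule automatically:

```python
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x**2)

# SymPy differentiates the composite function for us:
print(sp.diff(f, x))  # 2*x*cos(x**2)
```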


Plot showing sin(x²) and its derivative 2x·cos(x²) — a visual demonstration of the Chain Rule in action.


🧠 Another Chain Rule Example (Step-by-Step)

Let’s find the derivative of:

\[ f(x) = \ln(3x^2 + 1) \]

This is a function nested inside another function, which is exactly the situation that calls for the chain rule.


🔹 Step 1: Identify the Inner and Outer Functions

We rewrite the function as:

\[ f(x) = h(p(x)) \]

Where:

  • The inner function is: \( p(x) = 3x^2 + 1 \)
  • The outer function is: \( h(u) = \ln(u) \), with \( u = p(x) \)

🔹 Step 2: Differentiate Each Part

  • \( h'(u) = \frac{1}{u} \) → this is the derivative of \( \ln(u) \)
  • \( p’(x) = \frac{d}{dx}(3x^2 + 1) = 6x \)

Now apply the chain rule:

\[ f'(x) = h'(p(x)) \cdot p'(x) = \frac{1}{3x^2 + 1} \cdot 6x \]


✅ Final Answer

\[ f'(x) = \frac{6x}{3x^2 + 1} \]
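
As a numerical sanity check, a central-difference approximation should closely match the analytic answer. This is a small sketch in plain Python with arbitrarily chosen sample points:

```python
import math

def f(x):
    return math.log(3 * x**2 + 1)

def f_prime(x):
    return 6 * x / (3 * x**2 + 1)  # the answer derived above

h = 1e-6  # small step for the central difference
for x0 in (0.5, 1.0, 2.0):  # arbitrary sample points
    numeric = (f(x0 + h) - f(x0 - h)) / (2 * h)
    print(x0, numeric, f_prime(x0))  # the two derivative values should agree
```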


💡 Why This Matters

When differentiating logarithmic, trigonometric, or exponential functions wrapped around polynomials, the chain rule is your go-to tool. You treat the “outside” and “inside” layers separately, then multiply the results.


🔁 Implicit Differentiation

Sometimes, functions are not written in the form \( y = f(x) \). Instead, x and y are mixed together in one equation. In that case, we can’t isolate y easily — so we use implicit differentiation.


📘 Example: A Circle

Take the equation of a circle:

\[ x^2 + y^2 = 25 \]

This defines a relationship between x and y, but y is not explicitly solved.


🧠 Step 1: Differentiate both sides with respect to x

Apply \( \frac{d}{dx} \) to each term:

\[ \frac{d}{dx}(x^2) + \frac{d}{dx}(y^2) = \frac{d}{dx}(25) \]

  • \( \frac{d}{dx}(x^2) = 2x \)
  • \( \frac{d}{dx}(25) = 0 \) (constants have zero slope)
  • \( \frac{d}{dx}(y^2) = 2y \cdot \frac{dy}{dx} \) ← this uses the chain rule, because y is treated as a function of x

So the full result becomes:

\[ 2x + 2y \cdot \frac{dy}{dx} = 0 \]


✍️ Step 2: Solve for \( \frac{dy}{dx} \)

Subtract \( 2x \) from both sides:

\[ 2y \cdot \frac{dy}{dx} = -2x \]

Divide both sides by \( 2y \):

\[ \frac{dy}{dx} = \frac{-x}{y} \]


💡 Interpretation:

Even though we didn’t solve for y directly, we found the slope of the curve at any point (x, y) on the circle. This is powerful — we differentiated without rearranging!
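If you use SymPy, its idiff helper carries out this whole procedure for you. A minimal sketch, writing the circle as an expression equal to zero:

```python
import sympy as sp

x, y = sp.symbols('x y')
circle = x**2 + y**2 - 25  # the equation rewritten as expr = 0

# idiff treats y as an implicit function of x and solves for dy/dx:
print(sp.idiff(circle, y, x))  # -x/y
```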


🔗 This method is essential when:

  • y appears in powers or multiplied with x
  • You have to find \( \frac{dy}{dx} \) but can’t isolate y
  • You’re working with geometric shapes, constraint equations, or system-level models in ML

A plot of the circle x² + y² = 25, showing the implicit relationship between x and y.


🔀 Partial Derivatives

When a function depends on more than one variable (e.g., \( f(x, y, z) \)), we can differentiate with respect to one, treating the others as constants.

Let:

\[ f(x, y, z) = \sin(x) \cdot e^{yz^2} \]

🔹 Differentiate with respect to \( x \):

Only \( \sin(x) \) is affected:

\[ \frac{\partial f}{\partial x} = \cos(x) \cdot e^{yz^2} \]

✅ Here, \( y \) and \( z \) are treated as constants.


3D surface plot of f(x, y) = sin(x) · eʸ (a slice of the function above with z fixed at 1) showing how the function changes with x and y — useful for partial derivatives.


🔹 Differentiate with respect to \( y \):

\[ \frac{\partial f}{\partial y} = \sin(x) \cdot e^{yz^2} \cdot z^2 \]

Chain rule applied to the exponent!


🔹 Differentiate with respect to \( z \):

\[ \frac{\partial f}{\partial z} = \sin(x) \cdot e^{yz^2} \cdot 2yz \]
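
All three partials can be checked with SymPy's diff, which differentiates with respect to exactly the symbol you pass in while holding the others fixed. A quick sketch, assuming sympy is available:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.sin(x) * sp.exp(y * z**2)

# Each call treats the other two symbols as constants:
print(sp.diff(f, x))  # exp(y*z**2)*cos(x)
print(sp.diff(f, y))  # z**2*exp(y*z**2)*sin(x)
print(sp.diff(f, z))  # 2*y*z*exp(y*z**2)*sin(x)
```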


🤖 Relevance to Machine Learning

Understanding these differentiation tools is essential for anyone working in machine learning:

  • 🧮 Chain Rule powers backpropagation in neural networks. Each layer’s gradient is computed by chaining derivatives through the layers — exactly as taught by the chain rule (see the sketch after this list).
  • ⚙️ Partial Derivatives form the backbone of gradient-based optimization. In functions with multiple variables (e.g., model weights), we compute partials to know how each parameter affects the loss.
  • Implicit Differentiation appears in constrained optimization and in tools like Lagrange Multipliers, often used when variables can’t be expressed directly.
  • 📉 These techniques together allow models to learn, update weights, and minimize error during training.
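
Here is a minimal sketch of that backpropagation idea, using a hypothetical one-parameter "network" with loss = sin(w²). The backward pass just multiplies local derivatives from the outside in, exactly as the chain rule prescribes:

```python
import math

# Hypothetical toy "network": loss = h(p(w)) with inner layer p(w) = w**2
# and outer layer h(u) = sin(u). Real frameworks chain many such layers.
w = 1.5

# Forward pass: evaluate layer by layer, caching intermediate values.
u = w**2            # inner layer output
loss = math.sin(u)  # outer layer output (the "loss")

# Backward pass: multiply local derivatives, i.e. f'(w) = h'(p(w)) * p'(w).
dloss_du = math.cos(u)  # h'(u)
du_dw = 2 * w           # p'(w)
dloss_dw = dloss_du * du_dw

print(dloss_dw)  # equals cos(w**2) * 2*w, as the chain rule predicts
```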

In short: these aren’t just abstract math tricks — they’re the mathematical gears that drive intelligent systems.


🚀 Level Up
  • 💡 The Chain Rule extends naturally to multiple nested layers — this is the foundation of backpropagation in neural networks.
  • 🔁 Implicit Differentiation is especially useful in constraint optimization problems where you can’t isolate a variable.
  • 🌐 Partial Derivatives are essential for working with functions of many variables — you'll see them in gradient vectors, Jacobians, and Hessians.
  • 📊 In machine learning, partial derivatives guide how each weight is updated during training.
  • 🧠 Want to go deeper? Explore the Total Derivative and Directional Derivatives for full control over multivariate calculus.

✅ Best Practices
  • 🔍 Break down complex expressions into inner and outer layers to apply the chain rule step by step.
  • 🧠 Label your inner and outer functions clearly — this reduces mistakes and makes your process transparent.
  • 📐 Use implicit differentiation when a function defines x and y together — especially useful for constraint problems.
  • 🧮 In partial differentiation, freeze all variables you're not differentiating with respect to — treat them like constants.
  • 📊 Simplify before and after differentiating — it helps reduce errors and makes the final expression cleaner.
  • 📈 Check your results visually when possible using a graphing tool to confirm slope behavior matches intuition.

⚠️ Common Pitfalls
  • Forgetting to multiply by the derivative of the inner function when using the chain rule — the #1 mistake!
  • Applying the power rule directly to composite expressions without unpacking layers first.
  • Misusing implicit differentiation by forgetting to apply the chain rule to y terms.
  • In partial differentiation, accidentally differentiating with respect to more than one variable at once.
  • Confusing constant functions with constant coefficients — constants like 7 are different from terms like 7x.

📌 Try It Yourself

Test your understanding with these practice problems. Each one focuses on a key technique: chain rule, implicit differentiation, or partial derivatives.


📊 Chain Rule: What is \( \frac{d}{dx} \sin(5x^2) \)?

This is a composite function: \( \sin(u) \) where \( u = 5x^2 \).

🧠 Step-by-step:

  • Outer: \( \frac{d}{du} \sin(u) = \cos(u) \)
  • Inner: \( \frac{d}{dx}(5x^2) = 10x \)

✅ Final Answer:

\[ \frac{d}{dx} \sin(5x^2) = \cos(5x^2) \cdot 10x \]

📊 Implicit Differentiation: Given \( x^2 + xy + y^2 = 7 \), find \( \frac{dy}{dx} \).

Differentiate both sides with respect to \( x \):

🧠 Step-by-step:

  • \( \frac{d}{dx}(x^2) = 2x \)
  • \( \frac{d}{dx}(xy) = x \cdot \frac{dy}{dx} + y \) ← product rule!
  • \( \frac{d}{dx}(y^2) = 2y \cdot \frac{dy}{dx} \)
  • \( \frac{d}{dx}(7) = 0 \)

Putting it together:

\[ 2x + x \cdot \frac{dy}{dx} + y + 2y \cdot \frac{dy}{dx} = 0 \]

Group the \( \frac{dy}{dx} \) terms:

\[ (x + 2y)\frac{dy}{dx} = -(2x + y) \]

✅ Final Answer:

\[ \frac{dy}{dx} = \frac{-(2x + y)}{x + 2y} \]
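
You can verify both practice answers with the same SymPy tools used above. A quick sketch:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Problem 1: chain rule
print(sp.diff(sp.sin(5 * x**2), x))  # 10*x*cos(5*x**2)

# Problem 2: implicit differentiation of x**2 + x*y + y**2 = 7
print(sp.idiff(x**2 + x*y + y**2 - 7, y, x))  # equivalent to -(2*x + y)/(x + 2*y)
```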

🔁 Summary: What You Learned

  • Chain Rule: differentiates composite functions by chaining outer and inner derivatives.
  • Implicit Differentiation: differentiates equations where y is not isolated, using the chain rule on y terms.
  • Partial Derivative: differentiates multivariable functions with respect to one variable at a time.
  • Backpropagation: uses the chain rule to compute gradients layer by layer in neural networks.
  • Gradient Vector: a vector of partial derivatives used in optimization and ML training.
  • Derivative of sin(x²): \( \frac{d}{dx}\sin(x^2) = 2x \cdot \cos(x^2) \)
  • ∂f/∂x of f(x, y, z): \( \frac{\partial}{\partial x}(\sin(x)e^{yz^2}) = \cos(x)e^{yz^2} \)

💬 Got a question or suggestion?

Leave a comment below — I’d love to hear your thoughts or help if something was unclear.


🧭 Next Up

Now that you’ve learned how to compute partial derivatives, you’re ready to tackle the next building block in multivariable calculus: the Jacobian Matrix.

In the next post, we’ll explore:

  • What the Jacobian is and how it’s constructed
  • Why it’s essential for transformations, coordinate changes, and machine learning gradients
  • Step-by-step examples for both scalar and vector-valued functions

Stay curious — you’re one layer closer to mastering the math behind modern ML models.

This post is licensed under CC BY 4.0 by the author.