🔗 Chain Rule, Implicit Differentiation, and Partial Derivatives (Calculus for ML)
Before diving into neural networks or complex optimization, it’s critical to master three powerful tools in calculus: the Chain Rule, Implicit Differentiation, and Partial Derivatives.
In this post, you’ll learn:
- How to apply the chain rule to composite functions
- How implicit differentiation builds on the chain rule
- How to differentiate multivariable functions with respect to one variable (partial differentiation)
📚 This post is part of the "Intro to Calculus" series
🔙 Previously: What is a Derivative? (Beginner’s Guide to Calculus for ML)
🔜 Next: Understanding the Jacobian – A Beginner’s Guide with 2D & 3D Examples
🔗 What is the Chain Rule?
The chain rule allows us to differentiate composite functions, i.e., functions inside other functions.
Let’s say:
\[ f(x) = h(p(x)) = \sin(x^2) \]
We treat the outer function \( h(u) = \sin(u) \), and the inner \( p(x) = x^2 \).
Using the chain rule:
\[ f'(x) = h'(p(x)) \cdot p'(x) = \cos(x^2) \cdot 2x \]
✅ So, the derivative of \( \sin(x^2) \) is \( 2x \cdot \cos(x^2) \)
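If you want to verify results like this yourself, a symbolic algebra library can do it in a couple of lines. Here is a minimal sanity check using SymPy (assuming it is installed):

```python
# Quick check of the chain-rule result with SymPy (assumed installed).
import sympy as sp

x = sp.symbols('x')
f = sp.sin(x**2)

print(sp.diff(f, x))   # -> 2*x*cos(x**2), matching the hand-computed result
```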
🧠 Another Chain Rule Example (Step-by-Step)
Let’s find the derivative of:
\[ f(x) = \ln(3x^2 + 1) \]
This function has a function inside a function, which is exactly when we use the chain rule.
🔹 Step 1: Identify the Inner and Outer Functions
We rewrite the function as:
\[ f(x) = h(p(x)) \]
Where:
- The inner function is: \( p(x) = 3x^2 + 1 \)
- The outer function is: \( h(u) = \ln(u) \), with \( u = p(x) \)
🔹 Step 2: Differentiate Each Part
- \( h'(u) = \frac{1}{u} \) → this is the derivative of \( \ln(u) \)
- \( p'(x) = \frac{d}{dx}(3x^2 + 1) = 6x \)
Now apply the chain rule:
\[ f'(x) = h'(p(x)) \cdot p'(x) = \frac{1}{3x^2 + 1} \cdot 6x \]
✅ Final Answer
\[ f'(x) = \frac{6x}{3x^2 + 1} \]
💡 Why This Matters
When differentiating logarithmic, trigonometric, or exponential functions wrapped around polynomials, the chain rule is your go-to tool. You treat the “outside” and “inside” layers separately, then multiply the results.
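To see those "outside" and "inside" layers handled explicitly in code, here is a short SymPy sketch (assuming SymPy is available) that builds the outer and inner derivatives separately and multiplies them, mirroring Steps 1 and 2 above:

```python
# Chain rule assembled by hand in SymPy (assumed installed):
# outer derivative evaluated at the inner function, times the inner derivative.
import sympy as sp

x, u = sp.symbols('x u')
p = 3 * x**2 + 1                     # inner function p(x)
h_prime = sp.diff(sp.log(u), u)      # outer derivative h'(u) = 1/u

f_prime = h_prime.subs(u, p) * sp.diff(p, x)   # h'(p(x)) * p'(x)
print(sp.simplify(f_prime))                    # -> 6*x/(3*x**2 + 1)
print(sp.diff(sp.log(p), x))                   # same answer, computed directly
```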
🔁 Implicit Differentiation
Sometimes, functions are not written in the form \( y = f(x) \). Instead, x and y are mixed together in one equation. In that case, we can’t isolate y easily — so we use implicit differentiation.
📘 Example: A Circle
Take the equation of a circle:
\[ x^2 + y^2 = 25 \]
This defines a relationship between x and y, but y is not solved for explicitly as a function of x.
🧠 Step 1: Differentiate both sides with respect to x
Apply \( \frac{d}{dx} \) to each term:
\[ \frac{d}{dx}(x^2) + \frac{d}{dx}(y^2) = \frac{d}{dx}(25) \]
- \( \frac{d}{dx}(x^2) = 2x \)
- \( \frac{d}{dx}(25) = 0 \) (constants have zero slope)
- \( \frac{d}{dx}(y^2) = 2y \cdot \frac{dy}{dx} \) ← this uses the chain rule, because y is treated as a function of x
So the full result becomes:
\[ 2x + 2y \cdot \frac{dy}{dx} = 0 \]
✍️ Step 2: Solve for \( \frac{dy}{dx} \)
Subtract \( 2x \) from both sides:
\[ 2y \cdot \frac{dy}{dx} = -2x \]
Divide both sides by \( 2y \):
\[ \frac{dy}{dx} = \frac{-x}{y} \]
💡 Interpretation:
Even though we didn’t solve for y directly, we found the slope of the curve at any point (x, y) on the circle. This is powerful — we differentiated without rearranging!
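You can reproduce this with SymPy's `idiff` helper (assuming SymPy is installed), which treats y as a function of x exactly as we did by hand:

```python
# Implicit differentiation of the circle with SymPy (assumed installed).
import sympy as sp

x, y = sp.symbols('x y')
circle = x**2 + y**2 - 25      # x^2 + y^2 = 25, rewritten as F(x, y) = 0

print(sp.idiff(circle, y, x))  # -> -x/y, the slope at any point (x, y)
```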
🔗 This method is essential when:
- y appears in powers or is multiplied by x
- You have to find \( \frac{dy}{dx} \) but can’t isolate y
- You’re working with geometric shapes, constraint equations, or system-level models in ML
🔀 Partial Derivatives
When a function depends on more than one variable (e.g., \( f(x, y, z) \)), we can differentiate with respect to one, treating the others as constants.
Let:
\[ f(x, y, z) = \sin(x) \cdot e^{yz^2} \]
🔹 Differentiate with respect to \( x \):
Only \( \sin(x) \) depends on \( x \), so the exponential factor is treated as a constant:
\[ \frac{\partial f}{\partial x} = \cos(x) \cdot e^{yz^2} \]
✅ Here, \( y \) and \( z \) are treated as constants.
🔹 Differentiate with respect to \( y \):
\[ \frac{\partial f}{\partial y} = \sin(x) \cdot e^{yz^2} \cdot z^2 \]
Chain rule applied to the exponent!
🔹 Differentiate with respect to \( z \):
\[ \frac{\partial f}{\partial z} = \sin(x) \cdot e^{yz^2} \cdot 2yz \]
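Here is a small SymPy sketch (assuming SymPy is installed) that computes all three partials; each call holds the other symbols fixed, just as we did above:

```python
# Partial derivatives of f(x, y, z) = sin(x) * exp(y * z^2) with SymPy
# (assumed installed). Each diff() treats the other variables as constants.
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.sin(x) * sp.exp(y * z**2)

print(sp.diff(f, x))   # -> exp(y*z**2)*cos(x)
print(sp.diff(f, y))   # -> z**2*exp(y*z**2)*sin(x)
print(sp.diff(f, z))   # -> 2*y*z*exp(y*z**2)*sin(x)
```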
🤖 Relevance to Machine Learning
Understanding these differentiation tools is essential for anyone working in machine learning:
- 🧮 Chain Rule powers backpropagation in neural networks. Each layer’s gradient is computed by chaining derivatives through the layers — exactly as taught by the chain rule.
- ⚙️ Partial Derivatives form the backbone of gradient-based optimization. In functions with multiple variables (e.g., model weights), we compute partials to know how each parameter affects the loss.
- ❓ Implicit Differentiation appears in constrained optimization and in tools like Lagrange Multipliers, often used when variables can’t be expressed directly.
- 📉 These techniques together allow models to learn, update weights, and minimize error during training.
In short: these aren’t just abstract math tricks — they’re the mathematical gears that drive intelligent systems.
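To make the backpropagation point concrete, here is a tiny numeric sketch (plain Python, with made-up example values) of a one-hidden-unit "network". The gradients are obtained by chaining derivatives by hand, then checked against finite differences:

```python
# Minimal backprop sketch (illustrative only, hypothetical values):
# yhat = w2 * tanh(w1 * x), loss L = (yhat - y)^2.
import math

def loss(w1, w2, x, y):
    h = math.tanh(w1 * x)      # hidden activation (inner function)
    yhat = w2 * h              # output (outer function)
    return (yhat - y) ** 2

def gradients(w1, w2, x, y):
    # Forward pass
    h = math.tanh(w1 * x)
    yhat = w2 * h
    # Backward pass: the chain rule, one layer at a time
    dL_dyhat = 2 * (yhat - y)
    dL_dw2 = dL_dyhat * h                 # dyhat/dw2 = h
    dL_dh = dL_dyhat * w2                 # dyhat/dh = w2
    dh_dw1 = (1 - h ** 2) * x             # d tanh(w1*x) / dw1
    dL_dw1 = dL_dh * dh_dw1               # chain the pieces together
    return dL_dw1, dL_dw2

# Sanity check against finite differences
w1, w2, x, y = 0.5, -1.2, 2.0, 0.3
g1, g2 = gradients(w1, w2, x, y)
eps = 1e-6
num_g1 = (loss(w1 + eps, w2, x, y) - loss(w1 - eps, w2, x, y)) / (2 * eps)
num_g2 = (loss(w1, w2 + eps, x, y) - loss(w1, w2 - eps, x, y)) / (2 * eps)
print(g1, num_g1)   # the two values should agree closely
print(g2, num_g2)
```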
🚀 Level Up
- 💡 The Chain Rule extends naturally to multiple nested layers — this is the foundation of backpropagation in neural networks.
- 🔁 Implicit Differentiation is especially useful in constrained optimization problems where you can’t isolate a variable.
- 🌐 Partial Derivatives are essential for working with functions of many variables — you'll see them in gradient vectors, Jacobians, and Hessians.
- 📊 In machine learning, partial derivatives guide how each weight is updated during training.
- 🧠 Want to go deeper? Explore the Total Derivative and Directional Derivatives for full control over multivariate calculus.
✅ Best Practices
- 🔍 Break down complex expressions into inner and outer layers to apply the chain rule step by step.
- 🧠 Label your inner and outer functions clearly — this reduces mistakes and makes your process transparent.
- 📐 Use implicit differentiation when an equation mixes x and y together — especially useful for constraint problems.
- 🧮 In partial differentiation, freeze all variables you're not differentiating with respect to — treat them like constants.
- 📊 Simplify before and after differentiating — it helps reduce errors and makes the final expression cleaner.
- 📈 Check your results visually when possible using a graphing tool to confirm slope behavior matches intuition.
⚠️ Common Pitfalls
- ❌ Forgetting to multiply by the derivative of the inner function when using the chain rule — the #1 mistake!
- ❌ Applying the power rule directly to composite expressions without unpacking layers first.
- ❌ Misusing implicit differentiation by forgetting to apply the chain rule to y terms.
- ❌ In partial differentiation, accidentally differentiating with respect to more than one variable at once.
- ❌ Confusing constant functions with constant coefficients — constants like 7 are different from terms like 7x.
📌 Try It Yourself
Test your understanding with these practice problems. Each one focuses on a key technique: chain rule, implicit differentiation, or partial derivatives.
📊 Chain Rule: What is \( \frac{d}{dx} \sin(5x^2) \)?
This is a composite function: \( \sin(u) \) where \( u = 5x^2 \).
🧠 Step-by-step:
- Outer: \( \frac{d}{du} \sin(u) = \cos(u) \)
- Inner: \( \frac{d}{dx}(5x^2) = 10x \)
✅ Final Answer:
\[ \frac{d}{dx} \sin(5x^2) = \cos(5x^2) \cdot 10x \]
📊 Implicit Differentiation: Given \( x^2 + xy + y^2 = 7 \), find \( \frac{dy}{dx} \)
Differentiate both sides with respect to \( x \).
🧠 Step-by-step:
- \( \frac{d}{dx}(x^2) = 2x \)
- \( \frac{d}{dx}(xy) = x \cdot \frac{dy}{dx} + y \) ← product rule!
- \( \frac{d}{dx}(y^2) = 2y \cdot \frac{dy}{dx} \)
- \( \frac{d}{dx}(7) = 0 \)
Putting it together:
\[ 2x + x \cdot \frac{dy}{dx} + y + 2y \cdot \frac{dy}{dx} = 0 \]
Group the \( \frac{dy}{dx} \) terms:
\[ (x + 2y)\frac{dy}{dx} = -(2x + y) \]
✅ Final Answer:
\[ \frac{dy}{dx} = \frac{-(2x + y)}{x + 2y} \]
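If you'd like to double-check your answers, here is a quick SymPy verification (assuming SymPy is installed) covering both practice problems:

```python
# Self-check for both practice problems with SymPy (assumed installed).
import sympy as sp

x, y = sp.symbols('x y')

# Chain rule: d/dx sin(5x^2)
print(sp.diff(sp.sin(5 * x**2), x))            # -> 10*x*cos(5*x**2)

# Implicit differentiation: x^2 + xy + y^2 = 7
print(sp.idiff(x**2 + x*y + y**2 - 7, y, x))   # -> -(2*x + y)/(x + 2*y), possibly in an equivalent form
```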
🔁 Summary: What You Learned
| 🧠 Concept | 📌 Description |
|---|---|
| Chain Rule | Differentiates composite functions by chaining outer and inner derivatives. |
| Implicit Differentiation | Differentiates equations where y is not isolated, using the chain rule on y terms. |
| Partial Derivative | Differentiates multivariable functions with respect to one variable at a time. |
| Backpropagation | Uses the chain rule to compute gradients layer by layer in neural networks. |
| Gradient Vector | A vector of partial derivatives used in optimization and ML training. |
| Derivative of sin(x²) | \( \frac{d}{dx}\sin(x^2) = 2x \cdot \cos(x^2) \) |
| ∂f/∂x of f(x, y, z) | \( \frac{\partial}{\partial x}(\sin(x)e^{yz^2}) = \cos(x)e^{yz^2} \) |
💬 Got a question or suggestion?
Leave a comment below — I’d love to hear your thoughts or help if something was unclear.
🧭 Next Up
Now that you’ve learned how to compute partial derivatives, you’re ready to tackle the next building block in multivariable calculus: the Jacobian Matrix.
In the next post, we’ll explore:
- What the Jacobian is and how it’s constructed
- Why it’s essential for transformations, coordinate changes, and machine learning gradients
- Step-by-step examples for both scalar and vector-valued functions
Stay curious — you’re one layer closer to mastering the math behind modern ML models.