🔗 Chain Rule, Implicit Differentiation, and Partial Derivatives (Calculus for ML)


Before diving into neural networks or complex optimization, it's critical to master three powerful tools in calculus: the Chain Rule, Implicit Differentiation, and Partial Derivatives.

In this post, you'll learn:

  • How to apply the chain rule to composite functions
  • How implicit differentiation builds on the chain rule
  • How to differentiate multivariable functions with respect to one variable (partial differentiation)


🔗 What is the Chain Rule?

The chain rule allows us to differentiate composite functions, i.e., functions inside other functions.

Let's say:

\[ f(x) = h(p(x)) = \sin(x^2) \]

We treat the outer function \( h(u) = \sin(u) \), and the inner \( p(x) = x^2 \).

Using the chain rule:

\[ f'(x) = h'(p(x)) \cdot p'(x) = \cos(x^2) \cdot 2x \]

✅ So, the derivative of \( \sin(x^2) \) is \( 2x \cdot \cos(x^2) \).


🧮 Python: Symbolic Chain Rule

import sympy as sp

x = sp.symbols('x')
f = sp.sin(x**2)
dfdx = sp.diff(f, x)
print(dfdx)  # 2*x*cos(x**2)

Plot showing sin(x²) and its derivative 2x·cos(x²): a visual demonstration of the Chain Rule in action.


🧠 Another Chain Rule Example (Step-by-Step)

Let's find the derivative of:

\[ f(x) = \ln(3x^2 + 1) \]

This function has a function inside a function, which is exactly when we use the chain rule.


🔹 Step 1: Identify the Inner and Outer Functions

We rewrite the function as:

\[ f(x) = h(p(x)) \]

Where:

  • The inner function is: \( p(x) = 3x^2 + 1 \)
  • The outer function is: \( h(u) = \ln(u) \), with \( u = p(x) \)

🔹 Step 2: Differentiate Each Part

  • \( h'(u) = \frac{1}{u} \) → this is the derivative of \( \ln(u) \)
  • \( p'(x) = \frac{d}{dx}(3x^2 + 1) = 6x \)

Now apply the chain rule:

\[ f'(x) = h'(p(x)) \cdot p'(x) = \frac{1}{3x^2 + 1} \cdot 6x \]


✅ Final Answer

\[ f'(x) = \frac{6x}{3x^2 + 1} \]
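
If you'd like a quick sanity check, SymPy reproduces the same result. A minimal sketch, assuming SymPy is installed (sp.log is the natural logarithm):

import sympy as sp

x = sp.symbols('x')
f = sp.log(3*x**2 + 1)   # ln(3x^2 + 1)
print(sp.diff(f, x))     # 6*x/(3*x**2 + 1)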


💡 Why This Matters

When differentiating logarithmic, trigonometric, or exponential functions wrapped around polynomials, the chain rule is your go-to tool. You treat the "outside" and "inside" layers separately, then multiply the results.
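
For example, the same outer-times-inner pattern gives:

\[ \frac{d}{dx} e^{x^3} = e^{x^3} \cdot 3x^2 \qquad \text{and} \qquad \frac{d}{dx} \cos(5x) = -\sin(5x) \cdot 5 \]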


๐Ÿ” Implicit Differentiation

Sometimes, functions are not written in the form \( y = f(x) \). Instead, x and y are mixed together in one equation. In that case, we can't isolate y easily, so we use implicit differentiation.


📘 Example: A Circle

Take the equation of a circle:

\[ x^2 + y^2 = 25 \]

This defines a relationship between x and y, but y is not explicitly solved for in terms of x.


🧠 Step 1: Differentiate both sides with respect to x

Apply \( \frac{d}{dx} \) to each term:

\[ \frac{d}{dx}(x^2) + \frac{d}{dx}(y^2) = \frac{d}{dx}(25) \]

  • \( \frac{d}{dx}(x^2) = 2x \)
  • \( \frac{d}{dx}(25) = 0 \) (constants have zero slope)
  • \( \frac{d}{dx}(y^2) = 2y \cdot \frac{dy}{dx} \) ← this uses the chain rule, because y is treated as a function of x

So the full result becomes:

\[ 2x + 2y \cdot \frac{dy}{dx} = 0 \]


โœ๏ธ Step 2: Solve for \( \frac{dy}{dx} \)

Subtract \( 2x \) from both sides:

\[ 2y \cdot \frac{dy}{dx} = -2x \]

Divide both sides by \( 2y \):

\[ \frac{dy}{dx} = \frac{-x}{y} \]


💡 Interpretation:

Even though we didn't solve for y directly, we found the slope of the curve at any point (x, y) on the circle. This is powerful: we differentiated without rearranging!
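
SymPy can confirm this with its idiff helper, which performs implicit differentiation for you. A minimal sketch, assuming SymPy is installed (the equation is rewritten in the form F(x, y) = 0):

import sympy as sp

x, y = sp.symbols('x y')
circle = x**2 + y**2 - 25      # x^2 + y^2 = 25 rewritten as F(x, y) = 0
print(sp.idiff(circle, y, x))  # -x/y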


🔗 This method is essential when:

  • y appears in powers or multiplied with x
  • You have to find \( \frac{dy}{dx} \) but can't isolate y
  • You're working with geometric shapes, constraint equations, or system-level models in ML

A plot of the circle x² + y² = 25, showing the implicit relationship between x and y.
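
If you want to reproduce a plot like that, here is one possible Matplotlib sketch; the parametrization x = 5 cos t, y = 5 sin t is just a convenient way to trace the circle:

import numpy as np
import matplotlib.pyplot as plt

# Trace x^2 + y^2 = 25 using the parametrization (5*cos(t), 5*sin(t))
t = np.linspace(0, 2*np.pi, 200)
plt.plot(5*np.cos(t), 5*np.sin(t))
plt.gca().set_aspect('equal')   # keep the circle round
plt.title(r'$x^2 + y^2 = 25$')
plt.xlabel('x')
plt.ylabel('y')
plt.show()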


🔀 Partial Derivatives

When a function depends on more than one variable (e.g., \( f(x, y, z) \)), we can differentiate with respect to one, treating the others as constants.

Let:

\[ f(x, y, z) = \sin(x) \cdot e^{yz^2} \]

🔹 Differentiate with respect to \( x \):

Only \( \sin(x) \) is affected:

\[ \frac{\partial f}{\partial x} = \cos(x) \cdot e^{yz^2} \]

✅ Here, \( y \) and \( z \) are treated as constants.

🔹 Differentiate with respect to \( y \):

\[ \frac{\partial f}{\partial y} = \sin(x) \cdot e^{yz^2} \cdot z^2 \]

Chain rule applied to the exponent!


🔹 Differentiate with respect to \( z \):

\[ \frac{\partial f}{\partial z} = \sin(x) \cdot e^{yz^2} \cdot 2yz \]


🧮 Python: Symbolic Partial Derivatives

11
import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.sin(x) * sp.exp(y * z**2)

df_dx = sp.diff(f, x)
df_dy = sp.diff(f, y)
df_dz = sp.diff(f, z)

df_dx, df_dy, df_dz

Output:

(exp(y*z**2)*cos(x),
 z**2*exp(y*z**2)*sin(x),
 2*y*z*exp(y*z**2)*sin(x))

This confirms our analytical work.
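
Those three partials are exactly the components of the gradient vector ∇f. As a small follow-on sketch, you can bundle them into a single numerical function with lambdify; the point (1.0, 0.5, 2.0) below is just an arbitrary example:

import sympy as sp

x, y, z = sp.symbols('x y z')
f = sp.sin(x) * sp.exp(y * z**2)

# Gradient vector: one partial derivative per variable
grad_f = [sp.diff(f, var) for var in (x, y, z)]

# lambdify turns the symbolic expressions into a fast numerical function
grad_at = sp.lambdify((x, y, z), grad_f)
print(grad_at(1.0, 0.5, 2.0))   # numerical gradient at (1, 0.5, 2)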

✅ Visualizing f(x, y) = sin(x) · eʸ in 3D

3D surface plot of f(x, y) = sin(x) · eʸ.


📊 Python: 3D Surface Plot

import numpy as np
import matplotlib.pyplot as plt

X, Y = np.meshgrid(np.linspace(-2*np.pi, 2*np.pi, 100), np.linspace(-2, 2, 100))
Z = np.sin(X) * np.exp(Y)

fig = plt.figure(figsize=(8, 6))
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_title(r'$f(x, y) = \sin(x) \cdot e^y$')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('f(x, y)')
plt.show()
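
Note: on older Matplotlib releases (before version 3.2), you may also need to add from mpl_toolkits.mplot3d import Axes3D at the top so that the projection='3d' argument is recognized.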

🤖 Relevance to Machine Learning

Understanding these differentiation tools is essential for anyone working in machine learning:

  • 🧮 Chain Rule powers backpropagation in neural networks. Each layer's gradient is computed by chaining derivatives through the layers, exactly as taught by the chain rule (see the one-neuron sketch after this list).
  • ⚙️ Partial Derivatives form the backbone of gradient-based optimization. In functions with multiple variables (e.g., model weights), we compute partials to know how each parameter affects the loss.
  • ❓ Implicit Differentiation appears in constrained optimization and in tools like Lagrange Multipliers, often used when variables can't be expressed directly.
  • 📉 These techniques together allow models to learn, update weights, and minimize error during training.
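
To make the backpropagation connection concrete, here is a minimal, hypothetical one-neuron example (the values x = 2.0, target = 1.0, w = 0.3 are arbitrary): the gradient of the loss with respect to the weight is just three chained derivatives multiplied together.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x_in, target, w = 2.0, 1.0, 0.3   # arbitrary toy values

# Forward pass: a = w*x (inner), y = sigmoid(a) (outer), squared-error loss
a = w * x_in
y = sigmoid(a)
loss = (y - target)**2

# Backward pass: chain rule, one factor per "layer"
dL_dy = 2 * (y - target)       # derivative of the squared error
dy_da = y * (1 - y)            # derivative of the sigmoid
da_dw = x_in                   # derivative of the linear step

dL_dw = dL_dy * dy_da * da_dw  # dL/dw = dL/dy * dy/da * da/dw
print(loss, dL_dw)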

In short: these aren't just abstract math tricks; they're the mathematical gears that drive intelligent systems.


🚀 Level Up
  • 💡 The Chain Rule extends naturally to multiple nested layers; this is the foundation of backpropagation in neural networks.
  • 🔁 Implicit Differentiation is especially useful in constraint optimization problems where you can't isolate a variable.
  • 🌐 Partial Derivatives are essential for working with functions of many variables; you'll see them in gradient vectors, Jacobians, and Hessians.
  • 📊 In machine learning, partial derivatives guide how each weight is updated during training.
  • 🧠 Want to go deeper? Explore the Total Derivative and Directional Derivatives for full control over multivariate calculus.

✅ Best Practices
  • 🔍 Break down complex expressions into inner and outer layers to apply the chain rule step by step.
  • 🧠 Label your inner and outer functions clearly; this reduces mistakes and makes your process transparent.
  • 📐 Use implicit differentiation when an equation defines x and y together; it is especially useful for constraint problems.
  • 🧮 In partial differentiation, freeze all variables you're not differentiating with respect to; treat them like constants.
  • 📊 Simplify before and after differentiating; it helps reduce errors and makes the final expression cleaner.
  • 📈 Check your results visually when possible using a graphing tool to confirm slope behavior matches intuition.

โš ๏ธ Common Pitfalls
  • โŒ Forgetting to multiply by the derivative of the inner function when using the chain rule โ€” the #1 mistake!
  • โŒ Applying the power rule directly to composite expressions without unpacking layers first.
  • โŒ Misusing implicit differentiation by forgetting to apply the chain rule to y terms.
  • โŒ In partial differentiation, accidentally differentiating with respect to more than one variable at once.
  • โŒ Confusing constant functions with constant coefficients โ€” constants like 7 are different from terms like 7x.

📌 Try It Yourself

Test your understanding with these practice problems. Each one focuses on a key technique: chain rule, implicit differentiation, or partial derivatives.


📊 Chain Rule: What is \( \frac{d}{dx} \sin(5x^2) \)? This is a composite function: \( \sin(u) \) where \( u = 5x^2 \).

🧠 Step-by-step:

  • Outer: \( \frac{d}{du} \sin(u) = \cos(u) \)
  • Inner: \( \frac{d}{dx}(5x^2) = 10x \)

✅ Final Answer:

\[ \frac{d}{dx} \sin(5x^2) = \cos(5x^2) \cdot 10x \]
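
You can check this with SymPy (assuming it is installed):

import sympy as sp

x = sp.symbols('x')
print(sp.diff(sp.sin(5*x**2), x))   # 10*x*cos(5*x**2)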

📊 Implicit Differentiation: Given \( x^2 + xy + y^2 = 7 \), find \( \frac{dy}{dx} \). Differentiate both sides with respect to \( x \):

🧠 Step-by-step:

  • \( \frac{d}{dx}(x^2) = 2x \)
  • \( \frac{d}{dx}(xy) = x \cdot \frac{dy}{dx} + y \) ← product rule!
  • \( \frac{d}{dx}(y^2) = 2y \cdot \frac{dy}{dx} \)
  • \( \frac{d}{dx}(7) = 0 \)

Putting it together:

\[ 2x + x \cdot \frac{dy}{dx} + y + 2y \cdot \frac{dy}{dx} = 0 \]

Group the \( \frac{dy}{dx} \) terms:

\[ (x + 2y)\frac{dy}{dx} = -(2x + y) \]

✅ Final Answer:

\[ \frac{dy}{dx} = \frac{-(2x + y)}{x + 2y} \]
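
Again, SymPy's idiff offers a quick confirmation (a minimal check, assuming SymPy is installed):

import sympy as sp

x, y = sp.symbols('x y')
print(sp.idiff(x**2 + x*y + y**2 - 7, y, x))   # matches -(2x + y)/(x + 2y)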

๐Ÿ” Summary: What You Learned

  • Chain Rule: Differentiates composite functions by chaining outer and inner derivatives.
  • Implicit Differentiation: Differentiates equations where y is not isolated, using the chain rule on y terms.
  • Partial Derivative: Differentiates multivariable functions with respect to one variable at a time.
  • Backpropagation: Uses the chain rule to compute gradients layer by layer in neural networks.
  • Gradient Vector: A vector of partial derivatives used in optimization and ML training.
  • Derivative of sin(x²): \( \frac{d}{dx}\sin(x^2) = 2x \cdot \cos(x^2) \)
  • ∂f/∂x of f(x, y, z): \( \frac{\partial}{\partial x}\left(\sin(x)e^{yz^2}\right) = \cos(x)e^{yz^2} \)

💬 Got a question or suggestion?

Leave a comment below; I'd love to hear your thoughts or help if something was unclear.


🧭 Next Up

Now that you've learned how to compute partial derivatives, you're ready to tackle the next building block in multivariable calculus: the Jacobian Matrix.

In the next post, we'll explore:

  • What the Jacobian is and how it's constructed
  • Why it's essential for transformations, coordinate changes, and machine learning gradients
  • Step-by-step examples for both scalar and vector-valued functions

Stay curious. You're one layer closer to mastering the math behind modern ML models.

This post is licensed under CC BY 4.0 by the author.