diff --git a/hfdocs/source/models/regnetx.mdx b/hfdocs/source/models/regnetx.mdx
index 6f8c4a68..1e9548bd 100644
--- a/hfdocs/source/models/regnetx.mdx
+++ b/hfdocs/source/models/regnetx.mdx
@@ -1,10 +1,10 @@
 # RegNetX
 
-**RegNetX** is a convolutional network design space with simple, regular models with parameters: depth \\( d \\), initial width \\( w\_{0} > 0 \\), and slope \\( w\_{a} > 0 \\), and generates a different block width \\( u\_{j} \\) for each block \\( j < d \\). The key restriction for the RegNet types of model is that there is a linear parameterisation of block widths (the design space only contains models with this linear structure):
+**RegNetX** is a convolutional network design space with simple, regular models with parameters: depth \\( d \\), initial width \\( w_{0} > 0 \\), and slope \\( w_{a} > 0 \\), and generates a different block width \\( u_{j} \\) for each block \\( j < d \\). The key restriction for the RegNet types of model is that there is a linear parameterisation of block widths (the design space only contains models with this linear structure):
 
-\\( \\) u\_{j} = w\_{0} + w\_{a}\cdot{j} \\( \\)
+\\( u_{j} = w_{0} + w_{a}\cdot{j} \\)
 
-For **RegNetX** we have additional restrictions: we set \\( b = 1 \\) (the bottleneck ratio), \\( 12 \leq d \leq 28 \\), and \\( w\_{m} \geq 2 \\) (the width multiplier).
+For **RegNetX** we have additional restrictions: we set \\( b = 1 \\) (the bottleneck ratio), \\( 12 \leq d \leq 28 \\), and \\( w_{m} \geq 2 \\) (the width multiplier).
 
 ## How do I use this model on an image?
 
diff --git a/hfdocs/source/models/regnety.mdx b/hfdocs/source/models/regnety.mdx
index e7f2e6c1..04d86928 100644
--- a/hfdocs/source/models/regnety.mdx
+++ b/hfdocs/source/models/regnety.mdx
@@ -1,10 +1,10 @@
 # RegNetY
 
-**RegNetY** is a convolutional network design space with simple, regular models with parameters: depth \\( d \\), initial width \\( w\_{0} > 0 \\), and slope \\( w\_{a} > 0 \\), and generates a different block width \\( u\_{j} \\) for each block \\( j < d \\). The key restriction for the RegNet types of model is that there is a linear parameterisation of block widths (the design space only contains models with this linear structure):
+**RegNetY** is a convolutional network design space with simple, regular models with parameters: depth \\( d \\), initial width \\( w_{0} > 0 \\), and slope \\( w_{a} > 0 \\), and generates a different block width \\( u_{j} \\) for each block \\( j < d \\). The key restriction for the RegNet types of model is that there is a linear parameterisation of block widths (the design space only contains models with this linear structure):
 
-\\( \\) u\_{j} = w\_{0} + w\_{a}\cdot{j} \\( \\)
+\\( u_{j} = w_{0} + w_{a}\cdot{j} \\)
 
-For **RegNetX** authors have additional restrictions: we set \\( b = 1 \\) (the bottleneck ratio), \\( 12 \leq d \leq 28 \\), and \\( w\_{m} \geq 2 \\) (the width multiplier).
+For **RegNetX** authors have additional restrictions: we set \\( b = 1 \\) (the bottleneck ratio), \\( 12 \leq d \leq 28 \\), and \\( w_{m} \geq 2 \\) (the width multiplier).
 
 For **RegNetY** authors make one change, which is to include [Squeeze-and-Excitation blocks](https://paperswithcode.com/method/squeeze-and-excitation-block).
 
diff --git a/hfdocs/source/models/resnest.mdx b/hfdocs/source/models/resnest.mdx
index c90dfd70..b6f73dcf 100644
--- a/hfdocs/source/models/resnest.mdx
+++ b/hfdocs/source/models/resnest.mdx
@@ -1,6 +1,6 @@
 # ResNeSt
 
-A **ResNeSt** is a variant on a [ResNet](https://paperswithcode.com/method/resnet), which instead stacks [Split-Attention blocks](https://paperswithcode.com/method/split-attention). The cardinal group representations are then concatenated along the channel dimension: \\( V = \text{Concat} \\){\\( V^{1},V^{2},\cdots{V}^{K} \\)}. As in standard residual blocks, the final output \\( Y \\) of otheur Split-Attention block is produced using a shortcut connection: \\( Y=V+X \\), if the input and output feature-map share the same shape.  For blocks with a stride, an appropriate transformation \\( \mathcal{T} \\) is applied to the shortcut connection to align the output shapes:  \\( Y=V+\mathcal{T}(X) \\). For example, \\( \mathcal{T} \\) can be strided convolution or combined convolution-with-pooling.
+A **ResNeSt** is a variant on a [ResNet](https://paperswithcode.com/method/resnet), which instead stacks [Split-Attention blocks](https://paperswithcode.com/method/split-attention). The cardinal group representations are then concatenated along the channel dimension: \\( V = \text{Concat} \{ V^{1},V^{2},\cdots,{V}^{K} \} \\). As in standard residual blocks, the final output \\( Y \\) of otheur Split-Attention block is produced using a shortcut connection: \\( Y=V+X \\), if the input and output feature-map share the same shape.  For blocks with a stride, an appropriate transformation \\( \mathcal{T} \\) is applied to the shortcut connection to align the output shapes:  \\( Y=V+\mathcal{T}(X) \\). For example, \\( \mathcal{T} \\) can be strided convolution or combined convolution-with-pooling.
 
 ## How do I use this model on an image?