The post Splines and Bézier Curves and their application in Video Games appeared first on GameLudere.

**Definition 1.1**

A curve is a function

\[ p: (a,b) \to \mathbb{R}^{3} \]

from an interval \((a,b)\) of the real line to the space \(\mathbb{R}^{3}\). The function \(p(t)\) is a vector function with components \((x(t), y(t), z(t))\). The curve is said to be **differentiable** at a point if the three component functions are differentiable at the point. A differentiable curve is said to be **regular** in an interval if the tangent vector exists and is non-zero at every point.

In physics a curve can represent the trajectory of a particle and the parameter \(t\) represents time. The vector obtained by calculating the derivatives of the three components is the tangent vector or velocity vector.

There are different types of representations of a curve.

**a) Explicit representation of a curve in the plane**

In this representation the curve is the graph of a function \(y=f(x)\).

**Example 1.1**

**b) Implicit representation**

In this representation the curve is the set of points satisfying an equation \(f(x,y)=0\).

**Example 1.2**

**c) Parametric representation**

In this type, a curve is represented as a function of an independent variable \(t\), called a parameter.

**Example 1.3 – Circle**

\[ p(t) = (r\cos t,\; r\sin t), \qquad t \in [0,2\pi] \]

**Example 1.4 – Helix**

\[ p(t) = (a\cos t,\; a\sin t,\; bt), \qquad t \in \mathbb{R} \]

The common representation for curves is the composite parametric form, consisting of various sections of curves joined at the end points. A polynomial parametric curve of degree \(n\) is represented in the following form:

\[ p(t) = \sum_{k=0}^{n} a_{k}t^{k} \]

where \(p(t)\) and \(a_{k}\) are vector quantities:

\[ p(t)= \begin{pmatrix} x(t) \\ y(t) \\ z(t) \\ \end{pmatrix} \qquad a_{k}= \begin{pmatrix} a_{kx} \\ a_{ky} \\ a_{kz} \\ \end{pmatrix} \]

**Example 2.1 – The parametric line**

Given two points of the plane or space, represented by the position vectors \(p_{1}\) and \(p_{2}\), the equation of the straight line passing through the two points is the following:

\[ p(t) = p_{1} + (p_{2}-p_{1})t, \qquad t \in \mathbb{R} \]

It is a vector equation, which can be projected in the three Cartesian axes getting a system of three scalar equations:

\[ \begin{array}{l} x(t) = x_{1} + (x_{2}-x_{1})t \\ y(t) = y_{1} + (y_{2}-y_{1})t \\ z(t) = z_{1} + (z_{2}-z_{1})t \\ \end{array} \]

One of the main uses of curves is to represent the motion of objects in space. The types of motion that can be simulated depend on the degree of the polynomial. With first-degree polynomials we can only represent rectilinear movements. With polynomials of degree \(2\) we have a broader spectrum of possible movements; however, these are always movements constrained to lie in a plane. For these reasons, third-degree (cubic) parametric curves are mainly used in computer graphics, since they make it possible to represent almost all movements of interest. Using curves of higher degree would bring few advantages and would actually complicate the calculations and increase processing costs and times.

A third-order parametric curve has the following equation:

\[ p(t) = a_{0} + a_{1}t + a_{2}t^{2} + a_{3}t^{3} \]

It is a vector equation, which in matrix form can be written as follows:

\[ p(t)= \begin{pmatrix} a_{0x} & a_{1x} & a_{2x} & a_{3x} \\ a_{0y} & a_{1y} & a_{2y} & a_{3y} \\ a_{0z} & a_{1z} & a_{2z} & a_{3z} \\ \end{pmatrix} \cdot \begin{pmatrix} 1 \\ t \\ t^{2} \\ t^{3} \\ \end{pmatrix} \]

Typically we use curves made up of multiple separate components. Each component curve is continuous everywhere in its domain, but it is also interesting to study continuity at the junction points. Two types of continuity are defined for parametric curves: **geometric continuity** and **parametric continuity**.

A junction point between two segments of a parametric curve is said to have parametric continuity \(C^{0}\) if the values of the two curve segments coincide at the point. A junction between two segments is said to have continuity \(C^{1}\) if the values of the two segments coincide and also the values of the first derivatives \(\dfrac{dx}{dt}, \dfrac{dy}{dt},\dfrac{dz}{dt}\) coincide. Parametric continuity \(C^{k}\) is defined in a similar way.

A junction point between two components is said to have geometric continuity \(G^{0}\) if the values of the two curve segments coincide at the point. A junction point is said to have a geometric continuity \(G^{1}\) if the values of the two segments coincide and the values of the first derivatives \(\dfrac{dx}{dt}, \dfrac{dy}{dt},\dfrac{dz}{dt}\) are proportional. The important thing is that the tangent vectors of both components have the same direction at the connection point. Geometric continuity of class \(G^{k}\) is defined in a similar way.

Of course, parametric continuity implies geometric continuity, but the converse is not true. To better understand the difference between the two definitions, it is useful to imagine a particle moving from one segment of the curve to another through the junction point. In the case of parametric continuity the transition occurs smoothly, with no variation of velocity in either direction or modulus. In the case of geometric continuity, on the other hand, while the direction remains unchanged, there can be an instantaneous change in the modulus of the velocity, and therefore an acceleration, in the passage between the two curve segments.
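The distinction can be checked numerically. The following sketch (an illustration added here, not from the original article) joins two straight segments at the point \((1,1)\): the tangent vectors are proportional but unequal, so the junction is \(G^{1}\) but not \(C^{1}\).

```python
# Two straight segments joined at q = (1, 1) (a made-up example):
#   segment A: a(t) = (t, t),           t in [0, 1]  ->  a'(1) = (1, 1)
#   segment B: b(t) = (1 + 2t, 1 + 2t), t in [0, 1]  ->  b'(0) = (2, 2)
def a(t):  return (t, t)
def da(t): return (1.0, 1.0)
def b(t):  return (1.0 + 2.0 * t, 1.0 + 2.0 * t)
def db(t): return (2.0, 2.0)

# C0: the positions coincide at the junction
assert a(1.0) == b(0.0)

# G1: the tangents are parallel (zero cross product) ...
assert da(1.0)[0] * db(0.0)[1] - da(1.0)[1] * db(0.0)[0] == 0.0

# ... but not C1: the tangent vectors differ, so a particle crossing
# the junction keeps its direction but doubles its speed
assert da(1.0) != db(0.0)
```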

Computer graphics and video games require modeling complex objects and simulating animation processes, starting from a finite set of control data. The fundamental mathematical tools for these problems are the interpolating or approximating curves (or surfaces). In the case of **interpolation** the curve passes exactly through the specified points (for example natural splines); in the case of **approximation** the curve passes only through the extreme points and uses the intermediate points as a guide for the optimal construction of the curve (for example Bézier curves).

- The interpolating curves pass through the control points; the approximating curves assume a shape which is a function of the points, but do not necessarily pass through them.
- The interpolating curves are very useful for specifying trajectories in animation processes, where you want the trajectory of an object to pass through points.
- The approximating curves are very useful in modeling processes.

We can formalize the problem of polynomial interpolation in these two different situations:

- given \(n+1\) points of the plane \((x_{0},y_{0}),(x_{1},y_{1}),\cdots, (x_{n},y_{n})\) find a polynomial \(p(x)\) of degree at most \(n\) which assumes predetermined values \(y_{i}\) in the points, that is \(p(x_{i})=y_{i},i=0,1,\cdots,n\);
- given a function \(f(x)\) continuous in an interval \([a,b]\), determine the polynomial with degree \(n\) which best approximates the function.

Polynomials have important properties that make them ideal tools for interpolation or approximation. They are continuous functions, easily differentiable and integrable, and can be easily calculated with simple programs. The set of polynomials of degree less than or equal to \(n\) has a vector space structure. For a review of the definition and properties of vector spaces see ^{[1]}.

Polynomials are able to approximate any continuous function with the desired level of accuracy; in fact, the famous **Weierstrass approximation theorem** states that any continuous function on a closed and bounded interval can be approximated by polynomials to any desired level of precision.

Suppose we have \(n+1\) points of the plane

\[ (x_{0},y_{0}),(x_{1},y_{1}), \cdots ,(x_{n},y_{n}) \]

We want to determine a polynomial \(p(x)\) of minimum degree such that \(p(x_{i})=y_{i}\) with \(0 \le i \le n\). The solution proposed by Lagrange uses the following polynomials \(L_{n,k}(x)\):

\[ L_{n,k}(x) = \frac{\prod\limits_{j=0, j \neq k}^{n} (x-x_{j})}{\prod\limits_{j=0, j \neq k}^{n} (x_{k}-x_{j})} \]

**Exercise 3.1**

Prove the following property:

\[ L_{n,k}(x_{i}) = \begin{cases} 1 & \text{if } i=k \\ 0 & \text{if } i \neq k \end{cases} \]

Lagrange’s solution is contained in the following theorem:

**Theorem 3.2 – Lagrange**

There is exactly one polynomial of degree at most \(n\) which assumes the values \(y_{i}\) at the points \(x_{i}\). The polynomial is given by the following expression:

\[ p(x) = \sum_{k=0}^{n} y_{k}\, L_{n,k}(x) \]

The error varies from point to point. It is zero at the interpolation points and it is non-zero at the other points. To get an estimate of the error, let’s remember the following formula:

**Theorem 3.3**

In Lagrange’s formula, if \(f \in C^{n+1}[a,b]\), for each \( x \in [a,b]\) there is a number \(\xi(x)\), depending on \(x\), such that:

\[ f(x) = p(x) + \frac{f^{(n+1)}(\xi(x))}{(n+1)!} \prod_{i=0}^{n}(x-x_{i}) \]

The following example obtained with the SageMath product illustrates the Lagrange interpolation for the function \(y=\cos(x)\), with \(n=3\), in the interval \([0,2\pi]\).
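The construction can also be reproduced without SageMath. The sketch below (plain Python, with the four nodes assumed here to be equally spaced in \([0,2\pi]\)) evaluates the Lagrange polynomials directly:

```python
import math

def lagrange(xs, ys, x):
    """Evaluate the Lagrange interpolating polynomial at x."""
    total = 0.0
    for k in range(len(xs)):
        Lk = 1.0                      # L_{n,k}(x), built factor by factor
        for j in range(len(xs)):
            if j != k:
                Lk *= (x - xs[j]) / (xs[k] - xs[j])
        total += ys[k] * Lk
    return total

# n = 3: four equally spaced nodes in [0, 2*pi]
xs = [2 * math.pi * i / 3 for i in range(4)]
ys = [math.cos(x) for x in xs]

# the polynomial reproduces the data exactly at the nodes ...
assert all(abs(lagrange(xs, ys, xi) - yi) < 1e-12 for xi, yi in zip(xs, ys))

# ... but away from the nodes it only approximates cos(x)
mid_err = abs(lagrange(xs, ys, math.pi / 2) - math.cos(math.pi / 2))
assert mid_err > 0.01
```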

The purpose of the Hermite interpolation is to improve the Lagrange approximation, imposing the further condition that also the derivatives have the same value at the control points.

**Definition 3.1 **

Let \(f(x)\) be a function with continuous first derivative in the interval \([a,b]\), in symbols \( f \in C^{1}[a,b]\). Let \((x_{0},x_{1},\cdots,x_{n})\) be distinct points in the interval \([a,b]\). The Hermite interpolating polynomial \(p(x)\) has the following properties:

\[ p(x_{i}) = f(x_{i}), \qquad p'(x_{i}) = f'(x_{i}), \qquad i=0,1,\cdots,n \]

**Theorem 3.4 – Hermite**

Let \(f(x)\) be a function of class \(C^{1}[a,b]\) and \(x_{0},x_{1},\cdots,x_{n}\) be \(n+1\) distinct points in the interval. Then there exists a unique interpolating polynomial of degree less than or equal to \(2n+1\), given by the following expression:

\[ p(x) = \sum_{i=0}^{n} f(x_{i})\, H_{n,i}(x) + \sum_{i=0}^{n} f'(x_{i})\, K_{n,i}(x) \]

where \(H_{n,i}(x)\) and \(K_{n,i}(x)\) are polynomials connected to the Lagrange polynomials \(L_{n,i}(x)\). The polynomial \(p(x)\) is called the **first order osculating polynomial**. We recall without proof the formulas for the polynomials \(H_{n,i}(x)\) and \(K_{n,i}(x)\):

\[ \begin{split} H_{n,i}(x) &= \left[1-2(x-x_{i})\,L_{n,i}'(x_{i})\right] L_{n,i}^{2}(x) \\ K_{n,i}(x) &= (x-x_{i})\, L_{n,i}^{2}(x) \end{split} \]

We note that in the case of a single point \(x_{0}\) the osculating polynomial is reduced to the Taylor polynomial.

For a complete study of Lagrange and Hermite interpolation see ^{[2]} or ^{[3]}.

The set consisting of all polynomials with real coefficients of degree less than or equal to \(n\) has a vector space structure; the polynomials can be added together and multiplied by a scalar quantity.

A natural basis for this vector space is the set of polynomials \(\{1,x,x^{2},\cdots, x^{n}\}\). Any other polynomial of degree \(\le n\) can be expressed as a linear combination of these base polynomials. However, there are other possible bases; one of these is constituted by the Bernstein polynomials which we will now introduce.

**Definition 4.1**

The **Bernstein polynomials** of degree \(n\) are defined as follows:

\[ B_{n,k}(t) = \binom{n}{k} t^{k} (1-t)^{n-k}, \qquad k=0,1,\cdots,n \]

There are \(n+1\) Bernstein polynomials of degree \(n\), and they form a basis for the vector space of polynomials of degree less than or equal to \(n\).

Bernstein polynomials of degree 1 are:

\[ B_{1,0}(t) = 1-t, \qquad B_{1,1}(t) = t \]

Bernstein polynomials of degree 2 are:

\[ \begin{array}{l} B_{2,0}(t) = (1-t)^{2} \\ B_{2,1}(t) = 2t(1-t) \\ B_{2,2}(t) = t^{2} \\ \end{array} \]

From the definition, the following properties easily follow:

**Theorem 4.1**

\[ B_{n,k}(t) \ge 0 \quad \text{for } t \in [0,1], \qquad \sum_{k=0}^{n} B_{n,k}(t) = 1 \]

**Exercise 4.1**

Prove the following formulas:

Bernstein’s polynomials can be used to prove the following famous and important Weierstrass (1815-1897) theorem:

**Theorem 4.2 – Weierstrass**

Let \(f: [0,1] \to \mathbb{R}\) be a continuous function. Then for every fixed \(\epsilon \gt 0\) there exists a polynomial \(p(x): [0,1] \to \mathbb{R}\) such that

\[ |f(x) - p(x)| \lt \epsilon \quad \text{for every } x \in [0,1] \]

We can formulate the theorem in an equivalent form. We define the Bernstein polynomial function:

\[ f_{n}(x) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) B_{n,k}(x) \]

Then the sequence of polynomials \(f_{n}(x)\) converges uniformly to the function \(f(x)\) in its domain of definition \([0,1]\). For an in-depth analysis of Bernstein’s polynomials and the proof of Weierstrass’s theorem see for example ^{[4]}.
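A quick numerical illustration of this convergence (a sketch; the test function \(f(x)=|x-1/2|\) is chosen here as an example of a continuous, non-smooth function):

```python
from math import comb

def bernstein_poly(f, n, x):
    """n-th Bernstein polynomial of f, evaluated at x in [0, 1]."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

# a continuous but non-smooth test function
f = lambda x: abs(x - 0.5)

def sup_error(n, samples=101):
    """Maximum deviation |f_n(x) - f(x)| on a uniform grid."""
    return max(abs(bernstein_poly(f, n, i / (samples - 1)) - f(i / (samples - 1)))
               for i in range(samples))

# the uniform error shrinks as the degree grows, as Weierstrass' theorem promises
assert sup_error(64) < sup_error(8) < sup_error(2)
```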

In many applications the objects to be represented cannot be precisely defined by simple curves, such as arcs, ellipses, etc. Modeling curves (or even surfaces) are often created from a set of representative points of the object. The points and directions of the tangents at each point are used to determine the equation of the polynomial that approximates the shape of the curve. The resulting curve passing through the points is called an **interpolating spline**.

Other types of splines do not pass through all points; some are used to control and model the shape of the curve in the vicinity of each point. The resulting curve is called the **approximating spline**.

The most used splines are those of the third degree.

Let \(n+1\) points or nodes \(x_{0},x_{1}, \cdots, x_{n}\) be given, with \(x_{0} \lt x_{1} \lt \cdots \lt x_{n}\). A **cubic spline** is a composite function \(S_{3,n}(x)\) defined in the interval containing the points:

\[ S_{3,n}(x) = p_{i}(x) \quad \text{for } x \in [x_{i-1},x_{i}], \qquad i=1,\cdots,n \]

The function \(S_{3,n}(x)\) is a collection consisting of \(n\) polynomials \(p_{i}(x)\) of degree \(3\) at most.

Now suppose we have, for each point \(x_{i}\), a value \(f_{i}=f(x_{i})\). The spline \(S_{3,n}(x)\) is called an interpolating cubic spline with respect to the data pairs \((x_{i},f_{i})\) if:

\[ S_{3,n}(x_{i}) = f_{i}, \qquad i=0,1,\cdots,n \]

Clearly, inside each interval \([x_{i-1},x_{i}]\) the polynomial \(p_{i}(x)\) has derivatives of all orders. The discontinuity points of the spline \(S_{3,n}(x)\) can only be in the nodes. In general, additional continuity conditions are imposed in the nodes:

\[ \begin{array}{l} p_{i}(x_{i})=p_{i+1}(x_{i}) \\ p_{i}'(x_{i}) = p_{i+1}'(x_{i}) \\ p_{i}''(x_{i}) = p_{i+1}''(x_{i}) \\ \end{array} \qquad i=1,\cdots,n-1 \]

Different types of spline curves can be defined by setting different boundary conditions at the nodes. For a systematic study of the various types of splines (natural, complete and others) and their properties see Quarteroni ^{[2]}.

**Exercise 5.1**

Determine the natural first order spline to approximate the following data: \((0,0),(1,2),(2,1),(3,0)\).

Solution

The curve obtained with the SageMath tool is the piecewise-linear path through the four data points.
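A minimal Python check of the solution (the first-order spline is just piecewise-linear interpolation between consecutive data points):

```python
# Data of Exercise 5.1
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 2.0, 1.0, 0.0]

def linear_spline(x):
    """Evaluate the piecewise-linear interpolant at x in [x0, xn]."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return (1 - t) * ys[i] + t * ys[i + 1]
    raise ValueError("x outside the interpolation interval")

# the spline reproduces the data at the nodes ...
assert all(linear_spline(x) == y for x, y in zip(xs, ys))
# ... and is linear between consecutive nodes
assert linear_spline(0.5) == 1.0 and linear_spline(2.5) == 0.5
```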

**5.2) The Hermite cubic spline**

A cubic Hermite spline tries to control the shape of the curve by imposing conditions on the tangents at the extreme points.

**Definition 5.2**

Let \([a,b]\) be an interval and \(f: [a,b] \to\mathbf{R}\) a differentiable function. Let \((x_{0}, \cdots, x_{n})\) be a partition of \([a,b]\). The Hermite cubic spline is a finite set of polynomials \(p_{0}, \cdots,p_{n-1}\) satisfying the following relations:

\[ \begin{array}{ll} p_{i}(x_{i}) = f(x_{i}) & p_{i}(x_{i+1}) = f(x_{i+1}) \\ p_{i}'(x_{i}) = f'(x_{i}) & p_{i}'(x_{i+1}) = f'(x_{i+1}) \\ \end{array} \]

for \(i=0,1, \cdots, n-1\).

Each component of the Hermite cubic spline has the following representation:

\[ p(t) = H_{0}(t)\,p_{0} + H_{1}(t)\,v_{0} + H_{2}(t)\,v_{1} + H_{3}(t)\,p_{1}, \qquad t \in [0,1] \]

where \(p_{0},p_{1}\) are the starting and ending points, \(v_{0},v_{1}\) are the tangents at the points, and the polynomials \(H_{i}(t)\) have the following expression

\[ \begin{split} H_{0}(t)&= (1+2t)(1-t)^{2} \\ H_{1}(t)&= t(1-t)^{2} \\ H_{2}(t)&= t^{2}(t-1) \\ H_{3}(t)&= t^{2}(3-2t) \\ \end{split} \]

We can give this matrix representation:

\[ p(t)= \left( \begin{array}{cccc} p_{0} &v_{0}& v_{1} & p_{1} \end{array} \right) \cdot \left( \begin{array}{cccc} 1 &0 & -3 &2 \\ 0 & 1 & -2 &1 \\ 0 &0 & -1 & 1 \\ 0 &0 & 3 & -2 \end{array} \right) \cdot \left( \begin{array}{c} 1 \\ t \\ t^{2} \\ t^{3} \end{array} \right) \\ \]

or the following equivalent one:

\[ p(t) = \left( \begin{array}{cccc} t^{3} &t^{2} & t & 1 \end{array} \right) \cdot \left( \begin{array}{cccc} 2 & -2 & 1& 1 \\ -3 & 3 & -2 & -1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \end{array} \right) \cdot \left( \begin{array}{c} p_{0} \\ p_{1} \\ v_{0} \\ v_{1} \end{array} \right) \\ \]

Unlike Bézier curves, the Hermite form of a curve is not a weighted average of points, since the sum \(H_{0}+H_{1}+H_{2}+H_{3}\) is generally not equal to \(1\). The basis functions \(H_{0},H_{3}\) weight the starting and ending points, while \(H_{1},H_{2}\) weight the tangent vectors.

The Hermite spline is simple to compute and allows you to create a smooth and symmetrical compound curve. To modify the curve, you can modify the position of a node or the direction of the tangent in the node. However it has the disadvantage of needing to know the derivatives at the nodes.
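As a sketch (illustrative Python, using the basis \(H_{0},\dots,H_{3}\) given above, with made-up endpoints and tangents), the following evaluates one Hermite segment and checks the endpoint and tangent conditions numerically:

```python
def hermite(p0, p1, v0, v1, t):
    """One cubic Hermite segment: endpoints p0, p1; tangents v0, v1."""
    h0 = (1 + 2 * t) * (1 - t) ** 2   # weights p0
    h1 = t * (1 - t) ** 2             # weights v0
    h2 = t ** 2 * (t - 1)             # weights v1
    h3 = t ** 2 * (3 - 2 * t)         # weights p1
    return tuple(h0 * a0 + h1 * b0 + h2 * b1 + h3 * a1
                 for a0, a1, b0, b1 in zip(p0, p1, v0, v1))

# made-up endpoints and tangents
p0, p1 = (0.0, 0.0), (3.0, 1.0)
v0, v1 = (1.0, 4.0), (1.0, -4.0)

# the segment interpolates the endpoints ...
assert hermite(p0, p1, v0, v1, 0.0) == p0
assert hermite(p0, p1, v0, v1, 1.0) == p1

# ... and its tangent at t = 0 matches v0 (finite-difference check)
eps = 1e-6
q = hermite(p0, p1, v0, v1, eps)
assert all(abs((qi - pi) / eps - vi) < 1e-4
           for qi, pi, vi in zip(q, p0, v0))
```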

If the derivatives are not known, they can be approximated with finite differences. This is the approach proposed by E. Catmull and R. Rom in 1974 ^{[5]}.

One of the characteristics of the Catmull-Rom spline is that the curve passes through all control points, unlike other types of splines.

As we know, the spline is a collection of cubic curves connected to each other at the extreme points. The first curve goes from point \(p_{0}\) to point \(p_{1}\), the second from point \(p_{1}\) to point \(p_{2}\), and so on. To have continuity at the connecting points of the curves, the two tangents must be equal. The standard Catmull-Rom procedure creates the tangent at a point using the neighboring points. The formula for the tangent \(\mathbf{v}_{i}\) at the point \(p_{i}\) is

\[ \mathbf{v}_{i} = \frac{p_{i+1} - p_{i-1}}{2} \]

The important fact is that to calculate the tangent at a point it is enough to know the position of the adjacent points.

The procedure for determining the Catmull-Rom spline is similar to that used for the Hermite spline; the matrix representation is:

\[ p(t) = \left( \begin{array}{cccc} t^{3} & t^{2} & t & 1 \end{array} \right) \cdot \frac{1}{2} \cdot \left( \begin{array}{cccc} -1 & 3 & -3 & 1 \\ 2 & -5 & 4 & -1 \\ -1 & 0 & 1 & 0 \\ 0 & 2 & 0 & 0 \end{array} \right) \cdot \left( \begin{array}{c} p_{0} \\ p_{1} \\ p_{2} \\ p_{3} \end{array} \right) \]
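A small Python sketch of one Catmull-Rom segment (the same polynomial used in the C# snippet later in the article; the control points are illustrative), checking that it interpolates its two interior control points:

```python
def catmull_rom(p0, p1, p2, p3, t):
    """Catmull-Rom segment from p1 to p2 (p0 and p3 shape the tangents)."""
    return tuple(0.5 * (2 * b
                        + (c - a) * t
                        + (2 * a - 5 * b + 4 * c - d) * t * t
                        + (-a + 3 * b - 3 * c + d) * t * t * t)
                 for a, b, c, d in zip(p0, p1, p2, p3))

pts = [(0.0, 0.0), (1.0, 2.0), (2.0, 1.0), (3.0, 0.0)]  # illustrative points

# the segment passes through its two interior control points
assert catmull_rom(*pts, 0.0) == pts[1]
assert catmull_rom(*pts, 1.0) == pts[2]
```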

Bézier curves represent a fundamental class of spline curves. The name derives from **Pierre Bézier** (1920-1999), who first published an article on them while working at the Renault car manufacturer as an engineer and designer. Actually the same type of curves had already been studied by **Paul de Casteljau** while working at Citroën, but he did not publish his research.

**Definition 6.1**

Let \(n+1\) points \(p_{0},p_{1}, \cdots , p_{n}\) be given. The Bézier curve is defined by the following equation

\[ p(t) = \sum_{k=0}^{n} p_{k}\, B_{n,k}(t), \qquad t \in [0,1] \]

where \(B_{n,k}(t)\) are the Bernstein polynomials. The main properties of the Bézier curve are the following:

- the curve passes through the starting and ending points \(p_{0}\) and \(p_{n}\);
- the curve lies in the convex hull of the control points, since \(B_{n,k}(t) \ge 0\) and \(\sum_{k=0}^{n} B_{n,k}(t)=1\) for each \(t \in [0,1]\);
- the curve is tangent at the extreme points to \(p_{1}-p_{0}\) and \(p_{n}-p_{n-1}\).

These points are called **control points**. The broken line formed by the segments that join them is called the **control polygon** . Only the first and last control points belong to the Bézier curve, while the rest control the shape of the curve in their proximity. It is therefore an approximating curve and not an interpolating curve.

The Bernstein polynomials serve as **blending functions** for the Bézier curves, that is, as weights associated with the control points. The degree of the polynomial representing the curve is equal to the number of control points minus one.

A disadvantage of the Bézier curves is the lack of local control over the curve. If we modify a control point the whole curve is modified, not just the local part near the point.

The most commonly used Bézier curves are those of the third degree, defined by four control points. In this case the Bernstein polynomials are as follows:

\[ \begin{array}{l} B_{3,0}(t) =(1-t)^{3} \\ B_{3,1}(t) =3t(1-t)^{2} \\ B_{3,2}(t) =3t^{2}(1-t) \\ B_{3,3}(t) =t^{3} \\ \end{array} \]

**Exercise 6.1**

Prove the following relations satisfied by the first derivative of the cubic Bézier curve:

\[ p'(0) = 3(p_{1}-p_{0}), \qquad p'(1) = 3(p_{3}-p_{2}) \]

The cubic Bézier curve (case \(n=3\)), with \(4\) control points \({p_{0},p_{1},p_{2},p_{3}}\), has the following vector expression:

\[ p(t) = (1-t)^{3}p_{0}+3t(1-t)^{2}p_{1}+3t^{2}(1-t)p_{2}+t^{3}p_{3} \]

In matrix form it can be written as follows:

\[ p(t) = \left( \begin{array}{cccc} t^{3} &t^{2} & t & 1 \end{array} \right) \cdot \left( \begin{array}{cccc} -1 & 3 & -3 & 1 \\ 3 & -6 & 3 & 0 \\ -3 & 3 & 0 & 0 \\ 1 & 0 & 0 & 0 \end{array} \right) \cdot \left( \begin{array}{c} p_{0} \\ p_{1} \\ p_{2} \\ p_{3} \end{array} \right) \\ \]

**Exercise 6.2**

Draw the Bézier curve for the following \(4\) points: \(p_{0}=(1,1),\; p_{1}=(2,3),\; p_{2}=(4,3),\; p_{3}=(3,1)\).

Solution

The curve in the plane is represented by the following pair of parametric equations:

\[ \begin{split} x(t) &= (1-t)^{3} + 6t(1-t)^{2} + 12t^{2}(1-t) + 3t^{3} \\ y(t) &= (1-t)^{3} + 9t(1-t)^{2} + 9t^{2}(1-t) + t^{3} \end{split} \]

The Bézier spline plot is as follows:

The graph was created with the TikZ package in the LaTeX environment, using the following instruction:

```
\draw[scale=1,domain=0:1,samples=100,variable=\t]
  plot ({(1-\t)^3 +6*\t*(1-\t)^2 +12*\t^2*(1-\t)+3*\t^3},
        {(1-\t)^3 +9*\t*(1-\t)^2 +9*\t^2*(1-\t)+\t^3});
```

This way of calculating the Bézier curve is good if you have few nodes. As the number of nodes increases, the degree of the polynomial increases and numerical instabilities and rounding errors occur.

A better way is to use de Casteljau’s algorithm.

In 1959 Paul de Casteljau (1930-2022) created a simple and efficient algorithm that constructs a Bézier curve through repeated linear interpolations.

The algorithm starts by setting a value of the parameter \(t \in(0,1)\). Between each pair of consecutive control points a linear interpolation is made according to the parameter \(t\), obtaining a new point. Starting from the initial \(4\) control points denoted with the notation \((p_{0}^{0},p_{1}^{0},p_{2}^{0},p_{3}^{0})\), we get \(3\) new points \((p_{0}^{1},p_{1}^{1},p_{2}^{1})\). We continue until we obtain a single point \(p_{0}^{3}\), which is the desired value of the polynomial in \(t\): \(p(t)\). In each step of the algorithm, a new point is created between each pair in the ratio \(t:(1-t)\).

We can summarize the algorithm with these recurrence equations:

\[ \begin{split} p_{i}^{0}(t) &= p_{i}, \qquad i=0,\cdots,n \\ p_{i}^{k}(t) &= (1-t)\,p_{i}^{k-1}(t) + t\,p_{i+1}^{k-1}(t), \qquad k=1,\cdots,n;\; i=0,\cdots,n-k \end{split} \]

To see de Casteljau’s algorithm in action see this link to Wikipedia.
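An implementation sketch in Python (illustrative control points), checking the algorithm against the explicit cubic Bernstein form:

```python
def de_casteljau(points, t):
    """Evaluate a Bézier curve at t by repeated linear interpolation."""
    pts = [tuple(p) for p in points]
    while len(pts) > 1:
        # each step replaces n points with n-1 points, in the ratio t:(1-t)
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

ctrl = [(1.0, 1.0), (2.0, 3.0), (4.0, 3.0), (3.0, 1.0)]  # illustrative points

def bezier3(t):
    """Explicit cubic Bernstein form, for comparison."""
    b = [(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3]
    return tuple(sum(w * p[i] for w, p in zip(b, ctrl)) for i in range(2))

# the two evaluations agree for every parameter value
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert all(abs(u - v) < 1e-12
               for u, v in zip(de_casteljau(ctrl, t), bezier3(t)))
```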

Splines are used in various situations in video game programming:

- to control the movement of an NPC (Non-Player Character), so that its motion is as fluid and realistic as possible; this is guaranteed by the continuity of the first and second derivatives of the spline;
- to describe different paths;
- to model objects of various geometric shapes;
- in Unity animations, to define useful curves in keyframe interpolation (see article on this website).

The Catmull-Rom spline is useful for calculating a curve that passes through all control points. For example, it can be used to calculate the curve of an object from keyframes in an animation process.

The following animation illustrates an object that moves through \(5\) fixed points, following a curved trajectory created with the Catmull-Rom algorithm.

The instructions for calculating the points of the curve are the following:

```
// Compute the position on a Catmull-Rom spline, relative to t
Vector3 ComputePositionCatmullRom(float t, Vector3 p0, Vector3 p1, Vector3 p2, Vector3 p3) {
    Vector3 position = 0.5f * (2 * p1 + (p2 - p0) * t +
                               (2 * p0 - 5 * p1 + 4 * p2 - p3) * t * t +
                               (-p0 + 3 * p1 - 3 * p2 + p3) * t * t * t);
    return position;
}
```

In some situations, Bézier curves are used to define a path through which an object must move.

The following animation refers to a butterfly-shaped path, consisting of \(4\) paths connected at the junction points.

Each path is created using a distinct Bézier curve, with \(4\) control points. The fourth control point of each curve coincides with the first of the next. The Update function activates a coroutine to calculate the individual routes, using the Bézier formula:

```
private IEnumerator MotionOnPath(int number) {
    // p0, p1, p2, p3 are control points
    ..............
    while (t < 1) {
        t += Time.deltaTime * velocityFactor;
        pointComputed =
            Mathf.Pow(1 - t, 3) * p0 +
            3 * Mathf.Pow(1 - t, 2) * t * p1 +
            3 * (1 - t) * Mathf.Pow(t, 2) * p2 +
            Mathf.Pow(t, 3) * p3;
        transform.position = pointComputed;
        yield return new WaitForEndOfFrame();
    }
    ..............
}
```

By modifying the number of component curves and changing the positions of the control points, it is possible to create an infinite number of different curves which can be adapted to various situations.

The graph of a curve \(p(t)=(x(t),y(t))\) can be thought of as the trajectory of a particle moving in the plane, with \(p(t)\) indicating the position at the time \(t\). The velocity of the particle is given by the following vector equation:

\[ v(t) = \frac{dp}{dt} \]

We denote the modulus of the velocity by \(\sigma(t)\):

\[ \sigma (t) = |v(t)| \]

Generally the modulus of the velocity is not constant, but varies with the parameter. A fundamental parameter is the **curvilinear abscissa** \(s(t)\), which indicates the distance traveled up to the time \(t\). The curvilinear abscissa is defined as follows:

\[ s(t) = \int_{0}^{t} \left|\frac{dp}{du}\right| du \]

where \(\left|\frac{dp}{dt}\right|= \sqrt{x'(t)^{2}+ y'(t)^{2}}\). If we use the parameter \(s\), the modulus of the velocity is constant and equal to \(1\).

**Example 7.1 – The circumference**

The circle with center \((0,0)\) and radius \(1\) has the following parametric equations, expressed with the curvilinear abscissa:

\[ p(s) = (\cos s, \sin s), \qquad s \in [0, 2\pi] \]

We can use a different parametrization, for example the following, in which the velocity is not constant:

\[ \begin{split} p(t) &=(\cos (t^{2}), \sin (t^{2})) \quad t \in [0,\sqrt{2\pi}] \\ v(t) &=(-2t\sin (t^{2}), 2t\cos (t^{2})) \\ \sigma (t) & = 2t \end{split} \]

In various situations we need an object to move along a curve at constant velocity (for example, the motion of a camera). If we have a curve of equation \(q(t)\) and we want the velocity to be constant, we must find the relation between the parameter \(t\) and the curvilinear abscissa, \(t=f(s)\). This process is called **reparametrization**. Assuming that \(p(s)\) is the natural parametrization by the curvilinear abscissa, we set \(p(s)=q(t)\) and solve with respect to \(t\). In the case of the circle example above the solution is easy: \(t=\sqrt{s}\). In the general case the following integral must be calculated for each value of \(t\):

\[ s(t) = \int_{0}^{t} \left|\frac{dq}{du}\right| du \]

For a fixed value of \(t\), this equation gives the length \(s(t)\). The calculation of the integral generally requires numerical methods, among which one of the simplest is the Cavalieri-Simpson rule (Wikipedia).

In reality, the inverse function is needed: given a value of \(s\), determine the time \(t\) at which the length takes on that value, \(t = g^{-1}(s)\), where \(s = g(t)\) denotes the arc-length function. Except in the simplest cases there is no closed-form solution, and numerical methods must be used.

The most famous algorithm is Newton’s method for finding the zeros of a function (Wikipedia). The function in question is \(F(t)=g(t)-s\), where \(s\) is a fixed constant value. Newton’s method for finding the zeros of \(F(t)\) is based on the following iteration:

\[ t_{n+1} = t_{n} - \frac{F(t_{n})}{F'(t_{n})} \]

where \(F'(t)=|\frac{dq}{dt}|\).

So, in summary, given a Bézier curve \(p(t)\), if we want the object (Game Object) to move along the curve at a constant speed, we must take the following steps:

- determine the total length of the curve, using Simpson method or another method for calculating the approximate integral;
- divide the curve into sections of equal length, and for each value of the parameter \(s\) determine the corresponding value of the parameter \(t\), using Newton’s method;
- calculate \(p(t)\) with the Bézier formula.
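The steps above can be sketched in Python (an illustration with made-up control points: Simpson's rule approximates the arc length, Newton's iteration inverts it):

```python
import math

# illustrative cubic Bézier control points
P = [(0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0)]

def point(t):
    """Cubic Bézier position in Bernstein form."""
    b = [(1 - t) ** 3, 3 * t * (1 - t) ** 2, 3 * t ** 2 * (1 - t), t ** 3]
    return tuple(sum(w * p[i] for w, p in zip(b, P)) for i in range(2))

def speed(t):
    """|dp/dt|, from the derivative of the Bernstein form."""
    d = [3 * (1 - t) ** 2, 6 * t * (1 - t), 3 * t ** 2]
    dx = sum(w * (P[i + 1][0] - P[i][0]) for i, w in enumerate(d))
    dy = sum(w * (P[i + 1][1] - P[i][1]) for i, w in enumerate(d))
    return math.hypot(dx, dy)

def arc_length(t, n=100):
    """Composite Simpson rule for s(t) = integral of |dp/du| over [0, t]."""
    h = t / n
    acc = speed(0.0) + speed(t)
    for i in range(1, n):
        acc += (4 if i % 2 else 2) * speed(i * h)
    return acc * h / 3

def t_of_s(s, iters=20):
    """Newton iteration for F(t) = arc_length(t) - s = 0."""
    t = 0.5
    for _ in range(iters):
        t -= (arc_length(t) - s) / speed(t)
        t = min(max(t, 0.0), 1.0)   # keep the iterate inside [0, 1]
    return t

total = arc_length(1.0)
# five samples at equal arc-length spacing -> (nearly) constant-speed motion
ts = [t_of_s(k * total / 4) for k in range(5)]
samples = [point(t) for t in ts]
dists = [math.dist(a, b) for a, b in zip(samples, samples[1:])]
assert max(dists) - min(dists) < 0.05 * total
```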

For Simpson’s method and Newton’s algorithm see a numerical analysis text, for example Quarteroni’s book.

Splines are of great importance in game development. Developers need to have an understanding of the properties of various types of curves and choose the best for the problem. For completeness, in a future article we will describe two other types of curves: B-Splines and NURBS.

^{[1]}Seymour Lipschutz – Schaum’s Outline of Linear Algebra (McGraw-Hill)

^{[2]}A. Quarteroni, R. Sacco – Numerical Mathematics (Springer Verlag)

^{[3]}F. Scheid – Numerical Analysis (McGraw-Hill)

^{[4]}G. Lorentz – Bernstein Polynomials (Ams Chelsea Publishing)

^{[5]}E. Catmull, R. Rom – A Class of Local Interpolating Splines (from ‘Computer Aided Geometric Design’, by Academic Press, 1974)


The post Sprite Animation in Unity 3D and Finite State Machines appeared first on GameLudere.

We can define the animation as a change over time of a certain property of an object, for example the position, the orientation, the state of motion, the color, the dimensions, etc. Time, movement and variation of some properties play a central role in any animation phenomenon.

The idea behind the animation is to display a series of images with a fairly high speed, so that the human brain interprets it not as a discrete flow, but as a continuous flow, thanks to the phenomenon of persistence of vision.

The **theory of persistence of vision**, already proposed by Lucretius (c. 99 – c. 55 BC) in his work ‘De Rerum Natura’, states that the eye maintains the imprint of an image on the retina for a short time, even after the external stimulus has ceased.

The hypothesis of the persistence of vision has been questioned by other theories, in particular by the hypothesis of the **phi phenomenon** proposed by Max Wertheimer in 1912. This theory states that our brain is able to fill the empty time intervals in a sequence of static images, giving the illusion of movement. Basically, our brain would have an innate ability to perceive movement and, in general, to make sense of images even if they are not complete.

For a deeper study of the subject of vision, the following texts can be consulted: ^{[1]} or ^{[2]}.

Time plays a central role in animation and in motion simulation in general. Physical time is the real-world time and is a continuous variable. Computer simulations, on the other hand, manage only discrete data. The graphics pipeline renders images at discrete intervals; in each animation, the properties of the objects change in specific time instants. We can therefore distinguish two types of time:

- real-world time
- in-game time

The in-game time is made up of discrete intervals, managed by the game loop that periodically updates the status of the objects on the scene (for a description of the game loop see the article in this blog). In video games, the images displayed on the scene are prepared by frames that are updated with high frequency, for example \(30\) fps or \(60\) fps (frames per second).

While in the real world time is continuous and we can observe the various events at any particular moment, in video games and in computer simulations in general events occur in discrete moments within frames.

The preparation of each frame requires the updating of the data relative to each object on the scene:

- position
- orientation
- geometric shape
- scale factor
- appearance (color, material, lighting, etc.)
- physical state
- etc.

We have seen that frames are the places where any change to the objects on the game scene occurs, although obviously many frames can exist with no changes compared to the previous ones. Given the huge number of frames, it would be completely impractical to manually define the changes for each of them. Computer animation allows you to define some reference frames (**keyframes**), which capture significant moments of the animation process, and the program automatically generates the animation sequence. A keyframe contains information about position, rotation, scale, etc. of the animated object. For example, in the case of the animation of an automatic bar that controls access to a parking lot, we can define two states, one with the bar lowered and another with the bar fully raised. These two states indicate the beginning and the end of the animation. The graphics engine (for example Unity 3D) applies a mathematical interpolation procedure and prepares the intermediate frames (**tweens**), which together constitute a sequence that creates a smooth transition between keyframes.
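The generation of tweens can be sketched with a toy example (illustrative Python, using the parking-bar animation described above: keyframes for the bar angle and linear interpolation between them):

```python
# keyframes as (time in seconds, bar angle in degrees) -- made-up values
keyframes = [(0.0, 0.0), (1.0, 90.0), (3.0, 0.0)]

def sample(t):
    """Return the in-between ("tween") value of the property at time t."""
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            u = (t - t0) / (t1 - t0)          # normalized time in the segment
            return (1 - u) * v0 + u * v1      # linear interpolation
    return keyframes[-1][1]

assert sample(0.0) == 0.0     # first key: bar lowered
assert sample(0.5) == 45.0    # generated tween, halfway to the raised pose
assert sample(1.0) == 90.0    # second key: bar fully raised
assert sample(2.0) == 45.0    # halfway through the slower lowering phase
```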

There are different types of animations. Some can be created directly in Unity through the Unity Animation Editor product, while others are created through external products, such as Photoshop, Blender, Maya, 3ds Max, Gimp, and then imported into the Unity project. There is also the possibility of creating an animation through the code; in this case the animation is generally not created in advance, but is generated during the game, according to the events.

**Animation of a rigid body**

In this case an object, for example a door, is considered single and indivisible, with no parts that move independently. The changes relate only to position, orientation and possibly scale.

**Skeletal animation**

In this type of animation, the movement of the individual components must also be considered. This type of animation is the most used in games. Skeletal animation is an animation technique in which an articulated object (a character, for example) is composed of two parts: a surface part (mesh or skin) used to represent the object and a hierarchical structure of interconnected **bones**, each with its own properties. The animation system produces **poses** (or keyframes) for each bone of the structure, and the animation engine connects the various keyframes through an interpolation process. The process of creating the link between the mesh and the hierarchy of bones is called **rigging**.

Skeletal animation is not limited to the human body, but can be applied to many situations: a car, a spaceship, a soldier, a door, etc.

**Sprite animation**

A sprite is a bitmap graphic image that is typically part of a larger scene. It can be static or animated. A sequence of sprites can be displayed, frame after frame, at a suitable rate to give the sensation of continuous movement.

**Animation by morphing**

It is a technique for moving continuously from one image to another, transforming the vertices of the initial figure into the final one. In each keyframe of the animation the vertices are placed in different positions, approaching the vertices of the destination image. This animation mode allows greater control over the movement of objects.

**Physics based animation**

In some situations, animation based on a physics engine can produce more realistic phenomena than keyframe-based animation. Unity’s physics engine allows you to efficiently simulate many real-world phenomena, such as the effects of gravity, the presence of wind, a stormy sea, etc.

For further information on the types of animation see ^{[3]}.

Unity has an animation system called **Mecanim**. To manage the animation process, Mecanim uses the finite state machine tool: each state corresponds to an animation and it is possible to define the transitions between the various states. For each transition from one state to another Mecanim is able to perform interpolations between the frames, in order to obtain a fluid and natural sequence. The system also allows you to manage the overlapping of different animations, necessary for complex structures.

Three components are needed to create a GameObject animation:

- an Animator component for the GameObject
- an Animator Controller
- at least one Animation Clip

Let’s see, for example, how to create the animation of a single object, changing some of the properties of the object itself: position, rotation, scale or color.

The steps to create a single animation clip are as follows:

- create a new scene;
- create an object (e.g. a 3D cube);
- in the Window menu choose the *Animation* option;
- click on the *Create* button and give a name to the animation (file with extension .anim);
- click on the *Add Property* button. Here you can choose the type of animation, specifying the property that must change: position, rotation, scale or the properties of the Mesh Renderer (e.g. the color).

When the Create button is pressed, Unity performs several actions. First, it creates an Animator Controller for the object (file with extension .controller). Then, it adds an Animation on this controller. After this, it adds an Animator component to the object, visible in the Inspector.

Of course, multiple animations can be defined on the same object. In the following example we can see \(3\) types of animations, with three properties that vary: position, orientation and color.

For more details see the Unity online documentation.

The Animation Window supports two different operating modes to create and edit the animation clip:

- Dopesheet mode
- Curves mode

In Dopesheet mode it is possible to precisely set the keyframes and the values of the variable properties (position, rotation, color, etc.). The Dopesheet offers a compact view of the temporal moments in which the properties change their value. However, it is difficult to have a precise idea of the values of the various properties in the time intervals between the keyframes.

The Curves mode allows you to have more control over the values of the properties at any time. The **Curve Editor** allows you to create and edit curves. An animation curve has a plurality of keys, which are animation control points. The animation curves have different colors to represent the values of the various object properties of the animation.

In version 4.3, Unity introduced the possibility to choose the 2D option and added the **Sprite object**, which contains a bitmap image (Texture2D). A 2D sprite is a graphic image that can be used as a two-dimensional object with coordinates (x, y). A sprite in Unity is defined by a rectangle, a Texture2D and a pivot point.

The sprite can represent a single object or an entire scene. In addition, several sprites can be combined to create a single object.

Sprites can be created directly through Unity or imported into the project assets, and they can be provided with movement. If you open a Unity project in 2D mode, each image imported into the project is assigned the Sprite texture type (2D and UI).

When a Sprite GameObject is created, Unity also creates the associated Sprite Renderer component, which is responsible for rendering the sprite itself. While in a 3D environment the appearance of an object differs depending on the lighting and the position of the camera, in a 2D space the object is represented without any depth. With the Sprite Renderer it is possible to set various properties such as material, color, layer, etc.

It is often more convenient to record a collection of sprites in a single image. The idea is to create a single image that contains all the animations of an object. This image is called a **sprite sheet** (or sprite atlas). Unity provides a tool (the Sprite Packer) to combine individual sprites into a single atlas.

An alternative is to draw sprites and create animations with one of the many products on the market (Photoshop, Blender, Gimp, Maya, 3ds Max). In this way you can test and refine the animation, reducing the complexity of the project. The images are then imported into the assets of the Unity project, setting the Sprite Mode in the Inspector to Multiple. Then, using the Unity Sprite Editor, the individual images are separated in three different ways:

- Automatic
- Grid by cell size
- Grid by cell count

An easy way to create an animation from a sprite sheet is to follow these steps:

- import the collection of sprites, for example in the .png format;
- split the atlas into the individual sprites using the Sprite Editor; at this point, in the Project View an arrow is shown on the imported sprite, to indicate that it is a collection of sprites;
- select the sprites that are part of a complete animation sequence and drag and drop them on the Hierarchy Window. Unity will open a dialog to create and save the animation on a file with the .anim extension. A sprite object is created on the scene. At the same time, Unity also creates the Animator controller and adds the Animator component to the object. Running the game will play all the images in the sequence;
- if necessary, modify the Animation controller parameters to adjust speed, animation looping, keyframes, etc.;
- to move the object prepare a script associated with it.

The following image presents a simple animation of a sprite sheet. An associated script controls the horizontal movement and the collision with the side walls.

Of course, you can make more complex animations. A classic example is to have various collections of sprites related to different states of a character: idle, walk, run, attack, die. A different animation is created for each of the collections. Each animation corresponds to a different state and these animations can be coordinated by an Animator Controller, which is nothing more than a finite state machine, as we will see later.

For further information on animation with Unity see ^{[4]} or ^{[5]}.

As we have seen, finite state machines are part of the Unity animation system. However, they can also be implemented without animation.

The Animator of the Mecanim animation system is in fact a finite state machine, capable of managing different states and passing from one state to another through transitions. The steps to control the behavior of an object using a simple finite state machine based on the Unity Animator are as follows:

- create a GameObject;
- create an Animator Controller;
- define states, transitions and parameters;
- create Behaviour scripts associated with states;
- in the script associated with the GameObject, include the control of the parameters to be communicated to the Animator (e.g. controlling the colors of the traffic light, the distance of a player, etc.).

The following image is a simple example of animation obtained using the Animator as a finite state machine. An object, in this case a ball, can be in three different states: stationary, low speed, high speed, depending on the color of the traffic light. Three transitions are defined for switching between states when there is a change in the traffic light. Despite its simplicity, the example contains the essential ingredients of each finite state machine: an object that can be in different states and passes from one state to another according to changes in the external environment.
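Stripped of the Unity machinery, the traffic-light example boils down to a transition table. The following plain-Python sketch shows that essential structure; the state and event names are illustrative choices, not Unity Animator identifiers.

```python
# A minimal finite state machine mirroring the traffic-light example:
# the ball's state changes according to the light color.

TRANSITIONS = {
    ("stationary", "green"):  "high_speed",
    ("high_speed", "yellow"): "low_speed",
    ("low_speed",  "red"):    "stationary",
}

def step(state: str, light: str) -> str:
    """Apply one transition; stay in the current state if none matches."""
    return TRANSITIONS.get((state, light), state)

state = "stationary"
for light in ["green", "yellow", "red"]:
    state = step(state, light)
    print(light, "->", state)
```

In Unity the same table lives inside the Animator Controller, and the "events" are the parameters your script sets on the Animator.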

Of course, other more interesting objects can be used instead of the ball: a character who regulates his state of motion according to the traffic light, a car, a mechanical device, etc.

Animation is a complex and fundamental topic for the development of interesting and sophisticated video games, which are capable of effectively simulating the real world. In subsequent articles we will explore the various types of animation and will describe the mathematical tools that underlie the sophisticated algorithms used in modern video games.

^{[1]}V. Bruce, M. Georgeson, P. Green – Visual Perception: Physiology, Psychology and Ecology (Psychology Press)

^{[2]}D. Hubel – Eye, Brain, and Vision (Freeman & Co)

^{[3]}Jason Gregory – Game Engine Architecture (CRC Press, 2014)

^{[4]}A. Godbold, S. Jackson – Mastering Unity 2D Game Development (Packt Publishing)

^{[5]}A. Thorn – Unity Animation Essentials (Packt Publishing)

The post Sprite Animation in Unity 3D and Finite State Machines appeared first on GameLudere.

The post Ordinary Generating Functions and Recurrence Equations appeared first on GameLudere.

Generating functions are an important tool for solving combinatorial problems of various types. A typical problem is counting the number of objects as a function of the size \(n\), which we can denote by \(a_{n}\). Thus, for each value of a non-negative integer \(n\) we have a sequence of values \(\{a_{0}, a_{1}, \cdots a_{n}, \cdots \}\). We call the **generating function** of the sequence \(a_{n}\) the following expansion in powers:

\[ G(x) = \sum\limits_{n = 0}^{\infty} a_{n} x^{n} \]

If there are infinitely many terms it is a power series; in the finite case it is a polynomial. Two generating functions

\[ \begin{array}{l} F(x) = \sum\limits_{n = 0}^{\infty} a_{n} x^{n} \\ G(x) = \sum\limits_{n = 0}^{\infty} b_{n} x^{n} \\ \end{array} \]are equivalent if \(a_{n} = b_{n} \) for each value of \(n \).

In the study of generating functions, the aspects related to the convergence of the series are generally set aside. In this case the generating function is also called a **formal series**. In this sense, the generating function is simply a way of representing sequences of numbers, and the powers of \(x\) indicate the place associated with the various terms of the sequence.

Formal series are a useful tool to represent sequences of numbers even if they have many limitations. If you want to use the closed forms of the generating functions (for example the function \(\dfrac {1} {1-x} \) of the geometric series), then it is essential to study the convergence interval of the series.

Let us see with some examples how the generating functions can help in counting problems.

**Example 1.1 – Newton’s binomial theorem**

Suppose we want to calculate the number of subsets of \(k\) objects, taken from a set of \(n\) elements. It is known that the value is given by the binomial coefficient \(\displaystyle \binom{n}{k}\). So the generating function of these coefficients is Newton’s polynomial:

\[ (1+x)^{n} = \sum_{k=0}^{n} \binom{n}{k} x^{k} \]

**Example 1.2**

Given the sequence \(\{1,1,1,1, \cdots \}\), the generating function is the geometric series:

\[ \sum_{n=0}^{\infty} x^{n} = \frac{1}{1-x} \qquad |x| \lt 1 \]

**Exercise 1.1**

Find the generating function of the sequence \(\{1,2,4,8, \cdots \}\).
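A generating-function guess can always be sanity-checked numerically. For Exercise 1.1 the answer is \(\dfrac{1}{1-2x}\), since \(a_{n} = 2^{n}\); the sketch below (the helper name `series_inverse` is ours) expands a rational function \(1/Q(x)\) by formal long division of power series and recovers the sequence.

```python
# Numerical check: the closed form 1/(1 - 2x) should generate 1, 2, 4, 8, ...
# We expand 1/Q(x) by formal long division of power series.

def series_inverse(q: list[float], n: int) -> list[float]:
    """Coefficients of 1/Q(x) up to x^(n-1), assuming q[0] != 0."""
    inv = [1.0 / q[0]]
    for k in range(1, n):
        # enforce sum_{j=0..k} q[j] * inv[k-j] = 0
        s = sum(q[j] * inv[k - j] for j in range(1, min(k, len(q) - 1) + 1))
        inv.append(-s / q[0])
    return inv

coeffs = series_inverse([1.0, -2.0], 8)   # Q(x) = 1 - 2x
print(coeffs)  # [1.0, 2.0, 4.0, 8.0, ...]
```

The same helper works for any rational generating function whose denominator has a nonzero constant term.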

**Exercise 1.2**

Find the sequence generated by the following function:

Hint

Use the Maclaurin series expansion of the function.

The numbers of the sequence generated by the function are the Catalan numbers \(C_{n} = \frac {1}{n} \binom{2n-2}{n-1} \). For more details on Catalan numbers see the following link.

**Exercise 1.3**

Find the function that generates the sequence of Fibonacci numbers, which are defined as follows:

\[ F_{n} = F_{n-1} + F_{n-2} \qquad n \ge 2 \]

with initial values \(F_{0} = 0 \quad F_{1} = 1 \).

Hint

Combine the series \(G(x)\), \(xG(x)\) and \(x^{2}G(x)\) together. Thus, the generating function for the Fibonacci numbers can be expressed as:

\[ G(x) = \frac{x}{1-x-x^{2}} \]

**Sum and product of generating functions**

Given two generating functions \(F(x) = \sum\limits_{n = 0}^{\infty} a_{n} x^{n}\) and \(G(x) = \sum\limits_{n = 0}^{\infty}b_{n} x^{n}\), the following sum and product operations are defined:

\[ \begin{array}{l} F(x)+G(x) = \sum\limits_{n = 0}^{\infty}(a_{n}+b_{n}) x^{n} \\ F(x) \cdot G(x) = \sum\limits_{n = 0}^{\infty}c_{n} x^{n} \quad \text{with } c_{n} = \sum\limits_{k=0}^{n}a_{k}b_{n-k} \\ \end{array} \]

The product of the two series is also called the Cauchy product, and the expression \(c_{n}\) plays an important role in counting indistinguishable objects.

Two generating functions \(F(x), G(x)\) are **reciprocal** if \(F(x)G(x) = 1\). The **inverse** of \(F(x)\) exists if and only if \(a_{0} \neq 0\).

**Example 2.1**

The inverse generating function of \(F(x) = \sum\limits_{n = 0}^{\infty} x^{n}\) is the function \(G(x) = 1-x\) since

\[ (1-x)(1 + x + x^{2} + x^{3} + \cdots) = 1 \]
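The Cauchy product translates directly into code as a convolution of coefficient lists, which makes Example 2.1 a concrete check: multiplying \((1-x)\) by the truncated geometric series gives \(1\) up to the truncation order. The helper name `cauchy_product` is our own.

```python
# Cauchy product of two coefficient lists, truncated to n terms:
# c_m = sum_{k=0..m} a_k * b_{m-k}

def cauchy_product(a: list[int], b: list[int], n: int) -> list[int]:
    return [sum(a[k] * b[m - k]
                for k in range(m + 1) if k < len(a) and m - k < len(b))
            for m in range(n)]

ones = [1] * 10                           # 1 + x + x^2 + ... (truncated)
print(cauchy_product([1, -1], ones, 10))  # [1, 0, 0, ..., 0]
```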

**Change of scale**

Multiplying a generating function by a constant is equivalent to scaling each term of the sequence by the same value.

For example, the generating function \(\dfrac{1}{1-x^{3}} \) generates the sequence \((1,0,0,1,0,0,1, \cdots) \). If we multiply by \(5 \), the new function \(\dfrac {5}{1-x^{3}} \) generates the sequence \((5,0,0,5,0,0, \cdots ) \).

**Shift of the sequence to the right**

Let \(\{a_{0}, a_{1}, a_{2}, \cdots \} \) be a sequence and let \(G (x) = \sum\limits_{n = 0}^{ \infty} a_{n} x^{n} \) be its generating function. If we change the sequence by adding \(k \) zeros to the left \(\{0,0, \cdots, 0, a_{0}, a_{1}, a_{2}, \cdots \} \), then the new generating function is \(x^{k} G (x) = a_{0} x^{k} + a_{1} x^{k + 1} + \cdots \).

**Example 2.2**

The generating function of the sequence \(\{0,0,0,1,1,1, \cdots \}\) is the function \(G (x) = \dfrac {x^{3}} {1-x } \).

**Derivative operation**

Let \(\{a_{0}, a_{1}, a_{2}, \cdots \} \) be a sequence with the generating function \(G (x) = \sum\limits_{n = 0}^{\infty} a_{n} x^{n} \). The derivative of the function \(G (x) \) is the generating function of the sequence \(\{a_{1}, 2a_{2}, 3a_{3}, \cdots \} \).

**Example 2.3**

The derivative of the function \(\dfrac{1}{1-x} \) is the function \(\dfrac {1}{(1-x)^{2}} \), which generates the sequence of natural numbers \(\{1,2,3,4, \cdots \} \).

A fundamental problem is to determine the values of a sequence for each \(n \), starting from the generating function expressed in closed form as a function of \(x \).

**Exercise 3.1**

Find the coefficient \(a_{n} \) of the following generating functions, expressed in closed form:

**Exercise 3.2**

Find the coefficients \(a_{n}\) of the following generating function, expressed in closed form:

Solution

Multiplying the \(r\) Taylor representations of the function \(\dfrac{1}{1-x}\), we see that the coefficient in front of \(x^{n}\) is the number of ways of writing \(n = p_{1} + p_{2} + \cdots + p_{r} \text{ where } p_{i} \ge 0\). This is equivalent to the problem of finding the number of combinations with repetitions of \(r\) objects taken in groups of \(n\). As is known, this value is

\[ \binom{n+r-1}{n} \]

**Definition 3.1**

Given a sequence of numbers \(a_{0}, a_{1}, \cdots\), we define the **partial product of order** \(n\): \(\prod_{i = 0}^{n} a_{i} = a_{0}a_{1} \cdots a_{n}\). We then define the **infinite product** as the limit of the partial products (in analogy with the definition of the sum of an infinite series):

\[ \prod_{i=0}^{\infty} a_{i} = \lim_{n \to \infty} \prod_{i=0}^{n} a_{i} \]

The study of infinite products, like that of the series, is very important in Mathematical Analysis. For further information see, for example, Knopp’s beautiful book^{[1]}.

Let \(p (n) \) be the arithmetic function that counts the number of partitions of the positive integer \(n \) in not necessarily distinct parts. Order is not taken into account. For example if \(n = 4 \), \(p (4) = 5 \). The partitions are as follows:

\[ 4=1+1+1+1=1+1+2=1+3=2+2 \]**Exercise 3.3**

Determine the generating function of the function \(p (n) \), which represents the number of partitions of a positive integer \(n \) in parts that are not necessarily distinct.

Solution

Let’s analyze the following identity:

We observe first that, although there is an infinite product, to calculate the coefficient \(a(n)\) of the power \(x^{n}\) it is necessary to analyze only a finite number of terms, precisely those that have a power less than or equal to \(n\).

In our case the coefficient of the power \(x^{n}\) is precisely the function \(p(n)\), as the term we choose in each factor \((1 + x^{k} + x^{2k} + x^{3k} + \cdots)\) determines how many times the number \(k\) appears in the partition. So we have:

\[ \sum_{n=0}^{\infty} p(n) x^{n} = \prod_{k=1}^{\infty} \frac{1}{1-x^{k}} \]
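Since only factors with \(k \le N\) affect the coefficients up to \(x^{N}\), the product can be evaluated numerically by truncation. The sketch below (function name ours) multiplies by each factor \(1/(1-x^{k})\) in place and reproduces \(p(4)=5\) from the example above.

```python
# p(n) from the product representation: multiply by 1/(1 - x^k) for each
# k <= N, working mod x^(N+1).  Each multiplication is a running sum with
# stride k: coeffs[i] += coeffs[i - k].

def partition_counts(N: int) -> list[int]:
    """Coefficients p(0)..p(N) of prod_{k=1..N} 1/(1 - x^k)."""
    coeffs = [1] + [0] * N          # start from the constant series 1
    for k in range(1, N + 1):
        for i in range(k, N + 1):
            coeffs[i] += coeffs[i - k]
    return coeffs

print(partition_counts(6))  # [1, 1, 2, 3, 5, 7, 11]
```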

**Exercise 3.4**

Determine the generating function of the function \(p^{d}(n)\), which represents the number of partitions of a positive integer \(n\) in distinct parts.

Solution: \(\prod\limits_{n = 1} ^ {\infty} (1 + x^{n}) \)

**Exercise 3.5**

Prove the following identity:

Hint

Multiplying two factors at a time in the following infinite products, we obtain:

Continuing in this way we obtain the following final identity:

\[ \prod_{n=1}^{\infty} (1+x^{n}) \prod_{n=1}^{\infty} (1-x^{2n-1})=1 \]**Exercise 3.6**

Prove the following relationship:

Hint

Note that \((1 + x)^{2n} = \sum\limits_{k = 0}^{2n} \binom{2n} {k} x^{k} \). Furthermore:

Then conclude by remembering that \(\binom{n}{k} = \binom{n}{n-k}\).

A very important class of generating functions is represented by polynomials obtained by multiplying polynomial factors with coefficients equal to \(1 \). For example the polynomial

\[ p(x) = (1 + x + x^{2})^{6} \]The coefficient of \(x^{k}\) corresponds to the number of integer solutions of the equation \(a_{1} + a_{2} + a_{3} + a_{4} + a_{5} + a_{6} = k\), with \(0 \le a_{i} \le 2\). Solving this equation is equivalent to the problem of choosing \(k\) objects from a collection of \(6\) types, with at most two objects of each type. Or it is equivalent to the problem of distributing \(k\) identical objects in \(6\) separate boxes, with a maximum of two objects per box.

Vice versa, given a combinatorial problem of choice with repetition of identical objects, we can determine the corresponding generating function.

**Example 4.1**

Let us consider 3 types of balls of different colors: red, black, green. Calculate, through the generating function, the number of possible ways to select \(6\) objects with repetition, with a maximum of \(4\) objects of each type.

Solution

In terms of diophantine equation, we have to find integer solutions of the equation \(a_{1} + a_{2} + a_{3} = 6, \quad 0 \le a_{i} \le 4 \). It is easily found that the solution is the coefficient of \(x^{6} \) of the generating function \(G (x) = (1 + x + x^{2} + x^{3} + x^{4 })^{3} \).
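Example 4.1 can be checked two ways: by reading off the coefficient of \(x^{6}\) in \((1+x+x^{2}+x^{3}+x^{4})^{3}\), and by brute-force enumeration of the Diophantine solutions. The helper name `poly_power_coeff` is ours.

```python
# Two independent counts for Example 4.1: a polynomial-power coefficient
# and a brute-force enumeration of a1 + a2 + a3 = 6 with 0 <= ai <= 4.

from itertools import product

def poly_power_coeff(factor: list[int], power: int, k: int) -> int:
    """Coefficient of x^k in factor(x)**power, by repeated convolution."""
    coeffs = [1]
    for _ in range(power):
        coeffs = [sum(coeffs[i] * factor[m - i]
                      for i in range(m + 1)
                      if i < len(coeffs) and m - i < len(factor))
                  for m in range(len(coeffs) + len(factor) - 1)]
    return coeffs[k] if k < len(coeffs) else 0

gf_count = poly_power_coeff([1, 1, 1, 1, 1], 3, 6)
brute = sum(1 for a in product(range(5), repeat=3) if sum(a) == 6)
print(gf_count, brute)  # both 19
```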

Many problems of a combinatorial nature reduce to finding the solution of a recurrence equation with appropriate initial conditions. Basically, a recurrence equation breaks a problem of order \(n\) down into one or more similar problems of lower order. Many recursive divide and conquer algorithms, such as merge sort, have a time complexity that can be modeled with recurrence equations.

We limit our study to linear equations. For an in-depth study of recursive algorithms see ^{[2]} or ^{[3]}.

**Definition 5.1**

A **linear recurrence equation** of degree \(k\) with constant coefficients is a relationship of the type

\[ y_{n} = c_{1} y_{n-1} + c_{2} y_{n-2} + \cdots + c_{k} y_{n-k} + f(n) \]

where the coefficients \(c_{i}\) are constant. The equation is said to be **homogeneous** if \(f(n) = 0\), otherwise **non-homogeneous**.

**Example 5.1**

The relationship \(y_{n} = n y_{n-1}\), with \(y_{1} = 1 \), defines the factorial function of \(n \).

A recurrence equation is also called a **finite difference equation**. There is a close analogy with ordinary linear differential equations. To solve a linear non-homogeneous recurrence equation, first we find the general solution of the homogeneous equation, then we add a particular solution of the non-homogeneous equation.

To solve the homogeneous equation, we must first find the solutions of the characteristic equation.

**Definition 5.2**

The **characteristic equation** of the recurrence equation of degree \(k\) defined above is the following algebraic equation:

\[ r^{k} - c_{1} r^{k-1} - c_{2} r^{k-2} - \cdots - c_{k} = 0 \]

In the case of ordinary linear differential equations, the exponential functions \(e^{\lambda x}\) are taken as basis solutions. For recurrence equations we use the functions \(y_{n} = r^{n}\). The algebraic solutions of the characteristic equation are called **characteristic roots**. Three cases need to be distinguished:

- all roots are real and distinct
- all roots are real but not distinct
- some roots are complex numbers

In the first case the general solution of the homogeneous equation is the following:

\[ y_{n} = a_{1} r_{1}^{n} + a_{2} r_{2}^{n} + \cdots + a_{k} r_{k}^{n} \]where \(a_{1}, a_{2}, \cdots, a_{k}\) are arbitrary constants.

In the second case, for example if a root is double \(r_{1} = r_{2} \), we have the solution \((a_{1} + a_{2} n) r_{1}^{n} \). We have similar formulas if the multiplicity of the root is greater than \(2\).

In the third case, for every complex solution \(\alpha + i \beta\) there is also the complex conjugate \(\alpha - i \beta\). So we can write the solution, for each pair of complex conjugate roots, like this: \( A (\alpha + i \beta)^{n} + B (\alpha - i\beta)^{n} \).

For a deeper study of the methods for solving finite difference equations see ^{[4]}.

**Exercise 5.1**

Find the general solution of the recurrence equation:

**Exercise 5.2**

Find the general solution of the recurrence equation:

Answer: [\((A + Bn + Cn^{2}) (-1)^{n} \)]

**Exercise 5.3**

Find the solution of the recurrence equation:

**Exercise 5.4**

Find the solution of the following recurrence equation:

Answer: [\(y_{n} = 2 \cdot 3^{n} \)]

To solve the non-homogeneous equation, we just have to find a particular solution of the overall equation and add it to the general solution of the homogeneous equation. Particular solutions can generally be found through methods that vary depending on the form of the non-homogeneous term \(f(n)\).

**Exercise 5.5**

Solve the equation \(y_{n + 2} -4y_{n + 1} + 4y_{n} = n \).

Solution

The characteristic equation \(r^{2} -4r + 4 = 0 \) has a double solution \(r = 2 \). So the general solution of the homogeneous equation is \(y_{n} = a 2^{n} + bn 2^{n} \), where \(a, b \) are two arbitrary constants to be determined by means of two initial conditions.

To find a particular solution of the non-homogeneous equation, we must look at the form of the term on the right of the equation. In this case, we try a polynomial of degree equal to the degree of the term on the right. So let’s try \(y_{n} = An + B\). Substituting this expression in the equation and equating the coefficients of the various powers of \(n\), we obtain the values \(A = 1, B = 2\). So the general solution of the non-homogeneous equation is the following:

\[ y_{n} = a 2^{n} + b n 2^{n} + n + 2 \]
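Exercise 5.5 can be checked numerically: with the particular solution \(y_{n} = n + 2\) (from \(A=1, B=2\)), the full solution \(y_{n} = a 2^{n} + b n 2^{n} + n + 2\) should satisfy the recurrence for any constants \(a, b\). The constants chosen below are arbitrary.

```python
# Check that y_n = a*2^n + b*n*2^n + n + 2 satisfies
# y_{n+2} - 4*y_{n+1} + 4*y_n = n for arbitrary constants a, b.

def y(n: int, a: float = 3.0, b: float = -1.5) -> float:
    return a * 2**n + b * n * 2**n + n + 2

for n in range(10):
    lhs = y(n + 2) - 4 * y(n + 1) + 4 * y(n)
    assert abs(lhs - n) < 1e-9, (n, lhs)
print("general solution verified")
```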

**Exercise 5.6**

Solve the equation \(y_{n + 2} -4y_{n + 1} + 4y_{n} = n^{2} \).

In this case, find the particular solution of the non-homogeneous equation starting from the polynomial \(y_ {n} = An ^{2} + Bn + C \).

Solution: [\(y_{n} = a2^{n} + bn2^{n} + n^{2} + 4n + 8 \)]

**Exercise 5.7 – Perrin’s equation**

Solve the equation \(a_{n} = a_{n-2} + a_{n-3}\) for \(n \ge 3\), with the initial conditions \(a_{0} = 3, a_{1} = 0, a_{2} = 2\).

Solution

We solve by the method of the generating function. Let \(G(x) = \sum\limits_{n = 0}^{\infty} a_{n} x^{n}\). We first calculate the expressions \(xG(x), x^{2}G(x), x^{3}G(x)\). Then, with some calculations, we get the following expression for the generating function:

\[ G(x) = \frac{3-x^{2}}{1-x^{2}-x^{3}} \]

The algebraic equation in the denominator has three roots, one real \(r_{1} \) and two complex conjugate \(r_{2}, r_{3} \). The inverse of the real root is also called **plastic number**, whose approximate value is 1.32471 (see link).

We now use the method of decomposition into partial fractions:

By carrying out the calculations we find the following values for the three constants: \(A = -r_{1}, B = -r_{2},C = -r_{3} \). So putting together the series development of the three functions we obtain the following formula for the coefficient \(a (n) \) of the generating function:

\[ a(n)= \left(\frac{1}{r_{1}}\right)^{n}+ \left(\frac{1}{r_{2}}\right)^{n}+ \left(\frac{1}{r_{3}}\right)^{n} \]**Fibonacci** (1170-1235) numbers are non-negative integers defined by the following recurrence equation:

\[ F_{n} = F_{n-1} + F_{n-2} \qquad n \ge 2, \quad F_{0} = 0, \ F_{1} = 1 \]

The first Fibonacci numbers are \(\{0,1,1,2,3,5,8,13, \cdots \} \).

**Exercise 6.1**

Prove the following relationships:

**First method for solving the recurrence equation**

We solve the recurrence equation satisfied by the Fibonacci numbers using the characteristic equation. Let \(F_{k} = r^{k}\). In this case, the characteristic equation and its solutions are the following:

\[ r^{2} = r + 1 \qquad r_{1,2} = \frac{1 \pm \sqrt{5}}{2} \]

The general solution of the homogeneous equation is therefore:

\[ F_{n} = A\left(\frac{1 + \sqrt{5}}{2}\right)^{n} + B \left(\frac{1 - \sqrt{5}}{2}\right)^{n} \]We note that the general solution contains two arbitrary constants. Taking into account the initial conditions \(F_{0} = 0, F_{1} = 1\), we finally have the solution that gives the formula for Fibonacci numbers:

\[ F_{n} = \frac{1}{\sqrt{5}} \left(\frac{1 + \sqrt{5}}{2}\right)^{n} - \frac{1}{\sqrt{5}}\left(\frac{1 - \sqrt{5}}{2}\right)^{n} \]**Second method of solving the recurrence equation**

We solve the recurrence equation satisfied by the Fibonacci numbers with the method of generating functions, using the function found previously:

\[ G(x) = \frac{x}{1-x-x^{2}} \]

Theoretically we could expand this function with the Taylor series at the point \(x = 0\) and find the coefficients of the powers of \(x\), which are the Fibonacci numbers. However, this calculation would be too complicated; in this case, since the function is a ratio of two polynomials (a rational function), it is better to carry out the decomposition by means of the **partial fraction method**.

We observe first that \(1-x-x^{2} = (1- r_{1} x)(1-r_{2} x)\), where \(r_{1}, r_{2}\) are the two roots previously found. Thus, the following decomposition can be written:

Finding the least common denominator of the two fractions on the right, and equating with the expression on the left, we find the following values for the constants: \(A = \dfrac{1}{\sqrt{5}}, B = - \dfrac{1}{\sqrt{5}}\). Recalling now the geometric series expansion

\[ \frac{1}{1-x} = 1 + x + x^{2} + x^{3} + \cdots \]and putting the results together, we obtain again the general expression for the coefficients \(F_ {n} \) already found before.

Two quantities \(a, b\) are said to be in the golden ratio if the following proportion holds:

\[ (a+b):a= a:b \]If we denote \(x = \dfrac{a} {b}\) we have the equation \(x^{2} -x-1 = 0 \), whose roots are, as we have seen previously, \(r_{1 }, r_{2} \). The symbol \(\Phi \) is used to denote the value

\[ \Phi = \dfrac{1+ \sqrt{5}}{2} = 1.618033 \cdots \]which is an irrational number. In geometry, we say that a segment is divided into two parts according to the golden section if their ratio has value \(\Phi\).

**Exercise 6.2**

Prove the following formula:

\[ \lim_{n \to \infty} \frac{F_{n+1}}{F_{n}} = \Phi \]

Hint

Write \(\dfrac{F_{n + 1}} {F_{n}} = \dfrac{F_{n} + F_{n-1}} {F_{n}} \) and then compute the limit as \(n\) goes to infinity.

**Lucas** (1842-1891) numbers are defined similarly to Fibonacci numbers:

\[ L_{n} = L_{n-1} + L_{n-2} \qquad n \ge 2 \]

but with different initial conditions:

\[ L_{0}=2, L_{1}=1 \]The first Lucas numbers are \(\{2,1,3,4,7,11,18, \cdots \}\). The recurrence equation is the same as for the Fibonacci numbers, so the general solution is the same. However, the particular solution, obtained by imposing the two initial conditions, is obviously different:

\[ L_{n} = \left(\frac{1 + \sqrt{5}}{2}\right)^{n} + \left(\frac{1 - \sqrt{5}}{2}\right)^{n} \]**Exercise 6.3**

Prove the following relationship: \(F_{2n} = L_{n}F_{n}\).
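Since Fibonacci and Lucas numbers share one recurrence and differ only in the seeds, a single helper generates both, and the identity of Exercise 6.3 can be checked numerically (the helper name `linear_rec` is ours).

```python
# One recurrence, two seedings, plus a numerical check of F_{2n} = L_n * F_n.

def linear_rec(n: int, a0: int, a1: int) -> int:
    """n-th term of x_k = x_{k-1} + x_{k-2} with x_0 = a0, x_1 = a1."""
    a, b = a0, a1
    for _ in range(n):
        a, b = b, a + b
    return a

F = lambda n: linear_rec(n, 0, 1)   # Fibonacci: F_0 = 0, F_1 = 1
L = lambda n: linear_rec(n, 2, 1)   # Lucas:     L_0 = 2, L_1 = 1

print([L(n) for n in range(7)])     # [2, 1, 3, 4, 7, 11, 18]
assert all(F(2 * n) == L(n) * F(n) for n in range(20))
```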

As we have seen in this article, ordinary generating functions are a useful tool for solving many combinatorial problems with repetitions. In a future article we will describe exponential generating functions, which are useful for solving problems of distribution of distinct objects.

^{[1]}K. Knopp – Theory and Application of Infinite Series (Dover)

^{[2]}T. Cormen – Introduction to Algorithms (The Mit Press)

^{[3]}J. Edmonds – How to Think about Algorithms (Cambridge UP)

^{[4]}M. Spiegel – Finite Differences and Difference Equations (McGraw-Hill)


The post Lambert Series, the Arithmetic Function \(r(n)\) and Gauss’s Probability Integral appeared first on GameLudere.

Let’s briefly recall some properties of Dirichlet generating functions.

**Definition 1.1**

Given an arithmetic function \(f(n)\), the associated **Dirichlet generating function** is the following series:

\[ F(s)=\sum\limits_{n=1}^{\infty}\frac{f(n)}{n^{s}} \]

where \(s=\sigma + it\) is a complex number.

Regarding the problem of convergence, recall that for each Dirichlet series there is a real number \(\sigma_{0}\), called **abscissa of absolute convergence,** such that the series converges absolutely in the half-plane \(\sigma \gt \sigma_{0}\), to the right of \(\sigma_{0}\).

It can be shown that in the region of absolute convergence the Dirichlet series represents the arithmetic function \(f(n)\) uniquely; that is, if two arithmetic functions are different their corresponding Dirichlet series are also different.

**Example 1.1**

The most famous and important Dirichlet series is the **Riemann zeta function**:

\[ \zeta(s)=\sum\limits_{n=1}^{\infty}\frac{1}{n^{s}} \]

The abscissa of absolute convergence of \(\zeta(s)\) is equal to \(1\).

The inverse of the zeta function has the following Dirichlet generating function:

\[ \frac{1}{\zeta(s)}=\sum\limits_{n=1}^{\infty}\frac{\mu(n)}{n^{s}} \]where \(\mu(n)\) is the Möbius function. Recall that the **Möbius function** \(\mu(n)\) is defined as follows:

\[ \mu(n)= \begin{cases} 1 & \text{if } n=1 \\ (-1)^{k} & \text{if } n \text{ is the product of } k \text{ distinct primes} \\ 0 & \text{if } n \text{ is divisible by a square} \gt 1 \end{cases} \]

For the Möbius function see the article in this blog.

**Example 1.2**

We define the following arithmetic function (also called a **Dirichlet character**):

\[ \chi(n)= \begin{cases} 0 & \text{if } n \text{ is even} \\ 1 & \text{if } n \equiv 1 \pmod{4} \\ -1 & \text{if } n \equiv 3 \pmod{4} \end{cases} \]

The Dirichlet character therefore assumes the value \(+1\) for all odd numbers of the form \(4k+1\), and the value \(-1\) for all odd numbers of the form \(4k+3\).

The associated Dirichlet series is:

\[ L(s)=\sum\limits_{n=1}^{\infty}\frac{\chi(n)}{n^{s}}= 1^{-s}-3^{-s}+5^{-s}- \cdots \]**Exercise 1.1**

Prove that the function \(\chi(n)\) is completely multiplicative. That is, for every pair of positive integers \(n,m\) we have:

\[ \chi(nm)=\chi(n)\chi(m) \]

**Definition 1.2**

Given two arithmetic functions \(f(n),g(n)\), we define the **Dirichlet product** (or **convolution**) as the following arithmetic function:

\[ (f*g)(n)=\sum\limits_{d|n}f(d)\,g\left(\frac{n}{d}\right) \]

The following theorem holds:

**Theorem 1.1**

Let two arithmetic functions \(f(n),g(n)\) be given, with respective Dirichlet generating functions \(F(s),G(s)\). Then, in the half-plane in which both series converge absolutely, the generating function \(H(s)\) of the convolution \(f*g\) is the following:

\[ H(s)=F(s)G(s) \]

Proof

We write the product of the two series:

Thanks to absolute convergence we can multiply and reorder the two series as we want, without changing the result. To conclude the proof, we simply group the terms with the constant product \(nm=k\) and write the series relative to the product.

From the previous theorem the following theorem easily derives:

**Theorem 1.2**

Suppose that

\[ g(n)=\sum\limits_{d|n}f(d) \]

and \(F(s),G(s)\) are the Dirichlet generating functions of \(f(n),g(n)\). Then

\[ \begin{array}{l} f(n) = \sum\limits_{d|n}g(d)\mu \left(\dfrac{n}{d}\right) \\ G(s) = \zeta(s) F(s) \end{array} \]For an in-depth study of the Dirichlet series consult a book about number theory, for example ^{[1]} or ^{[3]} .

**Theorem 1.3**

\[ \zeta^{2}(s)=\sum\limits_{n=1}^{\infty}\frac{d(n)}{n^{s}} \]where \(d(n)\) is the arithmetic function that counts the number of divisors of \(n\) (see the article in this blog).

Hint

Use theorem 1.2.

Despite the importance of the Dirichlet series, there are other possible generating functions. Among these are the series introduced by **Lambert** (1728-1777), which have the following form:

\[ F(x)=\sum\limits_{n=1}^{\infty}a_{n}\frac{x^{n}}{1-x^{n}} \]

If the series \(\sum\limits_{1}^{\infty}a_{n}\) converges, then the Lambert series converges for all values of \(x\), except for \(x = \pm 1\). Otherwise it converges for the values of \(x\) for which the power series \(\sum\limits_{n=1}^{\infty}a_{n}x^{n}\) converges.

In the case \(a_{n}=1\), if \( |x| \lt 1\) the fraction \(\dfrac{x^{n}}{1-x^{n}}\) is the sum of a geometric series and thanks to absolute convergence the series can be reordered without changing the sum.
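This expansion can be checked with a few lines of Python (a sketch added here for illustration, taking \(a_{n}=1\)): expanding each fraction \(x^{n}/(1-x^{n})\) as a geometric series and collecting equal powers, the coefficient of \(x^{n}\) turns out to be \(\sum_{d|n}a_{d}\), i.e. in this case the number of divisors of \(n\):

```python
N = 12
a = {n: 1 for n in range(1, N + 1)}     # example coefficients a_n = 1

# expand sum_n a_n * x^n / (1 - x^n) = sum_n a_n * (x^n + x^{2n} + ...) up to degree N
coeff = [0] * (N + 1)
for n, an in a.items():
    for k in range(n, N + 1, n):
        coeff[k] += an

# b_n = sum of a_d over the divisors d of n (here: the number of divisors of n)
b = [0] + [sum(a[d] for d in range(1, n + 1) if n % d == 0) for n in range(1, N + 1)]
print(coeff == b)   # -> True
```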

**Theorem 2.1**

If the Lambert series converges absolutely we have:

\[ \sum\limits_{n=1}^{\infty}a_{n}\frac{x^{n}}{1-x^{n}}=\sum\limits_{n=1}^{\infty}b_{n}x^{n} \]

where

\[ b_{n}= \sum\limits_{d|n} a_{d} \]Proof

Absolute convergence allows us to reorder the terms of the series, grouping all the terms that give a constant value \(mn=k\), relative to the power \(x^{k}\). For example, to compute the coefficient of \(x^{6}\), we add the coefficients \(a_{d}\) over all the divisors of \(6\): \(a_{1},a_{2},a_{3},a_{6}\). So we have

\[ \sum\limits_{n=1}^{\infty}a_{n}\frac{x^{n}}{1-x^{n}}=\sum\limits_{n=1}^{\infty}b_{n}x^{n} \]

where

\[ b_{n}= \sum\limits_{d|n} a_{d} \]The following theorem holds:

**Theorem 2.2**

Suppose that \(A(s),B(s)\) are the Dirichlet series relative to the coefficients \(a_{n},b_{n}\). Then:

\[ \sum\limits_{n=1}^{\infty}a_{n}\frac{x^{n}}{1-x^{n}}=\sum\limits_{n=1}^{\infty}b_{n}x^{n} \]

if and only if

\[ \zeta(s)A(s)=B(s) \]**Exercise 2.1**

**Exercise 2.2**

Hint

Use theorem 1.3 and theorem 2.1.

**Exercise 2.3**

Prove that if

\[ F(x)=\sum\limits_{n=1}^{\infty}a_{n}\frac{x^{n}}{1-x^{n}} \quad ; \quad G(x)=\sum\limits_{n=1}^{\infty}a_{n}x^{n} \]

then

\[ \begin{array}{l} F(x) = \sum\limits_{n=1}^{\infty}G(x^{n}) \\ \end{array} \]We define the arithmetic function \(r(n)\) as the number of representations of \(n\) in the form

\[ n=x^{2}+y^{2} \quad x,y \in \mathbb{Z} \]The representations that differ in signs or order are counted as distinct. For example \(r(5)=8\) because

\[ 5=(\pm 1)^{2}+(\pm 2)^{2}=(\pm 2)^{2}+(\pm 1)^{2} \]For the number \(4\) we have \(r(4)=4\):

\[ 4=0^{2}+(\pm 2)^{2}=(\pm 2)^{2}+0^{2} \]In the case of a composite number, the conditions for representing an integer as the sum of two squares are contained in the following theorem of Fermat:

**Theorem 3.1 – Fermat**Let \(n\) be a positive integer with the following prime factorization:

\[ n=2^{a}\prod_{i} p_{i}^{r_{i}}\prod_{j} q_{j}^{s_{j}} \]

where \(p_{i} \equiv 1 \pmod{4}\) and \(q_{j} \equiv 3 \pmod{4}\). Then it is possible to write \(n=x^{2}+ y^{2}\) if and only if all the exponents \(s_{j}\) are even.

For a proof see one of the texts in the bibliography.
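The values of \(r(n)\) are easy to check by brute force (a small Python sketch added here for illustration); note that \(r(3)=0\), consistently with Fermat's theorem, since \(3 \equiv 3 \pmod 4\) appears with an odd exponent:

```python
import math

def r(n):
    # count all pairs (x, y) of integers with x^2 + y^2 = n
    m = math.isqrt(n)
    return sum(1 for x in range(-m, m + 1) for y in range(-m, m + 1)
               if x * x + y * y == n)

print(r(5), r(4), r(10), r(3))   # -> 8 4 8 0
```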

The following theorem has been proved by Jacobi through his theory of elliptic functions. However, an equivalent form had also been proved by Gauss in his fundamental work **‘Disquisitiones Arithmeticae’**, published in 1801 in Latin.

**Theorem 3.2 – Jacobi-Gauss**

\[ r(n)=4\left(d_{1}(n)-d_{3}(n)\right) \]where \(d_{1}(n),d_{3}(n)\) are the numbers of divisors of \(n\) of the form \(4k+1\) and \(4k+3\) respectively. For a proof see ^{[2]}.
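The Jacobi-Gauss formula can be verified numerically (an illustrative Python check added here), comparing the brute-force count of representations with the divisor counts:

```python
import math

def r(n):
    # brute-force count of pairs (x, y) with x^2 + y^2 = n
    m = math.isqrt(n)
    return sum(1 for x in range(-m, m + 1) for y in range(-m, m + 1)
               if x * x + y * y == n)

def d1_minus_d3(n):
    # divisors congruent to 1 mod 4, minus those congruent to 3 mod 4
    divisors = [d for d in range(1, n + 1) if n % d == 0]
    return (sum(1 for d in divisors if d % 4 == 1)
            - sum(1 for d in divisors if d % 4 == 3))

print(all(r(n) == 4 * d1_minus_d3(n) for n in range(1, 200)))   # -> True
```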

**Example 3.1**

In the case \(n=10\) we have \(r(10)=8\). The divisors of type \(4k+1\) are \(1,5\). There are no divisors of type \(4k+3\). Then

\[ r(10)=4\left(d_{1}(10)-d_{3}(10)\right)=4(2-0)=8 \]

We note that, from the definition of the Dirichlet character, it results:

\[ d_{1}(n)-d_{3}(n)=\sum\limits_{d|n}\chi(d) \]We can now go back to our function \(r(n)\). Based on theorem 3.2 we can write

\[ r(n)=4 \sum\limits_{d|n}\chi(d) \]By theorem 1.2 we then have

\[ \sum\limits_{n=1}^{\infty}\frac{r(n)}{n^{s}}=4\zeta(s)\sum\limits_{n=1}^{\infty}\frac{\chi(n)}{n^{s}} \]From the previous relation and from theorem 2.1 we easily prove the following theorem:

**Theorem 4.1**

\[ \sum\limits_{n=1}^{\infty}r(n)x^{n}=4\left(\dfrac{x}{1-x}-\dfrac{x^{3}}{1-x^{3}}+\dfrac{x^{5}}{1-x^{5}}- \cdots \right) \]

Now consider the following series

\[ S= 1+2x+2x^{4}+2x^{9}+\cdots \]If we calculate the square \(S^{2}\) of the series, we obtain a series of powers in which the coefficient in front of \(x^{n}\) is precisely \(r(n)\). In fact, every pair of exponents \( i,k\) such that \(i^{2}+k^{2}=n\) contributes to the coefficient with a unit value. From this observation we can write the previous theorem in this equivalent form:

**Theorem 4.2**

\[ \left(1+2x+2x^{4}+2x^{9}+ \cdots\right)^{2}=1+4\left(\dfrac{x}{1-x}-\dfrac{x^{3}}{1-x^{3}}+\dfrac{x^{5}}{1-x^{5}}- \cdots \right) \]

In the following paragraphs we will use this formula to calculate Gauss’s integral.
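The claim about the coefficients of \(S^{2}\) can be verified numerically (a Python sketch added here for illustration): squaring the series truncated at degree \(N\) reproduces the values of \(r(n)\):

```python
import math

N = 30
m = math.isqrt(N)
theta = [0] * (N + 1)
for k in range(-m, m + 1):
    theta[k * k] += 1        # S = sum over all integers k of x^(k^2), truncated at degree N

square = [0] * (N + 1)       # coefficients of S^2 up to degree N
for i in range(N + 1):
    for j in range(N + 1 - i):
        square[i + j] += theta[i] * theta[j]

def r(n):
    # brute-force count of representations n = x^2 + y^2
    s = math.isqrt(n)
    return sum(1 for x in range(-s, s + 1) for y in range(-s, s + 1)
               if x * x + y * y == n)

print(all(square[n] == r(n) for n in range(1, N + 1)))   # -> True
```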

The Gaussian integral is the integral of the function \(f(x)=e^{-x^{2}}\) calculated on the whole real axis:

\[ \begin{array}{l} \int\limits_{- \infty}^{\infty}e^{-x^{2}} dx = \sqrt{\pi} \end{array} \]The function \(e^{-x^{2}}\) is an even function, symmetric with respect to the \(y\)-axis; therefore it is sufficient to calculate the integral only on the positive real half-line:

\[ I= \int\limits_{0}^{\infty}e^{-x^{2}} dx = \frac{\sqrt{\pi}}{2} \]Although it is linked to the name of Gauss, the integral was already known to **De Moivre** (1667-1754) from his studies on the calculus of probability, in particular in his book **‘The Doctrine of Chances’** of 1718. It was also a subject of study by **Laplace** (1749-1827).

The Gaussian integral cannot be calculated with elementary methods. One of the simplest methods is probably that of **Poisson** (1781-1840), which consists in calculating the double integral

\[ \iint\limits_{\mathbb{R}^{2}}e^{-(x^{2}+y^{2})}\,dx\,dy \]

using polar coordinates, and take advantage of the fact that the double integral can be expressed as the product of the two partial integrals. For details see a text of mathematical analysis.

Assuming we know the value of the Gaussian integral, we can prove the following theorem:

**Theorem 5.1**

\[ \lim\limits_{x \to 1-0} \sqrt{1-x}\sum\limits_{n=0}^{\infty}x^{n^{2}}= \alpha = \int\limits_{0}^{\infty}e^{-t^{2}}dt \]

Hint

Use the formula for the approximation of the integral of the function \(f(x)=e^{-x^{2}}\) through the Riemann sums: setting \(x=e^{-h^{2}}\), so that \(\sqrt{1-x} \approx h\) as \(h \to 0\),

\[ h\sum\limits_{n=0}^{\infty}e^{-(nh)^{2}} \to \int\limits_{0}^{\infty}e^{-t^{2}}dt \quad (h \to 0) \]

Knowing the value of the Gaussian integral, we can easily deduce that \(\alpha= I=\dfrac{\sqrt{\pi}}{2}\).

Suppose now that we do not know the value of the Gaussian integral. If we are able to calculate the limit of the previous theorem in an alternative way, we could calculate the Gaussian integral. Based on theorem 4.2 we can write:

\[ \begin{array}{l} \alpha ^{2}= \lim\limits_{x \to 1-0} \left(1-x \right) \left(1+x +x^{4}+x^{9}+ \cdots + x^{n^{2}}+ \cdots \right)^{2} \\ = \dfrac{1}{4}\lim\limits_{x \to 1-0} \left(1-x\right)\left(2+ 2x +2x^{4}+2x^{9}+ \cdots + 2x^{n^{2}}+ \cdots\right)^{2} \\ = \dfrac{1}{4} \lim\limits_{x \to 1-0}\left(1-x \right) \left(1+4\left(\dfrac{x}{1-x}-\dfrac{x^{3}}{1-x^{3}}+\dfrac{x^{5}}{1-x^{5}}- \cdots \right)\right) \\ = \lim\limits_{x \to 1-0}\left(1-x \right) \left(\dfrac{x}{1-x}-\dfrac{x^{3}}{1-x^{3}}+\dfrac{x^{5}}{1-x^{5}}- \cdots \right) \\ \end{array} \]The transition from the second to the third expression uses theorem 4.2: replacing \(1+2x+2x^{4}+\cdots\) with \(2+2x+2x^{4}+\cdots\) changes the square by \((1-x)(2S+1)/4\), where \(S=\sum x^{n^{2}}\), and this contribution vanishes in the limit because \((1-x)\sum x^{n^{2}} \to 0\). Let us now recall the famous **formula of Leibniz** (1646-1716):

\[ \frac{\pi}{4}= 1 -\frac{1}{3}+\frac{1}{5}- \frac{1}{7}+ \cdots \]

Since \(\lim\limits_{x \to 1-0}\dfrac{(1-x)x^{n}}{1-x^{n}}=\dfrac{1}{n}\) for every \(n\), taking the limit term by term and using the Leibniz formula we can complete the calculation, obtaining the following result:

\[ \alpha ^{2}= 1 -\frac{1}{3}+\frac{1}{5}- \frac{1}{7}+ \cdots = \frac{\pi}{4} \]So we can conclude that the value of the Gaussian integral is

\[ I = \alpha = \frac{\sqrt{\pi}}{2} \]We have seen in this article an example of how a result that strictly belongs to the Theory of Numbers can be used to prove theorems and properties of Mathematical Analysis. This confirms once again the profound unity of Mathematics. In the next articles we will study other properties of the Dirichlet series and the Lambert series.
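The two key numerical facts used above, the Leibniz series and the term-by-term limit \((1-x)x^{n}/(1-x^{n}) \to 1/n\), are easy to check in Python (an illustration added here):

```python
import math

# Leibniz: 1 - 1/3 + 1/5 - 1/7 + ... = pi/4 (alternating series, error < first omitted term)
partial = sum((-1)**k / (2 * k + 1) for k in range(200000))
print(abs(partial - math.pi / 4) < 1e-5)                  # -> True

# term by term: (1-x) * x^n / (1-x^n) -> 1/n as x -> 1-
x, n = 0.999999, 5
print(abs((1 - x) * x**n / (1 - x**n) - 1 / n) < 1e-4)    # -> True
```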

^{[1]}Niven, Zuckerman, Montgomery – An Introduction to the Theory of Numbers (5th edition, Wiley, 1991)

^{[2]}W. LeVeque – Fundamentals of Number Theory (Dover)

^{[3]}Hardy, Wright – An Introduction to the Theory of Numbers (Oxford University Press)

The post Lambert Series, the Arithmetic Function \(r(n)\) and Gauss’s Probability Integral appeared first on GameLudere.

The post Motion in a Plane and Unity’s 2D Physics Engine appeared first on GameLudere.

For the study of physics it’s essential to know the units of measurement of fundamental and derived quantities, as defined in the International System (SI). The basic quantities of mechanics are shown in the following table:

For an exhaustive review of the units of measurement, see this link to Wikipedia.

Kinematics is the branch of mechanics that studies the motion of bodies without considering the action of the forces involved. The main physical quantities under study are position, velocity and acceleration. The simplest case to study is that of a point-like body whose dimensions are negligible in the context in which it moves. We will use the standard term **material point**: in practice a geometric point, with an associated mass \(m \), that moves in the plane along a certain trajectory.

To specify the position of an object, a reference system must be chosen. For motion along a straight line one parameter is sufficient to specify the position of an object. In the plane we need two parameters, for example the coordinates \(x, y\) of a system of Cartesian axes. In space we need three parameters. It’s useful to introduce vectors: the position of an object located at a point \(P \) of the plane, with respect to the origin \(O \), can be defined by means of the position vector \(\mathbf{r}(t) \):

\[ \mathbf{r}(t)=\overrightarrow{OP}(t) \]

The **position vector** is a vector that begins at the origin of the reference system and ends at the point itself. In general, if the material point is moving, the position vector is a function that varies over time. In the plane we can express the position vector as a function of its components with respect to the Cartesian axes:

\[ \mathbf{r}(t)= x(t)\, \mathbf{i}+ y(t)\, \mathbf{j} \]

The velocity of a material point moving in the plane is defined as the variation of the position vector per unit time. It’s computed through the derivative with respect to time:

\[ \mathbf{v}(t) = \frac{d \mathbf{r}(t)}{dt} \]Therefore, velocity is a vector and can be decomposed into its components along the Cartesian axes:

\[ \mathbf{v}(t)= \frac{d x(t)}{dt} \mathbf{i}+\frac{d y(t)}{dt} \mathbf{j} \]The acceleration of an object is defined as the variation of the velocity vector per unit time. Velocity is a vector, therefore there may be acceleration because the modulus of velocity changes or because only the direction changes (as in uniform circular motion). The mathematical definition of acceleration is as follows:

\[ \mathbf{a}(t)=\frac{d \mathbf{v}(t)}{dt} = \frac{d^{2}x(t)}{dt^{2}} \mathbf{i}+ \frac{d^{2}y(t)}{dt^{2}} \mathbf{j} \]Knowing the expression of the vector position over time, we can calculate the speed and acceleration by means of derivative operations. Vice versa, knowing the expression of acceleration as a function of time, we can calculate the velocity and the position by means of integration operations:

\[ \begin{split} \mathbf{v}(t) &= \int_{t_{0}}^{t} \mathbf{a}(t) dt \\ \mathbf{r}(t) &= \int_{t_{0}}^{t} \mathbf{v}(t) dt \\ \end{split} \]**Example 1.1 – Uniformly accelerated motion**

Suppose that a body moves with constant acceleration \(\mathbf{a} \), and that at time \(t = 0 \) it has velocity \(\mathbf{v}_{0} \) and is in the position \(\mathbf{r}_{0} \). By applying the two integration operations described above we obtain the expressions for velocity and position as functions of time:

\[ \begin{split} \mathbf{v}(t) &= \mathbf{v}_{0}+ \mathbf{a}t \\ \mathbf{r}(t) &= \mathbf{r}_{0}+ \mathbf{v}_{0}t+ \frac{1}{2}\mathbf{a}t^{2} \\ \end{split} \]

These vector equations can be projected along the Cartesian axes, obtaining separate equations for the components of the velocity and of the position vector.
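These closed-form expressions can be checked against a direct numerical integration (a Python sketch added for illustration, with made-up initial values):

```python
# integrate constant acceleration with semi-implicit Euler
# and compare with the closed-form solution
ax, ay = 0.0, -9.8           # constant acceleration (m/s^2)
v0x, v0y = 3.0, 4.0          # initial velocity (m/s)
x, y = 0.0, 0.0              # initial position (m)
vx, vy = v0x, v0y
dt, T = 1e-4, 2.0

for _ in range(int(round(T / dt))):
    vx += ax * dt            # update velocity first ...
    vy += ay * dt
    x += vx * dt             # ... then position
    y += vy * dt

x_exact = v0x * T                      # x = v0x * t        (ax = 0)
y_exact = v0y * T + 0.5 * ay * T**2    # y = v0y * t + a * t^2 / 2
print(abs(x - x_exact) < 1e-2, abs(y - y_exact) < 1e-2)   # -> True True
```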

**Example 1.2**

An example of uniformly accelerated motion is that of a body that falls under the action of terrestrial gravity. Neglecting the air resistance, as we will see later, the body falls downwards with constant acceleration \(a = g \), where \(g = 9.8 ms^{- 2} \) is the acceleration of gravity, which we can consider constant near the surface of the Earth.

Another example of uniformly accelerated motion is a projectile launched from the origin of a Cartesian coordinate system, with an initial velocity vector \(\mathbf{v} = (v_{0, x}, v_{0, y}) \). Assuming that only the gravitational force of the earth is present and neglecting the resistance of the air, the projectile follows a parabolic trajectory. The above equations make it possible to calculate the point and the time at which the projectile falls back to the ground.

Dynamics studies the motion of objects taking into account the mass and making a complete analysis of the forces involved. The dynamic analysis allows to predict the motion of the objects on which the forces act.

Typical examples are the motion of objects on the earth’s surface under the action of gravity, the motion of the planets around the sun, or the calculation of the time required to stop a car that moves at a given speed.

This article summarizes the main concepts and equations of motion dynamics. However, for an in-depth study we refer to physics texts, among which the following ones are excellent: ^{[1]} or ^{[2]}.

The concept of force is fundamental in physics. In the study of mechanics, we can define a force as an action that tends to maintain or modify the motion of an object. There are different types of forces. A useful distinction is that between contact forces and forces at a distance. The main forces acting at a distance are:

- gravitational force
- electric force
- magnetic force

Contact forces act between objects that physically touch each other. Contact forces include the following:

- friction (e.g. a body that rolls or crawls on a road)
- air resistance
- reaction force (e.g. an object on a table)
- collisions between two particles

The contact force model is a useful approximation in solving many problems. However, strictly speaking, given the atomic structure of matter, there is never a real physical contact between bodies and all forces should be treated as distance forces, even if the distances are microscopic.

A force is a vector quantity, which is measured in **Newton** in the International System. Once a reference system has been fixed in the plane, a force can be decomposed into its components along the coordinate axes:

\[ \mathbf{F}= F_{x}\mathbf{i}+ F_{y}\mathbf{j} \]

The concept of force was introduced by Isaac Newton in his fundamental work ‘**Philosophiae Naturalis Principia Mathematica**‘, published on July 5, 1687, which describes his famous three laws of dynamics.

Newton’s three laws of motion are as follows:

- **law of inertia**: in an inertial reference system, every body not subject to forces remains in its state of rest or of uniform rectilinear motion
- **law of force**: \(\mathbf{F} = m \mathbf{a} \)
- **principle of action-reaction**: to every action there corresponds an equal and opposite reaction

The first law had already been formulated by **Galileo** and requires defining exactly the properties of inertial reference systems. These concepts are not simple but very profound; we refer to the physics texts in the bibliography.

The second law expresses the fundamental discovery that forces are the cause of accelerated motions. If a force \(\mathbf{F} \) is exerted on an object of mass \(m \), then the body experiences an acceleration of value

\[ \mathbf{a}=\frac{\mathbf{F}}{m} \]

The force is responsible for the acceleration of a material body. Constant velocity movement is possible without the action of forces, contrary to what ancient philosophers, such as Aristotle, believed.

The third law implies that if an object exerts a force on another object (for example a book placed on the table exerts a pressure on the table), then the second object exerts an equal and opposite reaction on the first object (the table exerts an equal and opposite force on the book).

An important example of application of Newton’s laws is the calculation of the trajectories of planets and satellites in the solar system. The universal law of gravitation discovered by Newton states that, given two bodies of masses \(m, M \) respectively which are at a distance \(r \), a gravitational force of attraction is exerted between them. The intensity of the gravitational force is expressed by the following Newton formula:

\[ \mathbf{F} = G \frac{mM}{r^{2}} \mathbf{u_{r}} \]where \(G = 6.67 \cdot 10^{- 11}\, m^{3}\, kg^{- 1}\, s^{- 2} \) is the famous **universal gravitation constant**. The vector \(\mathbf{u_{r}} \) is a unit vector, with direction equal to the radius vector joining the two material points.

Due to the small value of the constant \(G \), the gravitational force is very weak unless the masses involved are large. For objects close to the surface of the Earth, Newton’s formula can be simplified as follows:

\[ F = mg \]

where \(g = 9.8 ms^{- 2} \) is the constant value of acceleration and the direction is perpendicular to the surface of the earth. In essence, near the surface of the Earth the force of gravity does not depend on the distance between the body and the Earth; all bodies, of any mass, have the same acceleration, neglecting the resistance of the air and other possible forces.

**Example 2.1 – Motion of a bullet**

A bullet is launched from a Cartesian coordinate system, with an initial velocity vector \(\mathbf{v} = (v_{0, x}, v_{0, y}) \). Assuming that only the gravitational force of the earth is present and neglecting the resistance of the air, the projectile follows a parabolic trajectory. Newton’s laws make it possible to calculate the trajectory from a spatial and temporal point of view. The parametric equations of the parabola are the following (suppose we shoot the bullet from the point of coordinates \((x_{0}, y_{0}) \)):

\[ \begin{array}{l} x(t) = x_{0} + v_{0,x}t \\ y(t) = y_{0} + v_{0,y}t - \dfrac{1}{2}g t^{2} \\ \end{array} \]

Some important formulas derived from the equations of motion are the following (we indicate with \(v_{0} \) the modulus of the velocity at time \(t = 0 \) and with \(\alpha \) the angle of the initial velocity with the abscissa axis \(x \)):

\[ \displaystyle \begin{array}{l} \textbf {horizontal acceleration} = a_{x}=0 \\ \textbf {vertical acceleration} = a_{y}=-g \\ \textbf {horizontal velocity} = v_{x}(t) = v_{0,x} \\ \textbf{vertical velocity } = v_{y}(t) = v_{0,y} - gt \\ \textbf {maximum height} = \dfrac {v_{0}^{2} (\sin \alpha)^{2}}{2g} \\ \textbf {time to return to the ground} = \dfrac {2v_{0}\sin \alpha}{g} \\ \textbf{range} = \dfrac {v_{0}^{2} (\sin 2\alpha)}{g} \\ \end{array} \]**Exercise 2.1**

Prove that, neglecting the air resistance, the maximum range angle for a cannon is \(45^{\circ} \).
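The formulas above, and the result of the exercise, can be checked numerically (an illustrative Python sketch with example values \(v_{0}=20\ m/s\)):

```python
import math

g, v0 = 9.8, 20.0   # example values

def launch_range(alpha):
    # range = v0^2 * sin(2*alpha) / g
    return v0**2 * math.sin(2 * alpha) / g

# scan integer angles: the range is maximal at 45 degrees
best = max(range(1, 90), key=lambda deg: launch_range(math.radians(deg)))
print(best)   # -> 45

alpha = math.radians(30)
flight_time = 2 * v0 * math.sin(alpha) / g
# check: y(t) = v0*sin(alpha)*t - g*t^2/2 vanishes at the time of flight
y_end = v0 * math.sin(alpha) * flight_time - 0.5 * g * flight_time**2
print(abs(y_end) < 1e-9)   # -> True
```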

It is useful to mention some of the principles of conservation which, in addition to having a profound meaning from the physical point of view, are very useful in solving physics problems.

There are different types of energy: kinetic energy, potential energy, thermal energy, nuclear energy, etc. The sum total of all these types of energies remains constant in the universe; energy cannot be destroyed or created, but can only be transformed from one type to another. When studying a physical phenomenon it is useful to divide the universe into two separate parts:

- the system being studied
- the external environment

If these two systems exchange energy or mass then the energy of the system under study may not be conserved.

The principle of energy conservation is a general principle which applies without exception. In mechanical problems a more restricted principle applies, the principle of conservation of mechanical energy, which includes only the types of energy that affect the physical context, kinetic energy and potential energy.

Kinetic energy is energy due to the movement of objects. A body of mass \(m \) that moves with speed \(v \) has a kinetic energy given by the following formula:

\[ \text {Kinetic Energy} = K = \frac{1}{2}m v^{2} \]The dimensions of the kinetic energy are \(ML^{2}T^{- 2} \). The unit of measurement in the SI is the **Joule**: \(1\) Joule = \(1\, kg \cdot m^{2} \cdot s^{- 2} \).

Potential energy is a type of energy due to the possibility of doing work by moving a body from one position to another, in the presence of a force field.

A body has potential energy because it can do work. For example, a body of mass \(m \) at height \(h \) above the earth’s surface can transform its potential energy into kinetic energy, which can be used to do work. Vice versa, to bring a body from the earth’s surface to a height \(h \), some work must be done against the force of gravity.

The formula for the potential energy of a body of mass \(m \) located at height \(h \) above the earth’s surface is the following:

\[ U = mgh \]

Gravitational force has a very important property: the work done to bring a body from one height to another does not depend on the path, but only on the starting point and the final point. In physics, gravitational force is said to be a **conservative force**.

**Mechanical energy** \(E \) is defined as the sum of kinetic energy and potential energy:

\[ E = K + U \]

The **principle of conservation of mechanical energy** states that, in a closed system and in the absence of dissipative forces (such as friction or air resistance), the total mechanical energy in a conservative force field remains constant. So, the potential energy can turn into kinetic energy and vice versa, but the sum of the two remains constant.
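A simple numerical experiment (an illustrative Python sketch simulating a free fall with made-up values) shows the mechanical energy staying constant up to integration error:

```python
m, g, h = 2.0, 9.8, 10.0     # example mass (kg), gravity (m/s^2), drop height (m)
dt = 1e-4
y, v = h, 0.0                # drop from rest at height h
energies = []

while y > 0:
    energies.append(0.5 * m * v**2 + m * g * y)   # E = K + U
    v -= g * dt              # semi-implicit Euler integration of the fall
    y += v * dt

drift = (max(energies) - min(energies)) / energies[0]
print(drift < 1e-3)   # -> True  (relative energy drift stays tiny)
```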

Another important physical quantity is the **momentum** (or **quantity of motion**) of a material point of mass \(m \) that moves with velocity \(\mathbf{v} \):

\[ \mathbf{p} = m\mathbf{v} \]

It is a vector quantity for which the following **conservation principle** applies: **in an isolated system, the total momentum of a set of material points is conserved.**

For example, in the collision between two material bodies, the total momentum before the impact is equal to the total momentum after the impact.

Another important quantity is the **impulse**, which a body of mass \(m \) acquires when subjected to a force \(\mathbf{F} \) for a certain interval of time \(\Delta t \). We use the symbol \(\mathbf {J} \) to denote the impulse. The fundamental law to remember is expressed by the following formula:

\[ \mathbf{J}= \int_{t_{1}}^{t_{2}} \mathbf{F}\, dt = \mathbf{p}(t_{2})-\mathbf{p}(t_{1}) \]

So the impulse applied by a force in the time interval \([t_{1}, t_{2}] \) is equal to the variation of the momentum of the mass \(m \) on which the force acts.

For a deeper understanding of these fairly complex concepts, you may read the physics texts suggested in the bibliography.

Each GameObject that interacts in the scene is identified by its Cartesian coordinates (x, y, z) in 3D space, or (x, y) in 2D. To simulate the real world it is necessary to apply the laws of physics to each object and perform the necessary calculations to determine velocity, acceleration, rotation frequency, the result of impacts with other objects, etc.

In the most recent versions of Unity a component has been added to simulate the laws of physics of objects in the 2D scene, the **Physics2D** engine. Recall that Unity’s 2D and 3D physics engines are completely separate. The 3D engine uses the PhysX software product, while the 2D engine uses Box2D.

The parameters of the 2D Physics engine are set using the Physics 2D manager (*Edit -> Project Setting -> Physics2D*).

The main components of Unity’s 2D Physics Engine are illustrated in the following diagram:

In Unity 2D collisions are not controlled directly by the Rigidbody 2D components, but by new components called Colliders 2D. These components define a region of the plane in which interaction between objects can occur. These regions generally have a different shape from the objects themselves, except in the case of simple geometric objects. The main Collider 2D components are shown in the following diagram:

The topic of collisions in two dimensions was illustrated in a previous article in this blog.

The Rigidbody 2D component makes it possible to simulate physical interactions between objects. An object (for example a sprite) with which the Rigidbody 2D component is associated is placed under the control of the 2D physics engine. The Rigidbody 2D component defines physical properties such as gravity intensity, mass, friction, etc. Rigidbody 2D objects can only move in the XY plane and can rotate around the Z axis. There is the possibility of eliminating the effects of gravity for the entire scene (*Edit -> Project Settings -> Physics 2D*), or for a single object, by updating the Rigidbody 2D component.

There are three options for the **Body Type**, which define three different modes of behavior:

- Dynamic
- Kinematic
- Static

The Dynamic option implies that the object moves under the control of the physical engine, according to the properties defined in the Rigidbody 2D component (mass, drag, gravity scale, etc.).

The Kinematic option implies that the object moves in the context of the physical simulation, but control is left to the application. For an object with the Kinematic option, motion is not affected by mass, gravity or other types of forces as it is with the Dynamic option.

With the Static option an object does not move under physical simulation; in a collision it behaves like an object of infinite mass, which does not move.

In the next paragraph we will describe the possibilities that the Rigidbody 2D component offers to simulate the motion of objects in the 2D environment.

Physics Material 2D is a component that can be associated with a GameObject to define the physical characteristics of the object itself. A Physics Material contains two properties: **Bounciness** (how much the object bounces after a collision) and **Friction**. A correct choice makes it possible to simulate more precisely the behavior of real objects in physical processes, for example in collisions with other objects.

Joints 2D are components that make it possible to define physical constraints between two objects, with varying degrees of freedom. This allows physics to be simulated even for objects made of various components, such as doors, vehicles, and complex structures of various types. The main types of joints offered by Unity 2D are illustrated in the following diagram:

- **Fixed Joint** – binds two objects rigidly, similar to a parent -> child relationship. The body with the Fixed Joint follows the movements of the other object. In the presence of obstacles it may not be able to keep the fixed distance, behaving instead somewhat like a spring
- **Hinge Joint** – joins two objects at a point around which they can rotate, like a door hinge; used for example to create a swinging door
- **Distance Joint** – allows you to establish a fixed distance between two objects
- **Wheel Joint** – simulates the behavior of a wheel, allowing the creation of vehicles of various kinds
- **Slider Joint** – allows you to create a sort of track on which a body can slide
- **Spring Joint** – simulates the behavior of a spring

The way to create a Joint is similar for all types; only some specific parameters to be set change. The main steps to define a joint between two objects are the following:

- create an object and assign it the Rigidbody2D component
- assign a physical material to the object
- open the Joints menu (*Component -> Physics2D*)
- choose the type of Joint
- set the parameters in the Inspector
- set the **Connected Body** field by dragging the second object to be connected

The following animation represents a Spring joint, which simulates the swing of a pendulum:

As is known, in Unity the position, rotation and scale information of the objects on the scene are normally managed through the **Transform component** associated with each object.

Using the Rigidbody2D component it is possible to put the objects under the control of the physical engine, which in turn can change the position and speed of the objects themselves. A fundamental task of the Rigidbody2D component is precisely to manage the communication with the Transform component in order to make the position data always updated correctly. The Rigidbody2D component offers some properties and functions that allow you to control the motion of a body, simulating a force applied to the object:

- Rigidbody2D.velocity
- Rigidbody2D.MovePosition
- Rigidbody2D.AddForce, ForceMode2D = Force
- Rigidbody2D.AddForce, ForceMode2D = Impulse

The **AddForce** function accepts two parameters: the first denotes the vector (Vector2D) which represents the applied force and the second denotes the type of thrust (**ForceMode2D**). There are two types of thrusts:

**ForceMode2D.Impulse**(force applied for a short time, for example to make a jump)**ForceMode2D.Force**(it’s a normal force, like pushing to move a wagon or car engine)

The functions that use the physical engine must be put in the FixedUpdate method, since that is the frequency with which Unity updates the calculations for the physics simulation.

This property applies an instantaneous velocity change to the Rigidbody2D, ignoring its mass.

To understand the difference with AddForce, suppose we have an object that moves in a certain direction. If we apply a force in a direction perpendicular to the direction of motion, the object will not make a sudden \(90^{\circ} \) turn; the change of direction takes place gradually, depending on the intensity of the force. At each time step the velocity vector is combined with the velocity change produced by the force, giving the new direction of motion. Setting the Rigidbody2D.velocity property, instead, produces an instant change of direction. For example, if an object is moving along the positive direction of the \(X \) axis, executing the following statement:

`rigidbody2D.velocity = Vector2.up;`

we will have a sudden change of motion towards the positive direction of the \(Y \) axis.

The following image illustrates the movement of a ball thrown to hit a target object that moves after each collision. The initial speed is calculated each time, based on the laws of dynamics, to hit the target. The ball is associated with the Rigidbody2D component with the Body Type set to Dynamic.

A rough outline of the instructions for calculating the projectile’s launch speed is as follows:

```
void StartBullet() {
    Vector2 veloStart = ComputeInitialSpeed(transform.position,
                                            target.transform.position);
    m_Rigidbody2D.velocity = veloStart;
}

// compute the initial bullet velocity needed to hit the target
public Vector2 ComputeInitialSpeed(Vector2 origin, Vector2 target) {
    float gravity = Physics2D.gravity.magnitude;
    float distance = Mathf.Abs(target.x - origin.x);
    float height = target.y - origin.y;   // height of the target relative to the origin
    float maxHeight = distance / 4;       // chosen apex of the parabola
    float startSpeedY = Mathf.Sqrt(2.0f * gravity * maxHeight);
    // time of flight: solve height = startSpeedY * t - 0.5 * gravity * t^2
    float time = (startSpeedY + Mathf.Sqrt(startSpeedY * startSpeedY
                 - 2.0f * gravity * height)) / gravity;
    float startSpeedX = distance / time;
    // choose the horizontal direction
    if (target.x - origin.x > 0.0f) {
        // right direction
        return new Vector2(startSpeedX, startSpeedY);
    }
    // left direction
    return new Vector2(-startSpeedX, startSpeedY);
}
```

In addition to the Rigidbody2D.velocity property, there is also another function that you can use to change the position of an object: **Rigidbody2D.MovePosition**. The MovePosition method has a similar effect, except that the Rigidbody2D must be set to Kinematic. In this case the position is updated at each fixed frame and the speed is calculated internally by the physical engine. The MovePosition function is useful for chasing another moving object. The following animation illustrates an object chasing a target that moves in circular motion.

The instructions for chasing the target object are as follows (the “Gravity scale” option is set to zero):

```
private void FixedUpdate() {
    Vector2 direction = (target.transform.position - transform.position).normalized;
    if (Vector2.Distance(target.transform.position, transform.position) > minDist) {
        // step toward the target at constant speed
        m_Rigidbody2D.MovePosition(m_Rigidbody2D.position
            + direction * speed * Time.fixedDeltaTime);
    }
    else {
        Debug.Log("Target caught");
    }
}
```

With the AddForce function the force is applied during each fixed interval of time (timestep) by the physics engine. The default value is 0.02 seconds, but it can be modified through the editor (*Edit -> Project Settings -> Time*).

The AddForce method with the **ForceMode2D.Force** parameter applies a gradual continuous force on the object during the frame period and takes into account the mass.

The AddForce method with the **ForceMode2D.Impulse** parameter applies instantaneous force to the object and takes mass into account. The impulse lasts for the entire frame interval. It is used, for example, to simulate explosions.

For the complete list of properties of the Rigidbody component see the relative section of the Unity manual.

In the following animation we simulate the motion of a planet around the sun. The universal gravitation law is applied with appropriate scaling operations of the quantities involved. Both objects have the Rigidbody2D component, with type Dynamic. The Gravity Scale is set to zero.

```
private void FixedUpdate() {
    Vector2 appliedForce = Vector2.zero;
    Vector2 distance = (target.transform.position - transform.position) * distanceScale;
    float squaredDist = distance.sqrMagnitude;
    // use Newton's law of universal gravitation
    float gravityForce = G * m_Rigidbody.mass * m_RigidbodyTarget.mass / squaredDist;
    appliedForce = gravityForce * distance.normalized;
    m_Rigidbody.AddForce(appliedForce, ForceMode2D.Force);
}
```

Finally, we mention another method, used to rotate a body in the plane: **Rigidbody2D.AddTorque**. Torque is the moment that a force exerts on a body with respect to a reference point. The moment of a force \(\mathbf{F}\) applied to a material point at position \(P\), with respect to an origin point \(O\), is defined by the vector product:

\[ \mathbf{M} = \mathbf{r} \times \mathbf{F} \]

where \(\mathbf {r} \) is the radius vector \(\mathbf {OP} \) that goes from the origin point \(O \) to the position \(P \) where the material point is located. The moment vector is perpendicular to the plane formed by the position vector and the force vector. The intensity of the torque vector is

\[ |\mathbf{M}| = |\mathbf{r} \times \mathbf{F}|= rF \sin \theta \]

The moment is the cause of the rotation of a body. The topic of rotations in the plane will be discussed in a subsequent article.

Many video games for mobile or even desktop devices are still developed in the 2D environment. It’s therefore important to know the features that the Unity platform offers, and among these in particular the potential of the 2D Physics Engine to simulate motion and collisions between objects.

^{[1]}Halliday, Resnick – Fundamentals of Physics (Wiley)

^{[2]}R. Feynman – The Feynman Lectures on Physics (Basic Books)

The post Motion in a Plane and Unity’s 2D Physics Engine appeared first on GameLudere.

The post Iterated Function Systems, Fractals and Sierpinski Triangle appeared first on GameLudere.

Each fractal is a geometric shape that we can consider immersed in a Euclidean space \(\mathbb{R}^{n}\) of dimension \(n = 1, 2, 3\) or even larger. To define the mathematical space of fractals it's necessary to recall some concepts on the topology of metric spaces.

**Definition 1.1**

A metric space \((X, d) \) is a pair formed by a set \(X \) and a function \(d: X \times X \to \mathbb {R} \), called the **distance function**, which satisfies the following axioms:

- M1 – \(d(x, y) \ge 0 \) for all \(x, y \in X \), and \(d(x, y) = 0 \) if and only if \(x = y \)
- M2 – \(d(x, y) = d(y, x) \) for all \(x, y \in X \) (symmetry)
- M3 – \(d(x, z) \le d(x, y) + d(y, z) \) for all \(x, y, z \in X \) (triangle inequality)

**Example 1.1 – The discrete metric**

Given a set \(X \), we define the following metric:

\[ d(x, y)= \begin{cases} 0 & \text{if } x = y \\ 1 & \text{if } x \neq y \end{cases} \]

It's easy to verify that this function, called the discrete metric, satisfies the axioms, and therefore the pair \((X, d) \) is a metric space.

**Example 1.2 – The Euclidean metric**

Let \(X = \mathbb{R}^{n} \) be the Euclidean space with \(n \) dimensions. If \(x, y \) are two points in space, with coordinates \((x_{i})\) and \((y_{i})\), the Euclidean metric is defined as follows:

\[ d(x, y)= \sqrt{\sum_{i=1}^{n} (x_{i}-y_{i})^{2}} \]

**Example 1.3 – The rational and irrational numbers**

We have seen that the set of real numbers \(\mathbb {R} \) is a metric space with the metric \(d (x, y) = | x-y | \). As is known, the real numbers are made up of the rational numbers and the irrational numbers; therefore \(\mathbb {R} = \mathbb{Q} \cup \mathbb{I} \), where \(\mathbb {Q} \) is the set of rational numbers and \(\mathbb {I} \) is the set of irrational numbers. These two subsets are metric spaces with the Euclidean metric.

**Example 1.4 – The Manhattan metric**

In the plane \(\mathbb{R}^{2} \), in addition to the Euclidean metric we can define the following metric, called the Manhattan metric, or Taxicab metric. Living in a city where the streets have a grid configuration, north-south and east-west, to go from a point \(A \) to a point \(B \) by taxi you cannot take the shortest road according to the Euclidean metric, but must travel only horizontally and vertically. In this situation, given two points \(A(x_{1}, y_{1}) \) and \(B(x_{2}, y_{2}) \) of the plane, the minimum distance to travel from the point \(A \) to the point \(B \) is given by the following value:

\[ d(A, B)= |x_{2}-x_{1}| + |y_{2}-y_{1}| \]

To prove that the function defined above satisfies the axioms of metric spaces, remember that given any two real numbers \(x, y \) the following relation holds: \(| x + y | \le | x | + | y | \).
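The two metrics can be compared with a minimal Python sketch (the helper names are ours, not from the article):

```python
import math

def manhattan(a, b):
    # Taxicab distance: sum of the absolute coordinate differences
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def euclidean(a, b):
    # Ordinary straight-line distance in the plane
    return math.hypot(a[0] - b[0], a[1] - b[1])

A, B = (0, 0), (3, 4)
print(manhattan(A, B))  # 7
print(euclidean(A, B))  # 5.0
# The taxicab route is never shorter than the straight-line one
assert manhattan(A, B) >= euclidean(A, B)
```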

**Definition 2.1**

A **topological space **is a pair \((X, A) \) consisting of a set \(X \) and a family of subsets of \(X \), denoted by \(A \), called the **open sets**, which satisfies the following axioms:

- A1 – the universe set \(X \) and the empty set \(\emptyset \) are open
- A2 – the union of any finite or infinite number of open sets is an open set
- A3 – the intersection of a finite number of open sets is an open set

The complement of an open set is called a **closed set**. The definition of topological space does not presuppose any algebraic structure on the set \(X \). To define a topological space, therefore, we only have to define which subsets are open.

**Example 2.1 – The trivial or indiscrete topology**

Let \(X \) be any set. The trivial (or indiscrete) topology on \(X \) is the topology in which \(A = \{\emptyset, X \} \), that is, the open sets are only the trivial subsets.

**Example 2.2 – The discrete topology**

Let \(X \) be any set. The discrete topology defines all subsets of \(X \) as open. The family of all subsets of \(X \) is called the power set of \(X \), denoted by \(P (X) \).

**Exercise 2.1**

Prove that, if \(| X | = n \), then the number of open sets in the discrete topology is \(2^{n} \).
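The count \(2^{n}\) can be verified by brute force; a minimal Python sketch enumerating the power set:

```python
from itertools import combinations

def power_set(xs):
    # All subsets of xs, including the empty set and xs itself
    return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

X = {1, 2, 3, 4}
opens = power_set(X)
print(len(opens))  # 16 == 2**4
```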

On every metric space \((X, d) \) we can define a topology induced by the metric.

**Definition 3.1**

Given a metric space \((X, d) \), we define the **open sphere** (or **open ball**) of radius \(r \) and center \(x \) as the set

\[ B(x, r)= \{ y \in X : d(x, y) \lt r \} \]

We now define as open all subsets of \(X\) which are unions of open spheres. It's easily proved that, with this definition of open sets, the set \(X\) assumes a topological space structure.

**Example 3.1**

The Euclidean topology of \(\mathbb{R}^{n}\) is the one induced by the Euclidean metric.

**Definition 3.2 – Limit of a sequence**

A sequence of points \(\{x_{n}\}\) of a metric space \((X, d)\) converges to a point \(x \in X\) (or equivalently has limit \(x \in X\)) as \(n\) goes to infinity, if the following condition holds:

\[ \forall \epsilon \gt 0 \ \exists N : d(x_{n}, x) \lt \epsilon \quad \forall n \gt N \]

In this case we use the following notation:

\[ \lim_{n \to \infty} {x_{n}} = x \]

**Definition 3.3 – Cauchy sequence**

A sequence of points \(\{x_{n} \} \) of a metric space \((X, d) \) is called a Cauchy sequence if the terms of the sequence eventually become arbitrarily close to one another; that is

\[ \forall \epsilon \gt 0 \ \exists N : d(x_{m}, x_{n}) \lt \epsilon \quad \forall m, n \gt N \]

**Exercise 3.1**

Prove that every convergent sequence is also a Cauchy sequence.

**Definition 3.4 – Complete metric space**

A metric space \((X,d)\) is said to be complete if each Cauchy sequence \(\{x_{n} \} \) is convergent to a point in \(X \).

**Example 3.2**

The open interval \(X = (0,1) \) of the real line is a metric space but it's not complete. For example, the sequence \(x_{n} = \frac{1}{n} \), with \(n \ge 2 \), is a Cauchy sequence but doesn't converge to a point in the interval.

However, the closed interval \([0,1] \) is a complete metric space.

**Definition 3.5 – Compact set**

Let a metric space \((X, d) \) be given. A subset \(C \) of \(X \) is said to be **compact** if every sequence \(\{x_{n} \} \) of points of \(C \) contains a subsequence that converges to a limit belonging to \(C \).

The following fundamental theorem of mathematical analysis holds:

**Theorem 3.1 – Heine-Borel**

A subset of the Euclidean space \(\mathbb{R}^{n} \) is compact if and only if it is closed and bounded.

For a deeper study of Topology and Metric Spaces you can consult ^{[1]}. Like all the texts in the Schaum’s series, it’s characterized by the presence of numerous solved exercises.

At this point we have all the tools necessary to define the metric space of fractals.

Suppose we have a complete metric space \((X, d) \). In every metric space we have primitive elements, called points of space. Furthermore, in a metric space we have defined the class of compact subsets, which can consist of a finite or an infinite number of points.

**Definition 4.1 – The fractal space**

Given a metric space \((X, d) \), the **fractal space** \(H (X) \) is a space whose points are the compact subsets of the metric space \(X \), excluding the empty set.

To complete the definition we have to define a distance function that provides the structure of metric space.

**Definition 4.2**

Let \((X, d) \) be a complete metric space. Let \(x \in X \) and let \(C \) be a compact subset of \(X \), that is, a point of the space \(H (X) \). We define the **distance** of the point \(x \) from the set \(C \):

\[ d(x, C)= \min \{ d(x, y) : y \in C \} \]

From mathematical analysis we know that the minimum value exists on the basis of the hypothesis of compactness of the non-empty set \(C \).

At this point we can define the metric on the fractal space \(H (X) \):

**Definition 4.3**

Let \((X, d) \) be a complete metric space. Given two compact sets \(A, B \subset X \) (in fact two points of the space \(H (X) \)), we define the **distance between the two points** \(A, B \) of the space \(H (X) \) as follows:

\[ d(A, B)= \max \{ d(x, B) : x \in A \} \]

The function defined above does not satisfy the symmetry property. In fact, in general \(d (A, B) \neq d(B, A) \). So the function \(d \) does not represent a metric on the space \(H (X) \). To have a metric, the following definition, proposed by the mathematician **Felix Hausdorff** (1868-1942), is used:

**Definition 4.4**

The **Hausdorff distance** \(h(A, B) \) between two points \(A, B \) of the fractal space \(H (X) \) is given by the following formula:

\[ h(A, B)= \max \{ d(A, B), d(B, A) \} \]

where max denotes the larger of the two values. It can be shown that the function defined above satisfies the axioms of distance. Furthermore, if the metric space \(X \) is complete, the metric space \(H (X) \) is also complete. For a proof see Barnsley’s book.
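For finite sets of points in the plane, the distances of the last definitions can be computed directly; a minimal Python sketch (the helper names are ours) that also shows the asymmetry of \(d(A, B)\):

```python
def point_to_set(x, C):
    # d(x, C): minimum distance from the point x to the (finite) compact set C
    return min(((x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2) ** 0.5 for c in C)

def one_sided(A, B):
    # d(A, B): largest distance from a point of A to the set B
    return max(point_to_set(a, B) for a in A)

def hausdorff(A, B):
    # Hausdorff distance: the larger of the two one-sided distances
    return max(one_sided(A, B), one_sided(B, A))

A = [(0, 0), (1, 0)]
B = [(0, 0), (4, 0)]
print(one_sided(A, B), one_sided(B, A))  # 1.0 3.0 -> not symmetric
print(hausdorff(A, B))                   # 3.0
```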

In the plane \(R^{2} \) with a fixed origin \(O \), to each point of the plane corresponds the vector \(\mathbf {OP} \). The notation \(P + Q \) refers to the point of the plane identified by the vector sum of \(\mathbf {OP} \) and \(\mathbf {OQ} \). A **linear transformation** \(f \) in the plane is a function which associates to each point \(P (x, y) \) of the plane another point of the plane \(f (P) \), such that if \(P, Q \) are any two points of the plane and \(\lambda \) is a real number, the following properties are verified:

\[ \begin{split} f(P + Q) &= f(P) + f(Q) \\ f(\lambda P) &= \lambda f(P) \end{split} \]

In the plane \(\mathbb {R ^ {2}} \) a linear transformation can be represented with the following system of two linear equations:

\[ \begin{cases} y_{1} = a_{11} x_1 + a_{12} x_2 \\ y_{2} = a_{21} x_1 + a_{22} x_2 \\ \end{cases} \]The system can be written in compact form as \(y = Ax \), where \(A \) is the matrix \(2 \times 2 \) with coefficients \(a_ {ij} \), while \(x, y \) are two column vectors. Recall that a matrix \(A \) is said to be invertible if there is another matrix, denoted by \(A^{- 1} \) such that \(AA^{- 1} = I \), where \(I \) is the identity matrix, whose diagonal elements are all equal to \(1 \) and the others are equal to zero.

**Definition 5.1**

An **affine transformation** is a transformation of the form \(f (x) = Ax + t \), where \(A \) is an invertible matrix, and \(x, t \) are vectors.

In fact an affine transformation is the composition of a linear transformation and a translation. The vector \(t \) is responsible for the translation. In the case of the space \(X = R^{2} \), with \(t=(e,f)\), the affine transformation can be written like this:

\[ f(x) = Ax + t= \left(\begin{array}{ll} a & b \\ c & d \\ \end{array}\right) \left(\begin{array}{l} x_1 \\ x_2 \end{array}\right) + \left(\begin{array}{l} e \\ f \end{array}\right) \]Fractal objects are typically subsets of the Euclidean space \(\mathbb {R^{n}} \), generated by affine transformations. It’s of fundamental importance to study the geometric properties that remain unchanged with respect to an affine transformation.

An affine transformation is a composition of rotations, translations, dilations and contractions. Affine transformations preserve collinearity (i.e. points that initially belong to a straight line, remain on the straight line) and the ratio of distances (midpoint of a line segment remains the midpoint after transformation). However, they do not preserve the angles and lengths.

The affinity relationship is an extension of the congruence and similarity relationships studied in classical Euclidean geometry. A triangle always remains a triangle after an affine transformation, even if it doesn’t necessarily have the same area (congruence) or is similar to the initial one.

To understand the affine transformations it’s necessary to remember the most important properties of the algebra of vectors and matrices. For this you can consult the following text, which proposes many solved exercises: ^{[2]}. See also the articles already published in this blog on the topics of algebra of vectors and matrices.

**Exercise 5.1**

Given an affine transformation \(f (x) = Ax + t \) find the inverse transformation \(g (x) \) such that \(f (g (x)) = x \).

Hint

Multiply left and right by the inverse matrix \(A^{- 1} \).

**Theorem 5.1**

A composition of two affine transformations is an affine transformation.

Proof

Let \(f (x) = Ax + t \), \(g (x) = Bx + s \) be two affine transformations. Then \((g \circ f) (x) = g(f (x)) = (BA)(x) + (Bt + s) \). Since \(A, B \) are invertible matrices, the product matrix \(BA \) is also invertible, therefore the composite function \(g \circ f \) is an affine transformation.
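The closed form \((BA)x + (Bt + s)\) obtained in the proof can be checked numerically; a minimal Python sketch with two arbitrary invertible \(2 \times 2\) matrices:

```python
def matvec(M, v):
    # 2x2 matrix times a 2-vector
    return (M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1])

def matmul(M, N):
    # Product of two 2x2 matrices
    return [[sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

A, t = [[2, 0], [0, 3]], (1, -1)   # f(x) = Ax + t
B, s = [[0, 1], [1, 1]], (2, 0)    # g(x) = Bx + s

f = lambda x: add(matvec(A, x), t)
g = lambda x: add(matvec(B, x), s)

x = (1, 2)
lhs = g(f(x))                                              # direct composition
rhs = add(matvec(matmul(B, A), x), add(matvec(B, t), s))   # (BA)x + (Bt + s)
print(lhs, rhs)  # (7, 8) (7, 8) -> the two results coincide
```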

An affine transformation in the plane can be represented compactly using homogeneous coordinates, that is a matrix \(3 \times 3 \). In fact we add a column with the parameters related to the translation and a row with the values \((0,0,1) \):

\[ Ax + t= \left(\begin{array}{lll} a & b & e\\ c & d & f\\ 0 & 0 & 1 \end{array}\right) \left(\begin{array}{l} x \\ y \\ 1 \end{array}\right) \]For the transformations in the space \(R^{3} \) we will use a matrix \( 4 \times 4 \).

The matrices for the basic transformations are the following:

**Translation \((d_{x}, d_{y}) \)**:

\[ T= \left(\begin{array}{lll} 1 & 0 & d_{x}\\ 0 & 1 & d_{y}\\ 0 & 0 & 1 \end{array}\right) \]

**Rotation by an angle \(\theta\)**:

\[ R= \left(\begin{array}{lll} \cos \theta & -\sin \theta & 0\\ \sin \theta & \cos \theta & 0\\ 0 & 0 & 1 \end{array}\right) \]

**Scaling \((s_{x}, s_{y}) \)**:

\[ S= \left(\begin{array}{lll} s_{x} & 0 & 0\\ 0 & s_{y} & 0\\ 0 & 0 & 1 \end{array}\right) \]

**Deformation (shear) \((h_{x}, h_{y}) \)**:

\[ Sh= \left(\begin{array}{lll} 1 & h_{x} & 0\\ h_{y} & 1 & 0\\ 0 & 0 & 1 \end{array}\right) \]

**Example 5.1**

The following matrix represents a translation of one unit in the direction of the \(x \) axis and two units in the direction of the \(y \) axis. The dimensions are not changed and there is no rotation.

\[ \left(\begin{array}{lll} 1 & 0 & 1\\ 0 & 1 & 2\\ 0 & 0 & 1 \end{array}\right) \]

**Exercise 5.2**

Determine the effect of the transformation in the plane using the following matrix [\(\cos (30^\circ) = \frac{\sqrt{3}}{2}, \sin (30^\circ) = \frac{1}{2} \)]:

**Definition 6.1**

Given a metric space \((X, d) \) and a function \(f: X \to X \), the function is said to be a **contraction** on the metric space if there exists a real number \(0 \le c \lt 1 \) such that

\[ d(f(x), f(y)) \le c \, d(x, y) \quad \forall x, y \in X \]

**Definition 6.2**

Given a function \(f: X \to X \), a point \(x_{0} \in X \) is called a **fixed point **for the function if it results \(f (x_{0}) = x_{0} \).

**Exercise 6.1**

Consider the metric space \((\mathbb{R}^{2}, d) \), where \(d \) is the Euclidean metric. Let the transformation \(f: \mathbb{R}^{2} \to \mathbb{R}^{2} \) be given as follows:

\[ f(x, y)= \left(\frac{x}{a} + b, \frac{y}{c} + b\right) \]

Determine under which conditions on the coefficients \(a, b, c \) it's a contraction, and in this case find the fixed point.

Answer: \(\left[ a, c> 1; x = \dfrac{ab} {a-1}, y = \dfrac{bc} {c-1} \right]\)
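The fixed point can also be found numerically by simple iteration; a minimal Python sketch, assuming a map of the form \(f(x, y) = (\frac{x}{a} + b, \frac{y}{c} + b)\), which is consistent with the answer above:

```python
def f(p, a=2.0, b=1.0, c=3.0):
    # Assumed form of the map: f(x, y) = (x/a + b, y/c + b),
    # a contraction when a > 1 and c > 1
    x, y = p
    return (x / a + b, y / c + b)

p = (100.0, -50.0)        # arbitrary starting point
for _ in range(100):
    p = f(p)              # iterate: the sequence converges to the fixed point

# Expected fixed point for a=2, b=1, c=3: x = ab/(a-1) = 2, y = bc/(c-1) = 1.5
print(p)  # approximately (2.0, 1.5)
```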

For contractions the following fundamental theorem, due to **Banach** (1892-1945) and **Caccioppoli** (1904-1959), applies:

**Theorem 6.1 – Banach-Caccioppoli**

Let \((X, d) \) be a complete metric space and \(f: X \to X \) a contraction on the space. Then the function \(f \) has a unique fixed point in \(X \). Also, for each point \(x \in X \) the sequence

\[ x, \ f(x), \ f^{2}(x), \ \cdots, \ f^{n}(x), \ \cdots \]

converges to the fixed point of \(f \).

For the proof see a text of mathematical analysis; for example ^{[3]}.

**Example 6.1**

An example of application of this theorem is the search for a solution of the equation \(f (x) = 0 \). First we write the equation in the form \(x = T (x) \), then we define an iteration scheme \(x_ {n + 1} = T (x_{n}) \), starting from an initial value \(x_ {0} \). Under suitable conditions the sequence tends to a fixed point of \(T (x) \), that is to a zero of the function \(f (x) \).

An important example is **Newton's algorithm** for finding the roots of a function. Suppose we want to find the solutions of the equation \(x^{2} - a = 0 \). We write the equation in the form \(x = \frac{1}{2} (x + \frac {a} {x}) \), and set up the iterative scheme:

\[ x_{n+1} = \frac{1}{2} \left(x_{n} + \frac{a}{x_{n}}\right) \]

It can be shown that by appropriately choosing the starting point \(x_ {0} \) the sequence converges to the value \(\sqrt {a} \), which is the fixed point and also the root of the initial equation.
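A minimal Python sketch of this iterative scheme:

```python
def newton_sqrt(a, x0=1.0, iterations=50):
    # Fixed-point iteration x_{n+1} = (x_n + a/x_n) / 2,
    # which converges to sqrt(a) for a > 0 and x0 > 0
    x = x0
    for _ in range(iterations):
        x = 0.5 * (x + a / x)
    return x

print(newton_sqrt(2))  # ≈ 1.41421356...
print(newton_sqrt(9))  # ≈ 3.0
```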

As we have previously seen, an affine transformation in the Euclidean space \(R^{2}\) which transforms a point \((x_{n}, y_{n}) \) into the point \((x_{n + 1} , y_{n + 1}) \) can be described with the following equations:

\[ \begin{split} x_{n+1} = a x_{n} + by_{n}+ e \\ y_{n+1} = c x_{n} + dy_{n}+ f \\ \end{split} \]The parameters \(a, b, c, d \) produce a rotation and a change of scale according to their size; the parameters \(e, f \) perform a translation of the point. If we take some points of the border of any object, and apply these functions to each of the points, we can obtain a transformation of the whole figure, made of rotations, translations and changes of scale. The new figure will generally have the property of auto-similarity with the original figure, one of the characteristic properties of fractals.

One of the most common ways to generate fractal images is to use an attractor set of an iterated function system (IFS). Each IFS is made up of affine transformations with rotations, translations and changes of scale. Normally two types of algorithms are used, one deterministic and another probabilistic.

For an in-depth study of IFS, see Barnsley’s fundamental text ^{[4]}.

**Definition 7.1**

A deterministic iterated function system (IFS) is a collection of affine transformations \(\{T_{1}, \cdots, T_{n} \} \), each of which is a contraction on the metric space; the contraction factor of the system is the largest of the factors of the individual transformations.

The following theorem (see Barnsley) is fundamental:

**Theorem 7.1**

Let's consider an iterated function system \(\{H (X); T_{1}, T_{2}, \cdots, T_{n} \} \), with contraction factor \(c \). For each set \(A \in H(X) \) we define the union set

\[ W(A)= T_{1}(A) \cup T_{2}(A) \cup \cdots \cup T_{n}(A) \]

Then:

- \(W \) is a contraction on the space \(H(X) \) with factor \(c \)
- there is a single fixed point \(K \in H(X) \) such that \(K = W(K) = \cup T_{i} (K) \)

The fixed point \(K \) is obtained as an infinite limit of the iteration process applied to the set \(A \).

The deterministic algorithm takes a set of points \(A \), for example the points of a geometric figure, to which it applies the \(n \) affine transformations, thus obtaining a new set consisting of the union of \(n \) sets of points:

\[ W(A)= T_{1}(A) \cup T_{2}(A) \cup \cdots \cup T_{n}(A) \]The operator \(W \) is called the **Hutchinson operator**.

For example, if the initial set \(A \) consists of only one point, after the first iteration the set \(W (A) \) contains \(n \) points. The iteration process is continued starting from the set \(W (A) \), until the union of all the sets obtained in the last iteration approaches the figure that constitutes the **attractor of the IFS system**. Each IFS has an associated fractal attractor, which is generally independent of the initial choice of points.

As we mentioned earlier, there is also a nondeterministic version of the IFS algorithm. We define a probability distribution for the various transformations, and at each iteration, instead of applying all the transformations, we choose a transformation based on its probability.

**Definition 7.2**

A non-deterministic iterated function system is a collection of affine transformations \(\{T_{1}, \cdots, T_{n} \} \), associated with a probability distribution \(\{P_{1}, \cdots, P_{n} \} \), with \(P_{i} \gt 0 \) and \(P_{1} + \cdots + P_{n} = 1 \).

Instead of applying the functions to a set of points, the execution of a random iterated function system begins by choosing an initial point of the plane. Then, at each iteration, one of the \(n \) transformations of the system is chosen at random, according to its probability, and applied to the current point. In theory this process is repeated indefinitely; in practice, when generating the graphic image, we stop after a finite number of iterations. After a certain number of iterations, the set of generated points becomes arbitrarily close to the attractor set of the IFS.
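A minimal Python sketch of the random algorithm (the "chaos game"), using as an example the three maps \(T_{i}(p) = \frac{p + v_{i}}{2}\), where the \(v_{i}\) are the vertices \((0,0), (1,0), (0,1)\) of a triangle; the attractor of this system is the Sierpinski triangle discussed in the next section:

```python
import random

# Three contractions T_i(p) = (p + v_i) / 2, one per triangle vertex
vertices = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
probs = [1 / 3, 1 / 3, 1 / 3]    # uniform probability distribution

random.seed(0)
p = (0.3, 0.3)                   # arbitrary starting point
points = []
for i in range(20000):
    v = random.choices(vertices, weights=probs)[0]   # pick one map at random
    p = ((p[0] + v[0]) / 2, (p[1] + v[1]) / 2)       # apply it to the current point
    if i > 100:                  # discard the initial transient
        points.append(p)

# All generated points stay inside the triangle x + y <= 1, close to the attractor
print(len(points), all(x + y <= 1.0 + 1e-9 for x, y in points))
```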

The Sierpinski triangle (Sierpinski gasket) is a geometric figure proposed by the Polish mathematician **W. Sierpinski** (1882-1969), whose construction requires the following steps:

- start with an equilateral triangle, indicated with \(A_{0} \), and identify the midpoints of the three sides
- the three midpoints are connected to each other and the triangle that is created is eliminated, not including its border; the remaining figure is indicated with \(A_{1} \)
- on each of the three remaining triangles the procedure is repeated, obtaining \(A_{2} \), and so on to infinity

The iterative process creates an infinite succession of sets:

\[ A_{0} \supset A_{1} \supset A_{2} \supset \cdots \supset A_{N} \supset \cdots \]The Sierpinski triangle is the set of points that remain after the procedure is repeated indefinitely.

The initial image is subjected to a set of affine transformations; it's therefore an iterated function system. In each phase, three blue triangles and a white triangle are created from each blue triangle. Each of the new triangles has perimeter equal to half of the parent triangle and area equal to a quarter of that of the parent triangle. The sequence of sets \(\{A_{0}, A_{1}, A_{2}, \cdots \}\) is a sequence of compact sets, since at each step we eliminate the open triangle in the center. After an infinite number of iterations the image converges to the attractor. For the equilateral triangle with vertices \((0,0), (1,0), (\frac{1}{2}, \frac{\sqrt{3}}{2})\), the system of equations that defines the Sierpinski triangle is as follows:

\[ \begin{split} \\ T_{1}(x,y)&= \left(\frac{x}{2}, \frac{y}{2}\right) \\ T_{2}(x,y)&= \left(\frac{x}{2} + \frac{1}{2},\frac{y}{2}\right) \\ T_{3}(x,y)&= \left(\frac{x}{2} + \frac{1}{4},\frac{y}{2} + \frac{\sqrt{3}}{4}\right) \end{split} \]

You can also change the procedure, creating a right triangle with three vertices \(\{(0,0), (1,0), (0,1) \} \):

\[ \begin{split} \\ T_{1}(x,y)&= \left(\frac{x}{2}, \frac{y}{2}\right) \\ T_{2}(x,y)&= \left(\frac{x}{2} + \frac{1}{2},\frac{y}{2}\right) \\ T_{3}(x,y)&= \left(\frac{x}{2},\frac{y}{2} + \frac{1}{2}\right) \end{split} \]The functions \(\{T_{1}, T_{2}, T_{3} \} \) are affine transformations, translations and changes of scale. We call \(A_{0} \) the initial image of the triangle with the vertices \((0,0), (1,0), (0,1) \). After the first iteration we get the image

\[ A_{1}= T(A_{0})=T_{1}(A_{0}) \cup T_{2}(A_{0}) \cup T_{3}(A_{0}) \]where the transformation \(T\) is the union of the three elementary transformations. Continuing we have \(A_{2} = T(A_{1}) = T^{2} (A_{0})\), etc. The Sierpinski triangle is the limit of this succession of compact sets:

\[ \text {Sierpinski Triangle} = \lim_{n\to\infty} T^{n}(A_{0}) \]To compute the area and the perimeter of the Sierpinski triangle we use the following table:

| Iteration | Number of triangles | Side length | Total perimeter | Total area |
|---|---|---|---|---|
| \(0\) | \(1\) | \(1\) | \(3\) | \(A_{0}\) |
| \(1\) | \(3\) | \(\frac{1}{2}\) | \(\frac{9}{2}\) | \(\frac{3}{4} A_{0}\) |
| \(2\) | \(9\) | \(\frac{1}{4}\) | \(\frac{27}{4}\) | \(\left(\frac{3}{4}\right)^{2} A_{0}\) |
| \(n\) | \(3^{n}\) | \(\frac{1}{2^{n}}\) | \(\frac{3^{n+1}}{2^{n}}\) | \(\left(\frac{3}{4}\right)^{n} A_{0}\) |

From the table we can conclude that the searched values of the area and the perimeter of Sierpinski triangle are the following:

\[ \begin{split} \text {Perimeter} &= \lim_{n\to\infty} \left(\frac{3^{n+1}}{2^{n}}\right)= \infty \\ \text {Area} &= \lim_{n\to\infty} \left(\frac{3}{4}\right)^{n} \cdot A_{0}=0 \\ \end{split} \]The (similarity) dimension of a self-similar set is defined by the formula \(d = \lim_{n\to\infty} \frac{\ln N}{\ln (1/L)} \), where \(N \) is the number of self-similar pieces at step \(n \) and \(L \) is the length of their sides. In the case of the Sierpinski triangle, at each iteration we have \(N = 3^{n} \) triangles, while the length of the sides of the triangles is \(L = \dfrac{1} {2^{n}} \). So, we find the following value for the dimension:

\[ d = \lim_{n\to\infty} \frac{\ln 3^{n}}{\ln 2^{n}} = \frac{\ln 3}{\ln 2} \approx 1.58496 \]The dimension can also be calculated with another procedure. Since the Sierpinski triangle is self-similar, consisting of three disjoint copies of itself, each scaled by the factor \(r = \frac {1}{2} \), the dimension \(d \) of the attractor can be calculated by solving this equation:
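The perimeter, area and dimension formulas can be checked with a minimal Python sketch (exact arithmetic via fractions):

```python
from fractions import Fraction
import math

# Total perimeter and area (relative to the initial area A0) after n iterations
def perimeter(n):
    return Fraction(3 ** (n + 1), 2 ** n)

def area_ratio(n):
    return Fraction(3, 4) ** n

print(perimeter(0), perimeter(10))          # the perimeter grows without bound
print(float(area_ratio(10)))                # the area ratio tends to zero

# Dimension estimate: N = 3^n pieces of side L = 1/2^n
n = 20
dim = math.log(3 ** n) / math.log(2 ** n)
print(dim, math.log(3) / math.log(2))       # both ≈ 1.58496
```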

\[ r^{d} + r^{d} + r^{d}= 3 \left(\frac{1}{2}\right)^{d}=1 \]For a proof see Barnsley’s book.

As we have seen in this article, fractal science rests on solid foundations: consolidated areas of mathematics such as analysis, topology and geometry. Computer-generated fractals are beautiful and fascinating objects that attract many people. However, this is only one aspect; fractal science is also a very important tool for describing complex systems and processes, both natural and artificial.

^{[1]}S. Lipschutz – General Topology (McGraw-Hill)

^{[2]}S. Lipschutz – Schaum’s Outline of Linear Algebra (McGraw Hill)

^{[3]}A. Kolmogorov, S. Fomin – Elements of function theory and functional analysis (Editori Riuniti)

^{[4]}M. Barnsley – Fractals Everywhere (Academic Press)


The post Dirichlet’s Box Principle and Ramsey Numbers appeared first on GameLudere.

This article illustrates Dirichlet's principle and gives a brief introduction to Ramsey theory, with some examples of computation of Ramsey numbers. For a deeper understanding of the topics of this article, you can consult the references cited in the text.

**Theorem 1.1 – Basic box principle**

Let \(X \) be a set of \(n \) elements divided into \(r \) disjoint subsets. If \(n \gt r \) then at least one of the subsets contains more than one element.

Although it’s a very simple principle, it’s useful in many situations to solve even difficult problems.

**Exercise 1.1**

Let \(X = \{a_{1}, a_{2}, \cdots, a_{n} \} \) be a set of \(n\) integers. Prove that it's always possible to choose a non-empty subset of \(X \) such that the sum of its numbers is divisible by \(n \).

Solution

Consider the following sums: \(s_{1} = a_{1} \), \(s_{2} = a_{1} + a_{2} \), \(\cdots \), \(s_{n} = a_{1} + a_{2} + \cdots + a_{n} \). Now, if any of the \(s_{i} \) is divisible by \(n \), the problem is solved. Otherwise, the remainder of each sum on division by \(n \) belongs to \(\{1, 2, \cdots, (n-1) \} \): there are \(n \) sums and only \(n-1 \) possible remainders. From this, complete the proof using the box principle (the difference of two sums with the same remainder is a sum of consecutive \(a_{i} \) divisible by \(n \)).
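The argument translates directly into an algorithm: two prefix sums with the same remainder bracket a block of consecutive elements whose sum is divisible by \(n\). A minimal Python sketch (the helper name is ours):

```python
def divisible_subset(a):
    # Find consecutive elements a[i+1..j] whose sum is divisible by n = len(a),
    # using the box principle on the prefix sums s_0 = 0, s_1, ..., s_n mod n
    n = len(a)
    seen = {0: 0}                      # remainder -> index of the prefix sum
    s = 0
    for j, x in enumerate(a, start=1):
        s = (s + x) % n
        if s in seen:                  # two prefix sums share a remainder
            i = seen[s]
            return a[i:j]              # their difference is divisible by n
        seen[s] = j
    # unreachable: there are n + 1 prefix sums but only n possible remainders

subset = divisible_subset([7, 5, 3, 11, 2])
print(subset, sum(subset) % 5)  # [5] 0 : the sum is ≡ 0 (mod 5)
```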

**Exercise 1.2**

Given \(n + 1 \) integers, show that you can always choose two of them whose difference is divisible by \(n \).

**Exercise 1.3**

Let \(X\) be any set of \(r \) integers. If \(n \gt 1 \) and \(2^{r} \gt n + 1 \), prove that two distinct subsets of \(X \) can always be found such that the sums of their integers are congruent modulo \( n \), that is their difference is a multiple of \(n \).

Hint

Remember that given a set of \(r \) integers, the total number of non-empty subsets is \(2^{r} -1 \).

The condition \(2^{r} \gt n + 1 \) cannot be lowered. If \(2^{r} = n + 1 \) there is at least one set for which the property is no longer true. For example \(X = \{1,2,2^{2}, 2^{3}, \cdots 2^{r-1} \} \).

**Exercise 1.4**

Suppose we have a regular hexagon with a side equal to \(1 \). If there are \(7 \) points inside the surface of the hexagon, then there are at least two points whose distance is less than or equal to \(1 \).

**Exercise 1.5**

There are \(8 \) people sitting at an octagonal table. At each place there is a card with a person's name (the names are all different). Initially, all the people sit in the wrong place. Prove that it's always possible to rotate the table so that at least two people are in the right place.

Solution

For each person sitting around the table, we calculate the distance from the card with their name, assuming a direction of rotation, for example counterclockwise. The possible values are \(\{1,2,3,4,5,6,7 \} \), since initially all the people sit in the wrong place. We can therefore apply the box principle: there are \(8 \) people and \(7 \) possible distances, so two people must be at the same distance from their card. By rotating the table by this distance, those two people find themselves in front of their cards.
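The claim can also be verified exhaustively for \(n = 8\); a minimal Python sketch checking every derangement of the seats:

```python
from itertools import permutations
from collections import Counter

n = 8
all_ok = True
for card in permutations(range(n)):
    # card[i] = seat holding the name card of the person sitting at seat i
    if any(card[i] == i for i in range(n)):
        continue                       # skip: someone is already in the right place
    # rotating the table by k seats fixes person i exactly when k == (i - card[i]) mod n
    shifts = Counter((i - card[i]) % n for i in range(n))
    if max(shifts.values()) < 2:       # no rotation would fix two or more people
        all_ok = False
        break

print(all_ok)  # True: every derangement admits a rotation fixing at least 2 people
```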

**Exercise 1.6**

Consider the lattice of points with integer coordinates in the Cartesian plane. Prove that, given any \(5 \) points of the lattice, the midpoint of at least one pair belongs to the lattice.

The midpoint is obtained by taking the arithmetic mean of the coordinates of the two points.

**Theorem 2.1**

Suppose we have a set \(X \) of \(n\) objects, with \(n \gt mk \). If the set is divided into \(k \) disjoint subsets, then at least one of the subsets contains more than \(m \) elements.

Proof

Suppose \(X\) is partitioned into \(k\) disjoint sets:

\[ X = X_{1} \cup X_{2} \cup \cdots \cup X_{k} \]

Then, if it were \(| X_{i} | \le m, \forall i: 1\le i \le k \), we would have:

\[ n = | X | = \sum_{i = 1}^ {k} | X_{i} | \le mk \]contrary to the hypothesis that \(| X | = n \gt mk \).

**Exercise 2.1**

Show that, in a group of \(6\) people, at least \(3\) know each other or at least \(3\) don’t know each other.

Proof

Suppose that the \(6 \) people sit at the vertices of a hexagonal table. For each of the \(15 \) pairs we draw a segment and assign a color: red if the two people know each other, blue otherwise. From the geometric point of view, we have to prove that, whatever coloring we choose, it's always possible to find a triangle with all red sides or with all blue sides.

We choose any person in the group, denoting it with P. There are \(5\) people left; for Dirichlet’s principle two cases are possible:

- P knows at least three people
- at least three people don’t know P

Suppose the first case holds (the second is treated in a similar way). We denote the three people with A, B, C, and we draw red segments between P and A, B, C. Now, if there is a red segment between two people among A, B, C we are done; for example, if there is a red segment between A and C we have the red triangle ACP. If the segments between A, B, C are instead all blue, then we have the blue triangle ABC.
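The statement can also be verified by brute force; a minimal Python sketch checking all \(2^{15}\) two-colorings of the edges of \(K_{6}\):

```python
from itertools import combinations

vertices = range(6)
edges = list(combinations(vertices, 2))       # the 15 edges of K6
triangles = list(combinations(vertices, 3))   # the 20 triangles of K6

def has_mono_triangle(coloring):
    # coloring: dict edge -> 0 (red) or 1 (blue)
    for a, b, c in triangles:
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

# Exhaustively check all 2^15 = 32768 two-colorings of the edges
all_have = all(
    has_mono_triangle({e: (mask >> i) & 1 for i, e in enumerate(edges)})
    for mask in range(1 << len(edges))
)
print(all_have)  # True: every 2-coloring of K6 has a monochromatic triangle
```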

Ramsey theory studies the properties that are preserved in a partition of a given set. An equivalent way to put it is the following: if a set \(X \) has a given property \(P \) and is then decomposed into subsets, at least one of these retains the property \(P \) (*"There is no complete disorder"*).

To introduce Ramsey theory it’s necessary to remember some definitions on graphs and on the coloring of graphs.

**Definition 3.1**

A graph is an ordered pair \(G = (V, E) \), where \(V \) is the set of elements called **vertices** or **nodes** and \(E \) is a subset of the Cartesian product \(V \times V \), that is, a set of pairs of vertices. The elements of \(E \) are called **edges** or **arcs**.

A graph is called **simple** if it has no loops, i.e. edges that start and end on the same vertex, or multiple edges between two vertices. The **order of a graph** is the number of vertices, i.e. the elements of \(V\). The **degree of a vertex** is the number of edges that are incident to the vertex. Two edges are called adjacent if they have a common vertex. Two vertices are called adjacent if there is an edge connecting them.

A graph is called **directed** if the pairs of vertices are ordered, otherwise it’s called undirected. In this article we only consider undirected graphs.

**Definition 3.2**

An undirected graph with \(n \) vertices is said to be **complete**, and is denoted with \(K_ {n} \), if each vertex is adjacent to all the other vertices.

Some examples are as follows:

**Exercise 3.1**

Prove that the number of edges of a complete graph of order \(n \) is

\[ \binom{n}{2} = \frac{n(n-1)}{2} \]

**Exercise 3.2**

Prove that in an undirected graph \(G\) the sum of degrees of the vertices is equal to twice the number of the edges. Furthermore, the number of vertices of odd degree is even.

**Definition 3.3**

An **r-coloring** of the vertices of the graph \(G=(V, E) \) is a function \(f: V \to \{1,2, \cdots r \} \). The definition is similar for the r-coloring of the edges.

Ramsey theory is concerned in particular with the 2-coloring of the edges of a graph. By convention the colors red and blue are used.

**Definition 4.1**

Let \(s, t \) be two positive integers. The **Ramsey number **\(R (s, t) \) is defined as the order of the smallest complete graph which, if colored with two colors red and blue, contains a complete red-colored subgraph \(K_{s} \), or a complete blue-colored subgraph \(K_{t} \).

We give Ramsey’s basic theorem for two colors:

**Theorem 4.1 – Ramsey**

Given two positive integers \(s, t \), there exists a minimum integer \(n \), dependent on \(s, t \) and denoted with \(n = R(s, t) \), such that each 2-coloring of the arcs of a complete graph \(K_{n} \) of order \(n\), contains a complete subgraph of order \(s \), whose edges are all red, or contains a complete subgraph of order \(t \), whose edges are all blue.

Furthermore, if \(s, t \ge 2 \) then the following relation holds:

Ramsey’s basic theorem guarantees the existence of the number \(R (s, t) \) for each pair of positive integers \(s, t \). In a subsequent article we will illustrate in detail the theorem, with some of its extensions and applications. For a deeper study of Ramsey theory see ^{[2]}.

**Exercise 4.1**

Prove the following relationships:

**Exercise 4.2**

Prove that \(R(3,3) = 6\). In other words, prove that for the complete graph \(K_{6}\) any coloring of the edges with two colors, red and blue, contains at least one triangle with all red or all blue sides.

Also, show that for the graph \(K_{5}\) you can always find a coloring that doesn't contain any monochromatic triangle.

By exercise 2.1 we already know that \(R(3,3) \le 6\). It remains to be shown that \(R(3,3) \gt 5\). For this it's sufficient to analyze the coloring of the following complete graph \(K_{5}\).
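For such small orders the claim can also be verified by brute force, enumerating every 2-coloring of the edges; this is feasible because \(K_{6}\) has only \(2^{15} = 32768\) colorings. A quick sketch:

```python
# Exhaustive check: K5 admits a 2-coloring with no monochromatic triangle,
# while every 2-coloring of K6 contains one, so R(3,3) = 6.
from itertools import combinations, product

def has_mono_triangle(n, coloring):
    # coloring maps each edge (i, j) with i < j to 0 (red) or 1 (blue)
    for a, b, c in combinations(range(n), 3):
        if coloring[(a, b)] == coloring[(a, c)] == coloring[(b, c)]:
            return True
    return False

def always_mono(n):
    edges = list(combinations(range(n), 2))
    return all(
        has_mono_triangle(n, dict(zip(edges, colors)))
        for colors in product((0, 1), repeat=len(edges))
    )

print(always_mono(5))  # False: K5 can avoid monochromatic triangles
print(always_mono(6))  # True:  K6 cannot
```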

**Exercise 4.3**

Prove that each coloring with two colors of a complete graph \(K_{6} \) contains at least two monochrome triangles.

Solution

We already know that there is at least one monochromatic triangle, say a red one with vertices \(v_{1}, v_{2}, v_{3}\). If the triangle \(v_{4}, v_{5}, v_{6}\) is also monochromatic, we are done. Otherwise it has edges of both colors, and we may assume that the edge \((v_{4}, v_{5})\) is blue. Among the edges from \(v_{4}\) to the vertices \(v_{1}, v_{2}, v_{3}\) there cannot be two red ones, otherwise we would have another red triangle; so at least two of them are blue. The same holds for the edges from \(v_{5}\) to the vertices \(v_{1}, v_{2}, v_{3}\). So, by the Dirichlet principle there must be a blue edge from \(v_{4}\) and a blue edge from \(v_{5}\) that go to the same vertex among \(v_{1}, v_{2}, v_{3}\), forming a blue triangle together with \((v_{4}, v_{5})\).

**Exercise 4.4**

Prove that \(R (3,4) = 9\).

Hint

First we prove that \(R(3,4) \gt 8\). For this, the following figure suffices: it shows a coloring of the graph \(K_{8}\) that contains neither a red \(K_{3}\) nor a blue \(K_{4}\).

To prove that \(R (3,4) \le 9 \) applying Ramsey’s theorem we have: \(R (3,4) \le R (3,3) + R (2,4) = 6 + 4 = 10\).

It remains to be shown that \(R(3,4) \le 9\). Suppose that there is a red and blue coloring of \(K_{9}\) such that there is no red subgraph \(K_{3}\) and no blue subgraph \(K_{4}\). Then each vertex of the graph must be incident with exactly \(3\) red edges and \(5\) blue edges, otherwise it would be possible to build a red \(K_{3}\) or a blue \(K_{4}\), contrary to the hypothesis. Let us now consider only the red subgraph, which has \(9\) vertices, each with three incident edges; the sum of the degrees of all the vertices is therefore \(3 \cdot 9 = 27\). This is not possible because the sum of the degrees of an undirected graph must be even (see Exercise 3.2). So the initial hypothesis is wrong and we can conclude that \(R(3,4) = 9\).

The values of the Ramsey numbers are very difficult to calculate. Some of the calculated numbers are presented in the following table:

Ramsey’s theory finds interesting applications in various sectors, including number theory, algebra, information theory. In a future article we will describe Ramsey’s theory in more detail, with particular regard to applications to number theory.

^{[1]}M. Erickson – Introduction to Combinatorics (Wiley)

^{[2]}Graham, Rothschild, Spencer – Ramsey Theory (Wiley)

The post Dirichlet’s Box Principle and Ramsey Numbers appeared first on GameLudere.

The post Euler and Möbius Arithmetic Functions and RSA Cryptography appeared first on GameLudere.

An arithmetic function \(f\) is a function with real or complex values defined for all positive integers:

\[ f: \mathbb{N} \to \mathbb{C} \]An arithmetic function \(f(n)\) is called **multiplicative** if

\[ f(nm) = f(n)\,f(m) \quad \text{whenever} \quad (n,m)=1 \]It's called **completely multiplicative** if the relation holds for all pairs of positive integers \(n,m\).

**Example 1.1**

The **identity function** \(I(n)\) is defined as follows:

\[ I(n) = \Bigl[\frac{1}{n}\Bigr] = \begin{cases} 1 &\mbox{if } n=1 \\ 0 &\mbox{if } n \gt 1 \end{cases} \]where the square brackets denote the integer part of the fraction. This function is clearly completely multiplicative.

On the set of arithmetic functions it's possible to define a binary operation, called **convolution** or **Dirichlet product**. Given two arithmetic functions \(f,g\), the convolution, denoted with the symbol \(f*g\), is defined as follows:

\[ (f*g)(n) = \sum_{d|n} f(d)\, g\Bigl(\frac{n}{d}\Bigr) \]where the summation is taken over all the divisors of \(n\).

The following properties are easily proved:

**Exercise 1.1**

Prove that the convolution \(f*g\) is a multiplicative function if the functions \(f,g\) are multiplicative.

**Exercise 1.2**

Prove that, for every arithmetic function \(f\), we have:

\[ f*I = I*f = f \]

that is, the identity function is a neutral element with respect to convolution.

If \(n\) is a positive integer, the Euler function, denoted with \( \varphi(n)\), counts the number of positive integers between \(1\) and \(n\) which are relatively prime with \(n\). In symbols:

\[ \varphi (n) = |\{x\in \mathbb{N} \mid 1\leq x\leq n, (x,n)=1\}| \]where the symbol \(|A|\) denotes the number of elements (or cardinality) of the set \(A\).

The first values of the function are \(\varphi(1)=1,\ \varphi(2)=1,\ \varphi(3)=2,\ \varphi(4)=2,\ \varphi(5)=4,\ \varphi(6)=2, \cdots\)

To analyze the values assumed by the Euler function with the SageMath environment we can use the **euler_phi(n)** function.
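Outside SageMath, a minimal pure-Python sketch of the same function, based on the product formula \(\varphi(n)=n\prod_{p|n}(1-1/p)\), could look like this:

```python
# Trial-division implementation of Euler's totient function.
def euler_phi(n):
    result = n
    m = n
    p = 2
    while p * p <= m:
        if m % p == 0:
            while m % p == 0:      # strip all powers of the prime p
                m //= p
            result -= result // p  # multiply result by (1 - 1/p) exactly
        p += 1
    if m > 1:                      # one prime factor > sqrt(n) may remain
        result -= result // m
    return result

print([euler_phi(n) for n in range(1, 13)])
# [1, 1, 2, 2, 4, 2, 6, 4, 6, 4, 10, 4]
```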

If \(p\) is a prime number we have

\[ \varphi(p)= p-1 \]If \(n=p^{a}\), with \(a \in \mathbb{N}\), then

\[ \varphi(p^{a})= p^{a} - p^{a-1} \]To prove this it's enough to observe that among the natural numbers between \(1\) and \(p^{a}\), those not coprime with \(p^{a}\) are exactly the multiples of \(p\), that is \(\{p, 2p, \cdots ,p^{a-1}p\}\).

In the general case the following theorem holds:

**Theorem 2.1**

If \(n=p_1^{a_1}\cdots p_k^{a_k}\), then

\[ \varphi(n) = n \prod_{i=1}^{k} \Bigl(1 - \frac{1}{p_{i}}\Bigr) \]

Proof

To prove the theorem we will use the principle of inclusion-exclusion. Let us first observe that the number of positive integers less than or equal to \(n\) and divisible by a positive integer \(a\) is \(\Bigl\lfloor \frac{n}{a}\Bigr\rfloor\), that is the integer part of the division; if \(a,b\) are relatively prime, the number of positive integers less than or equal to \(n\), divisible both by \(a\) and \(b\), is \(\Bigl\lfloor \frac{n}{ab}\Bigr\rfloor\), and so on. Applying the inclusion-exclusion principle we obtain the following formula:

\[ \varphi(n) = n - \sum_{i} \frac{n}{p_{i}} + \sum_{i \lt j} \frac{n}{p_{i}p_{j}} - \cdots = n \prod_{i=1}^{k} \Bigl(1 - \frac{1}{p_{i}}\Bigr) \]

From the formula you can easily see that for \(n \ge 3\) the Euler function is always even.

Another consequence is the following property:

**Theorem 2.2**

The function \(\varphi\) is multiplicative. That is, if \((n,m)=1\), then

\[ \varphi(nm) = \varphi(n)\,\varphi(m) \]

**Theorem 2.3 – Euler**

Let \(n\) be a positive integer and \(a\) an integer coprime with \(n\). Then the following formula applies:

\[ a^{\varphi(n)} \equiv 1 \pmod{n} \]As a special case, for \(n\) equal to a prime \(p\) not dividing \(a\), we have Fermat's little theorem:

\[ a^{p-1}\equiv 1 \pmod{p} \]Proof

Let \(\{a_{1}, \cdots, a_{\varphi(n)}\}\) be a reduced residue system modulo \(n\). Then the set \(\{ aa_{1}, \cdots, aa_{\varphi(n)}\}\) is also a reduced residue system modulo \(n\). Multiplying all the elements we have:

\[ a^{\varphi(n)} \prod_{i=1}^{\varphi(n)} a_{i} \equiv \prod_{i=1}^{\varphi(n)} a_{i} \pmod{n} \]We can cancel all the factors \(a_{i}\) because \((a_{i},n)=1\), and so we obtain the desired result.

**Exercise 2.1**

Prove that if \(m|n\) then \(\varphi(m) | \varphi(n)\) .

**Exercise 2.2**

Prove that \(\varphi(n) \le 5\) if and only if \(n \in \{1,2,3,4,5,6,8,10,12\}\).

**Theorem 2.4**

Prove that for every positive integer \(N\) there is at most a finite number of integers \(n\) such that \(\varphi(n) = N\) .

From this, prove that the function \(\varphi(n)\) tends to infinity when \(n\) tends to infinity.

Hint

Let's fix a positive integer \(K\) and suppose that \(n > (K!)^{K}\). Then the number \(n\) is divisible by a prime number \(p > K\), or by \(p^{K+1}\) for some prime \(p\). In the first case, since \(p \mid n\) implies \(\varphi(p) \mid \varphi(n)\) (Exercise 2.1), we have

\[ \varphi(n) \ge \varphi(p) = p-1 \ge K \]

while in the second case we have

\[ \varphi(n) \ge \varphi(p^{K+1})= p^{K}(p-1) \ge p^{K} > K \]So, in any case, if \(n> (K!)^{K}\) then \(\varphi(n) \ge K\), thus proving the theorem.

To deepen the study of the Euler function we can see ^{[1]} or ^{[2]}.

The Möbius function \(\mu(n)\) is defined as follows:

\[ \mu(n)= \begin{cases} 1 &\mbox{if } n=1 \\ (-1)^{k} &\mbox{if } n=\prod_{i=1}^{k} p_{i} \mbox{ with distinct primes } p_{i} \\ 0 &\mbox{otherwise} \end{cases} \]The function \(\mu(n)\) therefore is equal to zero if \(n\) is not square-free, that is, if it is divisible by the square of some prime. The following theorem follows directly from the definition.

**Theorem 3.1**

The function \(\mu(n)\) is multiplicative.

**Theorem 3.2**

If \(n \ge 1\) we have:

\[ \sum_{d|n} \mu(d) = \Bigl[\frac{1}{n}\Bigr] = \begin{cases} 1 &\mbox{if } n=1 \\ 0 &\mbox{if } n \gt 1 \end{cases} \]

Proof

The theorem is true for \(n=1\). Suppose now \(n \gt 1\) and \(n=p_1^{a_1}\cdots p_k^{a_k}\). From the definition of the function \(\mu(n)\), only the square-free divisors of \(n\) contribute to the sum, so we have

\[ \sum_{d|n} \mu(d) = 1 - \binom{k}{1} + \binom{k}{2} - \cdots + (-1)^{k}\binom{k}{k} = (1-1)^{k} = 0 \]by Newton's binomial theorem.
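Theorem 3.2 is easy to verify numerically; the helper below is a straightforward (not optimized) trial-division implementation of \(\mu(n)\):

```python
# Check that the divisor sum of the Moebius function is 1 for n = 1
# and 0 for every n > 1.
def mobius(n):
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:      # p^2 divides the original n: not square-free
                return 0
            result = -result    # one more prime factor: flip the sign
        p += 1
    if n > 1:                   # leftover prime factor > sqrt(n)
        result = -result
    return result

def divisor_sum_of_mu(n):
    return sum(mobius(d) for d in range(1, n + 1) if n % d == 0)

print([divisor_sum_of_mu(n) for n in range(1, 11)])
# [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```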

**Theorem 3.3**

The proof is similar to the previous one.

**Theorem 3.4**

\[ \varphi(n)=\sum_{d|n} \mu(d)\,\frac{n}{d} \]

Proof

The proof easily follows from theorem 2.1 and the properties of the Möbius function.

**Theorem 3.5**

If \(f(n)\) is a multiplicative function, so is the function

\[ g(n)=\sum_{d|n} f(d) \]

Proof

If \((n,m)=1,\ d|n,\ D|m\), then \((d,D)=1\) and the product \(c=dD\) runs over all divisors of \(nm\). Then

\[ g(nm) = \sum_{c|nm} f(c) = \sum_{d|n} \sum_{D|m} f(dD) = \sum_{d|n} f(d) \sum_{D|m} f(D) = g(n)\,g(m) \]

**Theorem 4.1**

Let \(f,g \colon \mathbb{N} \to \mathbb{R}\) be two arithmetic functions. If

\[ f(n)=\sum_{d|n} g(d) \hspace{5 mm} \forall n\geq 1 \]then the following inversion formula applies:

\[ g(n)=\sum_{d|n} \mu(d) f\left(\frac{n}{d}\right). \]Proof

\[ \begin{split} &\sum_{d|n} \mu(d) f\left(\frac{n}{d}\right)= \sum_{d|n} \mu(d)\sum_{c|\frac{n}{d}}g(c)=\sum_{cd|n} \mu(d)g(c)= \\ &\sum_{c|n} g(c) \sum_{d|\frac{n}{c}}\mu(d) \end{split} \]The internal sum is equal to \(1\) if \( \frac{n}{c}=1\), that is \(c=n\), otherwise it's equal to \(0\). So the expression is equal to \(g(n)\).
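The inversion formula can be sanity-checked for an arbitrary starting function \(g\); in the sketch below \(\mu\) is computed recursively from Theorem 3.2 itself, and \(g(n)=n^{2}+1\) is just a test function:

```python
# Numerical check of Moebius inversion: build f from g, then recover g.
from functools import lru_cache

@lru_cache(maxsize=None)
def mobius(n):
    # mu(1) = 1, and the divisor sum of mu vanishes for n > 1 (Theorem 3.2)
    if n == 1:
        return 1
    return -sum(mobius(d) for d in range(1, n) if n % d == 0)

def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def g(n):
    return n * n + 1                          # arbitrary test function

def f(n):
    return sum(g(d) for d in divisors(n))     # f(n) = sum of g(d) over d | n

def recovered(n):
    # inversion formula: g(n) = sum over d | n of mu(d) * f(n / d)
    return sum(mobius(d) * f(n // d) for d in divisors(n))

print(all(recovered(n) == g(n) for n in range(1, 60)))  # True
```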

The following inverse theorem is also valid:

**Theorem 4.2**

If

\[ g(n)=\sum_{d|n} \mu(d) f\left(\frac{n}{d}\right) \hspace{5 mm} \forall n\geq 1 \]then

\[ f(n)=\sum_{d|n}g(d) \hspace{5 mm} \forall n\geq 1 \]The proof is similar to the previous one.

Putting together theorem 3.4 and theorem 4.2 we immediately obtain the following formula:

**Theorem 4.3**

\[ \sum_{d|n} \varphi(d) = n \]

**Exercise 4.1**

If \(n=p_1^{a_1}\cdots p_k^{a_k}\) prove the following formula:

**Exercise 4.2**

Prove the following formula:

**Exercise 4.3**

If \(f(n)\) is a multiplicative function, prove that

**Exercise 4.4**

The Euler function has several connections with the Riemann zeta function. In particular, an important relation with the Riemann hypothesis on the zeros of the zeta function is given by the following theorem.

**Theorem 5.1 – Jean-Louis Nicolas**

If the Riemann hypothesis is true, then the following relation holds for every \(k \ge 1\):

\[ \frac{N_{k}}{\varphi(N_{k})} \gt e^{\gamma} \ln \ln N_{k} \]

where \( \gamma \approx 0.577\) is the Euler-Mascheroni constant and \( N_{k}= \prod_{i=1}^{k} p_{i}= 2 \cdot 3 \cdot 5 \cdots p_{k}\) is the **primorial** of order \(k\), that is the product of the first \(k\) prime numbers (the name is analogous to the factorial).

Conversely, if the Riemann hypothesis is false, then the relation described above holds for infinitely many values of \(k\) and fails for infinitely many values of \(k\).

So a possible strategy to prove Riemann’s hypothesis is to prove the validity of the relation from a certain positive integer \(k\) onwards.
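As a numerical illustration, the sketch below computes the ratio \(N_k / (e^{\gamma}\,\varphi(N_k)\ln\ln N_k)\) for the first few primorials; under Nicolas's criterion (assuming RH) it should stay above \(1\):

```python
# Nicolas's inequality for the first primorials N_k = 2 * 3 * 5 * ... * p_k.
import math

GAMMA = 0.5772156649015329          # Euler-Mascheroni constant
primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

N, phi = 1, 1
ratios = []
for k, p in enumerate(primes, start=1):
    N *= p                          # primorial N_k
    phi *= p - 1                    # phi(N_k), since N_k is square-free
    if k >= 2:                      # log log N_k needs N_k > e
        ratios.append(N / (math.exp(GAMMA) * phi * math.log(math.log(N))))

print([round(r, 3) for r in ratios])  # every value stays above 1
```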

To learn more about Nicolas’s criterion, see ^{[3]}. To study the Riemann zeta function an excellent text is ^{[4]}.

The RSA algorithm, introduced in 1978 by **Rivest**, **Shamir** and **Adleman**, is currently the most widely used public-key cryptosystem. It can be used both to encrypt data and to manage digital signatures. It's considered secure, at present, if sufficiently long keys (at least 1024 bits) are used. Its security is based on the difficulty of factoring very large integers.

Before describing the RSA algorithm it is useful to remember the following theorem, which plays an important role in proving the correctness of the algorithm itself.

**Theorem 6.1 – Chinese remainder theorem**Consider the system of two linear congruences:

\[ \begin{cases} x \equiv a_{1} \pmod{m_{1}} \\ x \equiv a_{2} \pmod{m_{2}} \end{cases} \]If \( m_{1},m_{2}\) are two relatively prime positive integers and \(a_{1},a_{2}\) are integers, then the system has a solution, unique \(\pmod {m_{1}m_{2}}\). If \(x_{0}\) is a solution, then the solutions are exactly the integers \(x=x_{0} + k m_{1}m_{2},\ k \in \mathbb{Z}\).

The theorem can be easily extended to the case of a system of \(n\) equations, with \(n \gt 2\).

For a proof see one of the texts referenced in the bibliography.
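A minimal sketch of the two-congruence case, via the extended Euclidean algorithm (the moduli and residues in the example are illustrative, not those of Exercise 6.1):

```python
# Two-congruence Chinese remainder theorem, assuming gcd(m1, m2) = 1.
def extended_gcd(a, b):
    # returns (g, x, y) with a*x + b*y = g = gcd(a, b)
    if b == 0:
        return a, 1, 0
    g, x, y = extended_gcd(b, a % b)
    return g, y, x - (a // b) * y

def crt(a1, m1, a2, m2):
    g, u, v = extended_gcd(m1, m2)     # m1*u + m2*v = 1
    assert g == 1, "moduli must be coprime"
    # m2*v is 1 mod m1 and 0 mod m2; m1*u is 0 mod m1 and 1 mod m2
    return (a1 * m2 * v + a2 * m1 * u) % (m1 * m2)

x = crt(2, 3, 3, 5)
print(x)  # 8: indeed 8 ≡ 2 (mod 3) and 8 ≡ 3 (mod 5)
```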

**Exercise 6.1**

Solve the following system:

Answer: \(x= 24 + 35k\)

The basic scheme of the RSA procedure is as follows:

- given two subjects, A and B, each chooses a pair of very large prime numbers, which are kept secret
- subject A uses his pair of primes \( p_{a},q_{a}\) and computes \(n_{a}=p_{a} q_{a}\). We note that \(\varphi(n_{a})=(p_{a}-1)(q_{a}-1)\).
- using a random number generation algorithm, a number \(e_{a}\) is computed such that \(1 < e_{a} < \varphi(n_{a})\) and \(e_{a}\) is coprime with \(\varphi (n_{a})\).
- finally, A computes \(d_{a}\), the inverse of \(e_{a}\) modulo \(\varphi(n_{a})\).

Ultimately, the numbers available to subject A are as follows:

\[ \begin{cases} n_{a}=p_{a} q_{a} \\ e_{a} \quad \text{ with } \quad (e_{a},\varphi(n_{a}))=1 \\ d_{a}= e_{a}^{-1} \mod {\varphi(n_{a})} \end{cases} \]The same procedure is performed by the subject B.

The** public key **of subject A is the pair of numbers

which is made public and accessible to the outside world. The **private key** of subject A is the pair

The public key of subject B is the pair of numbers

\[ K(E,B) = (n_{b},e_{b}) \]which is made public and accessible to the outside world. The private key is the pair

\[ K(D,B) = (n_{b},d_{b}) \]The following diagram summarizes the operations of the RSA procedure:

The message to be encrypted is represented as an integer \(m \lt n\). If the integer is greater than \(n\), the message is decomposed into a set of smaller separate messages. Another condition is that the integer \(m\) must be coprime with \(n\). Failing this condition is an unlikely event: if \(n=pq\) there are only \(p+q-1\) positive integers not exceeding \(n\) that are not coprime with \(n\), namely the numbers \(p,2p, \cdots, (q-1)p, qp\) and \(q,2q, \cdots, (p-1)q\); the probability is:

\[ \frac{p+q-1}{pq} \approx 1/p+1/q \]If the primes \(p,q\) are very large, this probability is very small. So, to encrypt a message \(m\) that A sends to B, the public key of B is used through the following function:

\[ c \equiv m^{e_{b}}\pmod {n_{b}} \]To decrypt the encrypted message \(c\), subject B uses the inverse function with his private key.

**Theorem 6.2**

Decryption recovers the original message, that is:

\[ c^{d_{b}} \equiv m \pmod{n_{b}} \]

Proof

In the proof, we will skip the index \(b\) for simplicity. Since \(ed \equiv 1 \pmod{\varphi(n)}\), we can write \(ed = 1 + k\varphi(n)\) for some integer \(k\), so

\[ c^{d} \equiv m^{ed} = m^{1+k\varphi(n)} = m \left(m^{\varphi(n)}\right)^{k} \equiv m \pmod{n} \]

We have applied the Euler theorem, thanks to the hypothesis that \(m\) is coprime with \(n\).

**Exercise 6.2**

We apply the algorithm in the following case: \(p=3,\ q=11\), so that \(n=33\) and \(\varphi(n)=(p-1)(q-1)=20\).

We choose \(e=3\) and compute \(d\) such that \(ed \equiv 1 \pmod {20}\). It's easily found that \(d=7\). So, the public key is \(K_{E}= (33,3)\) while the private key is \(K_{D} = (33,7)\).

Now, to encrypt a plain message \(P=5\), we compute \(C \equiv 5^{3} \equiv 26 \pmod{33}\). To do the reverse operation and obtain the unencrypted message, we compute \(P \equiv 26^{7} \equiv 5 \pmod {33}\).
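The whole toy example can be stated in a few lines of code; the values \(p=3,\ q=11\) are inferred from \(n=33\) and \(\varphi(n)=20\) given in the text:

```python
# Toy RSA run with tiny primes (for illustration only, of course).
p, q = 3, 11
n = p * q                   # 33
phi = (p - 1) * (q - 1)     # 20
e = 3
d = pow(e, -1, phi)         # modular inverse; needs Python 3.8+
assert d == 7               # 3 * 7 = 21 ≡ 1 (mod 20)

m = 5                       # plaintext
c = pow(m, e, n)            # encrypt with the public key (n, e)
print(c)                    # 26
print(pow(c, d, n))         # decrypt with the private key (n, d): 5
```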

The RSA procedure described above has some security problems. A good encryption system must guarantee, in addition to data security and confidentiality, also the authenticity of the user who transmits the message. This can be achieved through a digital signature procedure that uses the RSA protocol.

Without going into details, we add another small message, encrypted with the sender’s private key, to the message encrypted with the recipient’s public key. The receiver, before decoding the message with his private key, performs an authenticity verification operation using the sender’s public key. If the verification is successful, it then proceeds to decrypt the message with its own private key.

The following diagram illustrates the main operations:

The security of the RSA system is based on the difficulty of computing the prime factors of very large numbers. In the period 1993-1994, using about 600 computers for 6 months, it was possible to factor the \(129\)-digit number of the RSA challenge. This forced an increase in the length of the keys.

In the current state of mathematics the factoring of numbers of a thousand digits is practically impossible, except in some very specific cases. However, the progress of mathematics and the availability of more and more powerful computers used in parallel make it clear that the RSA procedure will have to use even larger numbers to ensure security. For an in-depth study of the mathematics of cryptographic schemes see ^{[5]}.

**Exercise 9.1**

Prove that \(\varphi(n) \ge \frac{\sqrt{n}}{2}\).

Hint

Use the following inequalities:

**Exercise 9.2**

Let \(n\) be an even perfect number. Show that the integer \(n-\varphi(n)\) is the square of an integer.

Hint

Remember that an even perfect number is an integer of the form \(n=2^{p-1}(2^{p}-1)\), where \(p\) is a prime number and the number \((2^{p}-1)\) is prime too.

**Exercise 9.3**

Find infinitely many integers \(n\) such that \(100 \mid \varphi(n)\).

**Exercise 9.4**

Characterize the positive integers \(n\) for which the relationship applies:

**Exercise 9.5**

If \((a,b)=d\), then

\[ \varphi(ab) = \frac{\varphi(a)\,\varphi(b)\,d}{\varphi(d)} \]

**Exercise 9.6**

We define the **Jordan function** \(J_{k}(n)\), which is a generalization of the Euler function (\( J_{1}(n)=\varphi(n) \)):

\[ J_{k}(n) = n^{k} \prod_{p|n} \Bigl(1 - \frac{1}{p^{k}}\Bigr) \]

For each integer \( k \ge 1\), prove that \(J_{k}(n)\) is equal to the number of \((k + 1)\)-tuples of integers \(\{x_{1}, \cdots, x_{k}, n\}\) whose greatest common divisor is equal to \(1\).

^{[1]}Niven, Zuckerman, Montgomery – An introduction to the Theory of Numbers (Wiley)

^{[2]}Hardy, Wright – An Introduction to the Theory of Numbers, 5th edition (Oxford, 1979)

^{[3]}Jean-Louis Nicolas – Small values of the Euler function and the Riemann hypothesis (ACTA ARITHMETICA 155.3, 2012)

^{[4]}H. Edwards – Riemann's Zeta Function (Dover)

^{[5]}N. Koblitz – A Course in Number Theory and Cryptography (Springer)


The post Cardano, Gambling and the dawn of Probability Theory appeared first on GameLudere.

However, the first studies on the calculation of probabilities appeared already a century earlier, in particular in the work of the Italian mathematician Gerolamo Cardano.

In this article we will describe some gambling problems studied by Cardano and other scholars of the period, that introduce the basic concepts of classical probability, later defined more precisely by Pascal, Fermat, Huygens and others.

The development of the calculus of probability started with the need to solve practical problems related to gambling. The word 'hazard' derives from the Arabic term **al zahr**, which means die. Gambling typically involves wagering money or other personal values on the outcome of a future random event.

The passion for gambling is as old as humanity itself. In places such as China, Egypt, Greece and Rome there is evidence dating back thousands of years. The most common instruments have been **dice** (**astragals**, i.e. knucklebones, were used in the distant past). Playing **cards** appeared in the late 14th century.

The first **casinos** opened in Italy in the seventeenth century, for example the Ridotto of Venice in 1638, and then spread throughout Europe. In the sixteenth century the game of **lotto** was born in Genoa. At the beginning of the twentieth century the first mechanical slot machines were introduced, later perfected as electromechanical machines.

With the advent of the Internet, slot machines have become accessible online to anyone, allowing people to play from home. Thanks to the diffusion of smartphones, **mobile gambling** is constantly increasing.

In all ages the passion of gambling has conditioned the lives of many people for better or for worse. It has often led to the ruin of many players and for these reasons, in various countries, attempts have been made over time to ban it, without success.

A very interesting reading to understand player psychology is the famous book by the great Russian writer F. Dostoevskij ^{[1]}.

As we said earlier, the first studies on probability started to solve problems posed by gamblers. One of the main scholars is certainly the Italian mathematician **Gerolamo Cardano** (1501-1576).

Cardano is a typical Renaissance man who was interested in various sciences: mathematics, physics and mechanics, medicine, astrology, alchemy. Among other things, he was one of the first scientists to affirm the impossibility of perpetual motion, that is the creation of a machine that can operate without any loss of energy.

Cardano was a passionate gambler; from his memoirs it appears that for many years of his life he played almost every day all kinds of games of his time: dice, chess, cards, and so on.

As regards mathematics, Cardano played a primary role in the study of third and fourth degree algebraic equations. The general third degree equation can be written as follows:

\[ x^{3}+ Ax^{2}+Bx +C=0 \\ \]Solving third-degree equations posed a challenge for Renaissance mathematicians. The mathematician Nicolò Tartaglia (1499-1557) found a method, in 1534, to solve the reduced third degree equations, that is without the second degree term:

\[ x^{3}+bx+c=0 \\ \]The general third degree equation can be put into the reduced form by substituting \(x = y – \frac {A} {3} \).

Cardano convinced Tartaglia to reveal to him the secret of the solution formula; after much resistance Tartaglia revealed his method, binding Cardano by an oath not to publish it.

In 1545 Cardano published his important work ‘**Ars Magna**‘ in Germany. The publication of this book represents for many the beginning of modern mathematics. In the book Cardano presented among other things the solutions of the general equations of the third and fourth degree and also the way to pass from the general equation to the reduced one. The fact that he published Tartaglia’s formula in his work caused a dispute between the two mathematicians.

However, several years had passed since the promise made to Tartaglia and during this period the latter had never wanted to publish his formula. Furthermore, together with Cardano and Tartaglia, there were two other mathematicians who made a fundamental contribution to solving the two problems in the general form: **Scipione del Ferro** (1465-1526) who actually discovered the solution method before Tartaglia, and **Ludovico Ferrari **(1522-1565), a pupil of Cardano who discovered the formula for fourth degree equations. In his ‘Ars Magna’ Cardano correctly recognized the merits of all.

The general formula for solving the reduced third degree equation is as follows:

\[ \displaystyle x=\sqrt[3]{-\frac{c}{2}+ \sqrt{\frac{c^{2}}{4}+ \frac{b^{3}}{27}}} + \sqrt[3]{-\frac{c}{2}- \sqrt{\frac{c^{2}}{4} + \frac{b^{3}}{27}}} \\ \]**Exercise 2.1**

Calculate the solutions of the equation \(x^{3} = 6x + 6 \).
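As a numerical sketch, Cardano's formula applied to this equation rewritten as \(x^{3}-6x-6=0\) (so \(b=-6,\ c=-6\)):

```python
# Cardano's formula for the reduced cubic x^3 + b*x + c = 0.
b, c = -6.0, -6.0
disc = c**2 / 4 + b**3 / 27        # = 9 - 8 = 1 > 0: not the irreducible case

def cbrt(t):
    # real cube root, valid for negative arguments too
    return t ** (1.0 / 3.0) if t >= 0 else -((-t) ** (1.0 / 3.0))

x = cbrt(-c / 2 + disc**0.5) + cbrt(-c / 2 - disc**0.5)
print(round(x, 4))                 # 2.8473 (= cbrt(4) + cbrt(2))
print(x**3 - 6 * x - 6)            # ≈ 0 up to floating-point error
```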

However, a fundamental problem remained: the formula could not handle the so-called **irreducible case**, that is when

\[ \frac{c^{2}}{4} + \frac{b^{3}}{27} \lt 0 \]

If the quantity under the square roots is negative, then we are in the presence of the so-called imaginary numbers, which had not yet been introduced into mathematics. The situation was clarified later by the mathematician **Raffaele Bombelli** (1526-1573), who laid the foundations for the theory of complex numbers.

**Exercise 2.2**

Calculate the three real solutions of the equation:

Solution: \(4, \sqrt {3} +2, \sqrt {3} -2 \)

Cardano wrote a book called ‘**Liber de Ludo Aleae**‘ in which the first concepts of classical probability are introduced and discussed. Unfortunately, the booklet was not printed before 1663 and therefore did not have the attention it deserved. For the English version of the book see ^{[2]}.

Cardano's book deals in particular with the dice game, in which he himself had alternate fortunes of winnings and losses. In his booklet Cardano introduces the concept of **circuit**, that is the set of all possible cases, which coincides with the modern sample space (the space of events). Despite some initial errors, Cardano essentially introduces the classical definition of probability as a ratio between the number of favorable and possible cases. However, in the initial part of the book Cardano also uses a second type of reasoning, called reasoning on the mean, which leads him to incorrect results.

**Example 2.1 – Reasoning on the mean**

By rolling a die, the probability that a given face appears, for example the number \(4\), is equal to \(p = \frac {1} {6} \). So with a single throw the probability of winning would be \(\frac {1} {6} \); with two throws we would get \(2 \cdot \frac {1} {6} \), and so on. In particular, with three rolls we would have a \(50 \%\) chance that a given face comes out.

If we accept this type of reasoning, \(6 \) throws would be enough to be sure that a given face comes out, while in reality a given face may never come out even after a very large number of throws.

Later Cardano became aware of the fallacy of the reasoning and from then on he used the correct method, which consisted of counting favorable and total cases.

In chapter \(XIV\) Cardano clearly sets out the rule to be applied to calculate the probability of an event: count the total cases of the circuit, then count the favorable cases, and calculate the probability according to the classic formula:

\[ P(A) = \frac{\text{number of favorable cases}}{\text{number of possible cases}} \]

Cardano actually uses the terms of gambling to express his general rule: for each player it’s important to know the relationship between favorable and unfavorable cases. If an event \(A \) has probability equal to \(P (A) \), then the odds in favor and against are as follows:

\[ \displaystyle \begin{array}{l} \textbf{odds in favor of A = }\dfrac{P(A)}{1-P(A)} \\ \\ \textbf{odds against A = }\dfrac{1-P(A)}{P(A)} \end{array} \\ \]**Problem 2.1 – Cardano**

How many times must you roll a fair die to have a probability greater than \(\frac {1} {2} \) that at least one \(6 \) comes out?

Solution

We solve with modern notation. We denote with \(A \) the event that is true if the number \(6 \) comes out. It's convenient to calculate the probability of the complementary event \(\overline {A} \), that is the event that is true if the number \(6 \) does not come out. For a single throw we have \(P(\overline {A}) = \frac {5} {6} \) and therefore \(P (A) = 1 - \frac{5} {6} \). If we make two throws, since the throws are independent, we have \(P (\overline {A}) = \left (\frac {5} {6} \right)^{2} \) and therefore \(P (A ) = 1 - \left (\frac {5} {6} \right)^{2} \). So in the case of \(n \) throws we have the following probability values:

\[ P(\overline{A}) = \left(\frac{5}{6}\right)^{n}, \qquad P(A) = 1 - \left(\frac{5}{6}\right)^{n} \]

To solve Cardano’s problem we have to find the value of \(n\) such that \(P (A) \ge \frac {1} {2} \), or equivalently \(P (\overline {A} ) \lt \frac {1} {2} \). Since

\[ \begin{array}{l} \left (\frac{5} {6} \right)^{3} \approx 0.578 \\ \left (\frac {5} {6} \right)^{4} \approx 0.482 \\ \end{array} \]we can conclude that \(4 \) throws are enough for it to be advantageous to bet on the exit of the number \(6 \).

Cardano initially uses reasoning on the mean, and erroneously concludes that the number of throws needed is \(3 \), as he calculates \(3 \cdot \frac{1}{6} = \frac{1}{2}\), corresponding to \(108\) favorable cases out of \(216\).

But then, realizing the error, he correctly calculates that the number of favorable cases for a face to come out with three throws is \(91 \) and not \(108 \), so four throws are needed to have a probability greater than \(50 \% \) that a given face comes out.
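Cardano's corrected figure is easy to reproduce by enumerating the \(6^{3}=216\) ordered outcomes of three throws of one die:

```python
# Count the ordered outcomes of three throws that contain a given face.
from itertools import product

outcomes = list(product(range(1, 7), repeat=3))
favorable = [t for t in outcomes if 6 in t]

print(len(outcomes))   # 216
print(len(favorable))  # 91  (= 6^3 - 5^3, not the 108 of the mean reasoning)
```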

In his book Cardano presents several examples relating to the throwing of two dice and three dice, correctly carrying out the calculations according to the classic definition of probability.

Another type of problem studied by Cardano is the calculation of the probabilities for a repeated event. As we saw, in a three-dice roll the odds for and against getting at least a \(4\) are in the proportion of \(91 \) and \(125 \). Cardano tries to calculate the probability of success in each of two successive throws; he initially uses a wrong reasoning and assumes that the odds for and against are in the proportion \(91^{2} \) and \(125^{2} \). And so on for more trials. Cardano realizes that this method of calculation leads to absurd results. After several attempts he comes to prove the correct formula; in a set of \(2 \) independent successive tests the odds for and against obtaining two consecutive successes are in the following proportion:

\[ 91^{2} \quad \text{and} \quad 216^{2}-91^{2} \]discovering one of the fundamental laws of the calculation of probabilities. In modern form we can affirm that the probability \(p_{n} \) of obtaining \(n \) successes in \(n \) independent trials is given by the following formula:

\[ p_{n}=p^{n} \]where \(p \) is the probability of success in a single test. This is an important result, which will be generalized with the Bernoulli distribution, to calculate the probability of having \(k \) successes in \(n \) throws.

In the last chapter of his booklet Cardano expounds, albeit still in a rudimentary way, the content of the law of large numbers and the mean of a random variable: by repeating an experiment \(n \) times, an event of probability \(p \) will occur on average a number of times equal to \(n \cdot p \).

We describe some typical exercises of the time. From the current point of view they may seem simple but, considered in the context of the initial stage of the development of the calculation of probabilities, they have a great importance.

Suppose we roll three dice and add the three numbers obtained. The total scores of \(\{9,10,11,12 \} \) can all be obtained each with \(6 \) different combinations. Why are the total scores of \(10 \) or \(11\) more likely than the scores of \(9 \) or \(12 \) then?

This problem was posed by the Grand Duke of Tuscany to **Galileo Galilei** (1564-1642), who presented a solution in his treatise '**Sopra le Scoperte dei Dadi**' ('On a Discovery Concerning Dice').

With three dice the total number of elementary events, i.e. triplets of numbers, is \(6^{3} = 216 \). However the sum of the three numbers can only take \(16 \) distinct values: \(\{3,4,\cdots, 18 \} \). The following table shows the possible combinations for the four cases:

The last line contains the number of ordered arrangements in each case which, as we can see, is not the same in the \(4 \) cases. Galileo then counts the number of favorable cases and divides by the number of total cases, according to the classical definition of probability. The same type of problem had actually already been solved by Cardano and is presented in chapter 13 of his book.
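Galileo's count can be reproduced by enumerating all \(216\) ordered triples:

```python
# Distribution of the sum of three dice over all 6^3 = 216 ordered triples.
from itertools import product
from collections import Counter

counts = Counter(sum(t) for t in product(range(1, 7), repeat=3))
print({s: counts[s] for s in (9, 10, 11, 12)})
# {9: 25, 10: 27, 11: 27, 12: 25}
```

So \(10\) and \(11\) each arise from \(27\) ordered triples, against \(25\) for \(9\) and \(12\), even though all four sums can be written as \(6\) unordered combinations.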

In 1654 Antoine Gombaud, known as Chevalier de Méré, asked Pascal the following problem concerning two ways of betting with dice:

- in the first game a single die is used and it is wagered to get at least one \(6\) after \(4 \) consecutive throws;
- in the second type we use \(2 \) dice and we bet on the combined outcome of two \(6 \) in \(24 \) consecutive throws.

Méré used an incorrect reasoning, similar to Cardano's initial one: in the first case, according to his calculations, the probability is \(\frac{1} {6} \cdot 4 = \frac{2} {3} \). In the second case he obtains \(\frac{1} {36} \cdot 24 = \frac {2} {3} \), equal to the first type of game. Although the odds were the same according to his calculations, he lost a lot of money by betting on the appearance of the pair of \(6 \)s in \(24 \) throws, while he generally won in the first type of game. For this reason he submitted the problem to the great mathematician Pascal.

Pascal gave the correct answer, concluding that the pair of \(6\) on \(24 \) throws is an event with less probability than a single \(6\) on \(4 \) throws.

Let’s see the solution using modern notation, solving the two types of games separately.

In the first game the total cases are: \(6^{4} = 1296 \); the unfavorable cases are \(5^{4} = 625 \); the favorable cases are \(1296-625 = 671\). So the first type of game is favorable to the bettor.

In the second game the total cases are \(36^{24} \), the unfavorable cases are \(35^{24}\) and the favorable cases are \(36^{24} -35^{24} \). These are very large numbers; however, it can be seen that this time the number of unfavorable cases is greater than the number of favorable ones. In terms of probability, the probability of winning in the first type of game is:

\[ P(A_{1}) = 1 - \left(\frac{5}{6}\right)^{4} = \frac{671}{1296} \approx 0.5177 \]The first game is therefore not fair, but favorable to the bettor: on \(100 \) bets he would win on average \(52 \) times. The probability of winning in the second type of game is instead:

\[ P(A_{2}) = 1 - \left(\frac{35}{36}\right)^{24} \approx 0.4914 \]confirming that the second game is less favorable than the first. The second game is also not fair, but this time it is unfavorable to the bettor: on \(100 \) bets he would win on average \(49 \) times.
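Both probabilities can be computed exactly in a few lines, using rational arithmetic to avoid rounding; a minimal check:

```python
from fractions import Fraction

# First game: at least one 6 in 4 throws of a single die
p1 = 1 - Fraction(5, 6) ** 4          # = 671/1296
# Second game: at least one double 6 in 24 throws of two dice
p2 = 1 - Fraction(35, 36) ** 24

print(float(p1), float(p2))  # ≈ 0.5177 and ≈ 0.4914
```

The first game is just above the fair threshold of \(1/2\), the second just below it, which is exactly what de Méré observed at the gaming table.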

Two players A and B each put up a stake that will be awarded to the first of the two who reaches a fixed number \(N \) of points, in a sequence of independent trials. However, the game is interrupted when player A has scored \(a \) points and player B has scored \(b \) points. The question is: how should the total stake be divided?

In the initial version, the two players are assumed to have the same chance of winning every single point. The substance of the problem consists in calculating the probability of winning of each of the two players at the time of interruption, according to the points earned by each. In this article we will present some incorrect solution attempts proposed by the first probability scholars of the time. In a subsequent article we will present the correct and complete solutions given by Pascal and Fermat.

The mathematician Luca Pacioli (1447-1517) presented the problem in his important work ‘**Summa de arithmetica, geometria, proportioni et proportionalità**‘ (1494). Pacioli was also one of the first to publish a description of the double-entry system used by book-keepers and accountants.

He considered a particular version of the problem of the points: A and B play a fair game that will be completed when one of the two wins \(6 \) games. The game is stopped when player A has won \(5 \) games and B has won \(3 \). How should the wager be distributed? Pacioli’s solution consists in dividing the stakes in proportion to the points obtained by the two players. So A receives \(\dfrac{5}{5 + 3} = \dfrac{5}{8} \) and B receives \(\dfrac{3}{5 + 3} = \dfrac{3}{8}\) of total stake.

Tartaglia criticized Pacioli’s solution in his ‘**General treatise on numbers and measures**‘. Tartaglia pointed out that, according to Pacioli’s rule, if A wins one game and B none, then player A should take the entire stake, which is obviously not correct. Tartaglia understood that to solve the problem one must not consider the points already obtained, but rather the chances the two players have of earning the remaining points. Tartaglia’s method consists in awarding the player who is ahead his initial stake plus a fraction of the opponent’s stake, obtained by dividing the difference of the points scored by the total number of points required to complete the game.

We apply Tartaglia’s method to the following example: A and B play a ball game which is completed when one of the two wins \(60 \) games. Each invests \(22\) euros. The game is stopped when player A has already won \(50\) games and B has won \(30\). How should the wager be distributed? Tartaglia’s method involves the following calculations:

\[ \frac{a-b}{n} = \frac{50-30}{60} = \frac{1}{3} \]So A receives his own stake plus one third of B’s stake, \(22 + \frac{22}{3} = 29\frac{1}{3}\) euros, while B receives the remaining \(14\frac{2}{3}\) euros.

However, Tartaglia’s method is also incorrect.

Cardano also understood that the solution to the problem of the division of the stakes depends on the number of games still missing and not on those already played. Cardano proposed to divide the stakes according to the ratio of two arithmetic progressions, defined by the points still missing for each of the two players:

\[ \displaystyle \begin{array}{l} 1+ 2 + \cdots + (n-a) = \dfrac{(n-a)(n-a+1)}{2} \\ 1+ 2 + \cdots + (n-b) = \dfrac{(n-b)(n-b+1)}{2} \\ \end{array} \]So the stakes should be divided according to the ratio between the progressions of \((n-a) \) and \((n-b) \), that is

\[ (n-b)(n-b+1) : (n-a)(n-a+1) \]Applying this rule to the first type of Pacioli’s problem \((n = 6; \ a = 5; \ b = 3) \), we find that the stakes should be paid to the players in the ratio \(6:1\). Cardano’s logic is based on the idea that the leading player must receive compensation proportional to the effort that the other player would still have to make to win the game. The progressions based on the missing points are, according to Cardano, a measure of the relative effort of the two players to obtain the number of points required by the game. Cardano’s solution is also incorrect, although the general approach is fundamentally sound.
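The three historical division rules can be compared side by side. The sketch below implements them for equal stakes; the function names and the reading of Tartaglia’s rule (the leader keeps his stake plus \((a-b)/n\) of the opponent’s stake) are our own interpretation of the descriptions above, not code from any of the original texts.

```python
from fractions import Fraction

def pacioli(n, a, b):
    # Divide in proportion to the points already won
    return Fraction(a, a + b)

def tartaglia(n, a, b):
    # The leader keeps his own stake plus (a - b)/n of the opponent's stake;
    # with equal stakes this is the leader's fraction of the total pot
    return Fraction(1, 2) + Fraction(a - b, 2 * n)

def cardano(n, a, b):
    # Ratio of the triangular numbers of the points still missing
    ta = (n - a) * (n - a + 1)
    tb = (n - b) * (n - b + 1)
    return Fraction(tb, ta + tb)

# Pacioli's instance: play to 6, A leads 5 to 3
n, a, b = 6, 5, 3
print(pacioli(n, a, b), tartaglia(n, a, b), cardano(n, a, b))
# A's share: 5/8, 2/3 and 6/7 respectively (6/7 is Cardano's 6:1 ratio)
```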

The correct solution was subsequently given by Pascal and Fermat, thanks also to the new tools made available by the developments of the Combinatorial Calculus. We will describe Pascal and Fermat’s solutions in a later article.

As we have seen, Cardano made important contributions to the birth of the Calculus of Probabilities, even if his analyses were on several occasions too simplistic or even incorrect. After him, Pascal, Fermat and then Huygens would lay the foundations for a rigorous definition of the concept of classical probability and for the proof of the main theorems.

^{[1]}F. Dostoevsky – The Gambler and Other Stories (Penguin Classics)

^{[2]}G. Cardano – The Book on Games of Chance (Dover)

The post Cardano, Gambling and the dawn of Probability Theory appeared first on GameLudere.

The post Exercises in Elementary Number Theory (III) appeared first on GameLudere.

The Fermat numbers are defined as follows:

\[ F_{n} = 2^{2^{n}} + 1 \quad n =0,1,2,\cdots \]Prove that all Fermat numbers with \(n \gt 1 \) have the last digit equal to \(7\).

Hint

The number \(2^{2^{2}} = 16 \) ends with the digit \(6 \). Since every positive power of \(16\) also ends in \(6\), the same is true for the numbers \(2^{2^{n}} \) with \(n \ge 2\), and therefore \(F_{n}\) ends with the digit \(7\).

For the properties of Fermat numbers you can see the article on this blog.
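The claim is easy to check numerically without ever building the gigantic numbers \(F_{n}\): Python’s three-argument `pow` computes \(2^{2^{n}} \bmod 10\) by modular exponentiation.

```python
# Last digit of F_n = 2^(2^n) + 1, via modular exponentiation
last_digits = [(pow(2, 2 ** n, 10) + 1) % 10 for n in range(12)]
print(last_digits)  # 3, 5, then 7 from n = 2 onwards
```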

Prove that a necessary condition for a number of the form \(n^{n} +1\) to be prime is that \(n = 2^{2^{r}} \).

Hint

Suppose \(n = 2^{t} m \), where \(m \) is odd. If it were \(m \gt 1 \) then \(n^{n}+1 = (n^{2^{t}})^{m} +1 \) would be composite, due to the following formula, valid for odd \(m\):

\[ x^{m}+1 = (x+1)\left(x^{m-1}-x^{m-2}+ \cdots -x+1\right) \]

If \(t = 0 \) we are done. If \(t \gt 0 \) then \(t = 2^{r}s \), where \(s \) is odd. By reasoning as above, we can deduce that it must be \(s = 1 \).
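A quick numerical check of the necessary condition, using naive trial division (adequate for these small values): among \(n \le 8\), only \(n = 1\) (trivially, giving \(2\)), \(n = 2 = 2^{2^{0}}\) and \(n = 4 = 2^{2^{1}}\) make \(n^{n}+1\) prime.

```python
def is_prime(m):
    # Naive trial division, fine for the small values tested here
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

prime_n = [n for n in range(1, 9) if is_prime(n ** n + 1)]
print(prime_n)  # [1, 2, 4]
```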

Prove the following identity:

\[ \left(1-\frac{1}{2^2}\right)\left(1-\frac{1}{3^2}\right) \cdots \left(1-\frac{1}{n^2}\right)=\frac{n+1}{2n} \]Prove the following formula:

\[ 2 \cos \left(\frac{\pi}{2^{n+1}}\right)=\sqrt{2+\sqrt{2+ \cdots \sqrt{2}}} \]where the number of square roots is equal to \(n \).
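Both formulas above can be verified numerically before attempting a proof; a minimal sketch:

```python
import math
from fractions import Fraction

# Telescoping product (1 - 1/2^2)(1 - 1/3^2)...(1 - 1/n^2)
def product_identity(n):
    p = Fraction(1)
    for k in range(2, n + 1):
        p *= 1 - Fraction(1, k * k)
    return p

# Nested radical sqrt(2 + sqrt(2 + ... sqrt(2))) with n square roots
def nested_sqrt2(n):
    x = 0.0
    for _ in range(n):
        x = math.sqrt(2 + x)
    return x

assert product_identity(10) == Fraction(11, 20)   # (n + 1)/(2n) for n = 10
for n in range(1, 8):
    assert abs(nested_sqrt2(n) - 2 * math.cos(math.pi / 2 ** (n + 1))) < 1e-12
print("both identities verified")
```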

Prove that if an integer \(n \) is of the form \(n = 4k + 3,\ k \in \mathbb {N} \), then \(n \) has at least one prime factor of the same form.

Hint

The odd primes are all of the form \(4k + 1 \) or \(4k + 3 \). Also note that the product of two numbers of the form \(4k + 1 \) is also of the same form.

Prove that if \(n \) is an odd positive integer then:

\[ \binom{n}{1}-5\binom{n}{2}+5^{2}\binom{n}{3}- \cdots + 5^{n-1}\binom{n}{n} =\frac{1}{5}(1+4^{n}) \]Prove that the last non-zero digit of \(n! \) is always even, if \(n \gt 2 \).

Hint

We can write \(n! = 2^{r} 5^{s} m \), where \((m, 10) = 1 \). Then we observe that it must always be \(r \gt s \).

As a further consequence, we can say that the maximum power of \(10 \) which divides \(n! \) is \(10 ^{s} \).
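The statement is easy to test by direct computation; a brute-force sketch using `math.factorial`:

```python
from math import factorial

def last_nonzero_digit(n):
    # Strip the trailing zeros of n!, then take the last remaining digit
    f = factorial(n)
    while f % 10 == 0:
        f //= 10
    return f % 10

digits = [last_nonzero_digit(n) for n in range(3, 40)]
print(digits)
assert all(d % 2 == 0 for d in digits)  # always 2, 4, 6 or 8
```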

Let \(d (n) \) be the arithmetic function that counts the number of positive divisors of \(n \). Prove the following formula:

\[ \sum_{k|n}d^{3}(k)=\left(\sum_{k|n}d(k)\right)^{2} \]For the properties of the \(d (n) \) function, you can see the article in this blog mentioned in the first exercise. See also the article concerning arithmetic functions and in particular Dirichlet product.
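Before proving it, the identity can be checked by brute force for small \(n\); a minimal sketch:

```python
def divisors(n):
    return [k for k in range(1, n + 1) if n % k == 0]

def d(n):
    # Number of positive divisors of n
    return len(divisors(n))

for n in range(1, 200):
    lhs = sum(d(k) ** 3 for k in divisors(n))
    rhs = sum(d(k) for k in divisors(n)) ** 2
    assert lhs == rhs
print("identity verified for n = 1..199")
```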

Hint

Prove first that both the left-hand and the right-hand functions are multiplicative. Then prove the formula for the case \(n = p^{a} \). Also remember the following formula:

\[ 1^{3}+2^{3}+ \cdots +k^{3} = \left(1+2+ \cdots +k\right)^{2} \]

Prove that an integer of the form \(4k + 3 \) cannot be the sum of two squares.

Hint

The square of an integer is congruent to \(0 \) or \(1 \) modulo \(4 \).

Prove that the Fermat number \(F_{5} = 2^{2^{5}} + 1 \) is divisible by \(641 \).

Hint

Use the relation \(5 \cdot 2^{7} \equiv -1 \pmod {641} \). Raise to the fourth power and note that \(5^{4} \equiv -16 \pmod {641} \); it follows that \(2^{32} \equiv -1 \pmod{641}\).
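The divisibility, together with both congruences used in the hint, can be checked directly:

```python
# F_5 = 2^32 + 1 = 4294967297
F5 = 2 ** 32 + 1

assert (5 * 2 ** 7) % 641 == 640   # 5 * 2^7 ≡ -1 (mod 641)
assert 5 ** 4 % 641 == 625         # 5^4 ≡ -16 (mod 641)
assert pow(2, 32, 641) == 640      # hence 2^32 ≡ -1 (mod 641)
assert F5 % 641 == 0
print(F5 // 641)  # the cofactor 6700417, found by Euler
```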

The first five Fermat numbers

\[ F_{0} = 3, F_{1} = 5, F_{2} = 17, F_{3} = 257, F_{4} = 65537 \]are all prime.

Fermat conjectured, around 1640, that all the Fermat numbers are prime. However, no prime Fermat number has been found for \(n \ge 5 \) to date: all the Fermat numbers checked so far beyond \(F_{4}\) are composite.
