Solving systems of equations
Solving a single linear equation with one unknown quantity is a task that is studied in elementary school. In high school, one proceeds to study more complex equations involving a wider variety of functions. Many problems in mathematics and its scientific applications amount to solving equations, and especially to solving systems of equations.
A system of equations is a collection of equations that must be solved simultaneously. This means that one must find a value for each unknown such that, taken together, these values satisfy every equation in the system. For example, consider this system of linear equations:
$$x + y = 5$$
$$x - y = 1$$
Using $x = 4$ and $y = 1$ in the first equation and using $x = 8$ and $y = 7$ in the second equation is not considered a solution to this system of equations. The same values of $x$ must be used in each equation, and the same values of $y$ must be used in each equation; this is the meaning of simultaneous in this context. You can check for yourself that the solution to the system of equations shown above is $x = 3$ and $y = 2$.
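Here is one quick way to arrive at that solution, in case you would like to see the algebra. Adding the two equations eliminates $y$:
$$(x + y) + (x - y) = 5 + 1$$
$$2x = 6$$
So $x = 3$, and substituting this value into the first equation gives $y = 2$.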
One aspect of linear algebra is developing methods for the systematic solution of systems of linear equations. The system shown above might be solved with a little bit of guesswork, but for practical problems in mathematics, science, and engineering, one might face having to solve a system consisting of thousands of equations in thousands of unknowns. The only practical means of doing so is to have computers solve such problems, but some human has to write an efficient algorithm to program the computer, and this is possible only if said human has an excellent understanding of linear algebra.
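To give a flavour of the computational side, here is a minimal sketch (using Python with the NumPy library, which is just one of many possible tools; large practical systems call for more specialized algorithms) that solves the small system above numerically:

```python
# A minimal sketch: solve the system
#   x + y = 5
#   x - y = 1
# by handing its coefficients to NumPy's linear-system solver.
import numpy as np

A = np.array([[1.0,  1.0],
              [1.0, -1.0]])   # coefficients of x and y in each equation
b = np.array([5.0, 1.0])      # right-hand sides

solution = np.linalg.solve(A, b)
print(solution)  # expected output: [3. 2.]
```

The same one-line call works, in principle, for a system with thousands of equations; the real work lies in the algorithms hidden behind it, which is where linear algebra comes in.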
Linear and nonlinear theories of physics
So far we have discussed that linear algebra is concerned with the solution of systems of linear equations. This might seem quite restrictive, because there are so many other types of equations. But it is perhaps surprising how often systems of linear equations arise in important applied problems. One of the reasons for this is that many of the fundamental theories of physics (which are then applied in engineering) are linear. What does it mean for a theory of physics to be linear?
The field concept in physics
To answer this, let’s first discuss the idea of a field. This word has several meanings in both mathematics and physics, but we’ll discuss here the physical concept of a field. Consider the temperature of the air at each point in the room that you are in. At every position in the space enclosed by the room, there is a number that describes the temperature of the air at that position. We could therefore speak of the air-temperature function in the room. Instead of calling this a function, you could also call the air-temperature distribution in the room a field. Thus, a field in physics is a function of a certain kind.
For another example, consider the wind velocity at various points within a wind tunnel. At every point in the space enclosed by the wind tunnel, there is a vector that describes the velocity of the air at that position. We could therefore speak of the air-velocity function in the wind tunnel. Instead of calling this a function, you could also call the air-velocity distribution within the wind tunnel a field. Again, a field in physics is a sort of function.
The temperature in a room can be specified by a single real number at each point in the room, so the air-temperature distribution is called a scalar field. On the other hand, the wind-velocity distribution is called a vector field. In advanced work, you will learn about other kinds of fields, such as tensor fields, spinor fields, and so on.
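To tie this to something computational, here is a minimal sketch in Python (the formulas are invented purely for illustration) of the difference between a scalar field and a vector field: the first returns a single number at each point, the second returns a vector.

```python
# A minimal sketch: a scalar field assigns a number to each point,
# while a vector field assigns a vector (here a 3-component array).
import numpy as np

def temperature(x, y, z):           # scalar field (made-up formula)
    return 20.0 + 0.1 * z

def wind_velocity(x, y, z):         # vector field (made-up formula)
    return np.array([1.0, 0.0, 0.1 * y])

print(temperature(0.0, 0.0, 2.0))    # 20.2
print(wind_velocity(0.0, 3.0, 0.0))  # [1.  0.  0.3]
```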
A typical physical theory includes what are called field equations, which are differential equations describing how field quantities change in time and how they change as one moves around from one position to another in space. For example, there are electric and magnetic fields, and Maxwell’s equations describe how electric and magnetic fields change in space and time. Maxwell’s equations underpin our current best understandings of electric and magnetic phenomena in the macroscopic world. (When you get to the atomic and sub-atomic realms, our best understandings come from a quantum theory of electrodynamics (QED).)
There are gravitational fields as well. In a modern formulation of Newton’s theory of gravity, there is a gravitational field equation, the solution of which leads to Newton’s law of gravity, which you will have studied in high-school physics classes. However, you typically don’t learn about the field equation for Newtonian gravity until well into an undergraduate degree in physics.
Newton’s theory of gravity is said to be a linear theory in the following sense. Suppose you have a single object in space, such as a planet. Solving the field equation for Newtonian gravity, you will obtain a formula for the gravitational field surrounding this planet. Suppose you went through the same procedure for a second hypothetical planet, independent of your first calculation. Now consider a third problem, where you have to determine the gravitational field in a region of space that contains both planets. Because Newtonian gravity is a linear theory, you don’t have to solve the gravitational field equations a third time; you just add together the solutions of the first two problems and you will immediately have the solution to the third problem.
In physics literature, the concept of linearity discussed in the previous paragraph is called the principle of superposition. In this context we have two different terms (linearity and the principle of superposition) that refer to the same concept.
The word linear has numerous meanings in mathematics, depending on context, but some of these meanings are related. For example, a differential equation is said to be linear if it admits the principle of superposition. (That is, given any two solutions to the differential equation, the sum of the two solutions is also a valid solution to the differential equation.) Similarly, a system of differential equations is said to be linear if it admits the principle of superposition.
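As a concrete illustration, consider the differential equation
$$\frac{d^{2}y}{dt^{2}} + y = 0$$
(a standard textbook example, chosen only to show the idea). If $y_1(t)$ and $y_2(t)$ are both solutions, then
$$\frac{d^{2}(y_1 + y_2)}{dt^{2}} + (y_1 + y_2) = \left( \frac{d^{2}y_1}{dt^{2}} + y_1 \right) + \left( \frac{d^{2}y_2}{dt^{2}} + y_2 \right) = 0 + 0 = 0$$
so the sum $y_1 + y_2$ is also a solution. That is the principle of superposition in action.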
This is a different sense of the word linear than the one used at the beginning of this article, when we discussed the equations
$$x + y = 5$$
$$x - y = 1$$
Each of these equations is said to be linear because the graph of the collection of points that satisfies each is a line. Similarly, the graph of a linear equation in three unknowns is a plane, and so on. This definition of linear is not quite the same as the definition of linear given in the context of differential equations, but both types of linearity are subjects considered in linear algebra.
You may have heard about Einstein’s theory of gravity, also known as general relativity. General relativity does not have a principle of superposition; its field equations are nonlinear. This means that in Einstein’s theory of gravity if you wished to calculate the gravitational field in the vicinity of, say, two black holes, you couldn’t benefit from the simplification of solving the problem for each black hole independently and then just adding up the solutions. You have to start from scratch to determine the gravitational field near two black holes. And solving Einstein’s gravitational field equations is a formidable problem. This makes Einstein’s theory of gravity much harder to work with from a technical perspective than Newton’s theory of gravity. This explains why Newton’s theory is preferred in situations where it is a sufficiently good approximation.
Einstein’s theory of gravity is the best available theory of gravity. It makes the most accurate predictions, beautifully confirmed by countless experiments. In fact, GPS works correctly only because it has been programmed using Einstein’s theory of gravity, not Newton’s theory of gravity. (Einstein’s theory of special relativity is also used in programming GPS, but the full story will have to wait for another time.) So every time you use GPS you are in effect conducting an experiment that confirms Einstein’s theories.
Linear approximations in nonlinear situations
Although Einstein’s theory of gravity is the best available theory of gravity, it is technically much more difficult to work with than Newton’s theory of gravity. In many situations, Newton’s theory of gravity is an excellent approximation, so unless high-precision work is done (such as programming GPS or exploring the nearby surroundings of black holes, neutron stars, and the like), Newton’s theory is used for preference. If you want to send spacecraft to the Moon, or some other planet in our solar system, Newton’s theory of gravity works just fine. If you want to understand tides on Earth, or why the Moon’s orbital period is the same as its rotational period, Newton’s theory of gravity works just fine.
To recap this part of our discussion: Einstein’s theory of gravity is the best theory of gravity currently available, but it is very hard to work with, because it is nonlinear. Newton’s theory of gravity is an excellent approximation, in some circumstances, and it is much easier to work with, because it is linear.
This theme of using linear approximations in nonlinear situations is commonplace in mathematics. For an engineer, applied mathematician, or scientific modeller, this is the typical pattern. The first thing you do when confronted with a complicated nonlinear situation is to approximate the situation with a linear model. You then solve the linear model and use it to learn as much as you can about your situation. After comparing the conclusions to reality, and extracting as much understanding as you can, you then try to systematically make your approximation better.
This is another reason why linear algebra is so important. Even though many situations are nonlinear, our first crack at understanding them is typically to use a linear model. Such linear models are amenable to the tools and techniques of linear algebra.
Linear approximations are fundamental in calculus too. Consider a function $f$ and a point $A$ on the graph of $f$. Suppose that the graph of $f$ is smooth at $A$. The basic idea of differential calculus is that the best linear approximation to $f$ at $A$ is the linear function whose graph is the line through $A$ with slope equal to the derivative of $f$ evaluated at $A$. The graph of this linear approximation is called the line tangent to the graph of $f$ at the point $A$. If the graph of $f$ is a curve, then for many purposes you can work with the simpler tangent line instead of the more complicated curve.
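For a concrete example, take $f(x) = x^2$ and let $A$ be the point $(1, 1)$ on its graph. The derivative of $f$ at $x = 1$ is $2$, so the best linear approximation near $A$ is
$$f(x) \approx 1 + 2(x - 1)$$
At $x = 1.05$, the tangent line gives $1.1$, while the true value is $1.1025$; close to $A$, the simple line is an excellent stand-in for the curve.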
Linear algebra is one of the foundation pillars of modern mathematics, and in applying it to differential equations you can see calculus and linear algebra work together powerfully. You will really enjoy this when we get to that point, but you will have to learn about calculus and linear algebra separately first.
Linear transformations
As you dig deeper into linear algebra, you will learn about linear transformations. Some of the geometrical transformations that you studied in elementary school, such as reflections, rotations, inversions, and dilatations, are examples of linear transformations.
It turns out that the question of whether a system of linear equations has a solution or not, and if so how many solutions it has, can be treated from the perspective of linear transformations. In two or three dimensions, this can be understood and visualized geometrically, as we shall see later in our linear algebra studies. This is also typical in applications of mathematics: it is often simpler at first to understand a concept geometrically, and then to use this understanding to reason by analogy in more complicated situations.
In the example given earlier,
$$x + y = 5$$
$$x - y = 1$$
we can temporarily replace the system of equations by a more general system:
$$x + y = X$$
$$x - y = Y$$
Now we can consider this more general system of equations as a kind of machine, where you input the coordinates of a point $(x, y)$ on the left sides of the equations, and the output of the machine, represented by the right sides of the equations, $(X, Y)$, also represents a point in the plane. For example, you can check for yourself that when the input to the machine is $(3, -1)$, then the output of the machine is $(2, 4)$.
It turns out that this machine is a linear transformation of the plane. You can study the geometry of this transformation by substituting the coordinates of some domain points into the machine and calculating the corresponding range points. Have fun playing with this! (We’ll study linear transformations and their geometry later.) The question of whether the original system of equations has a solution is then equivalent to the question of whether $(5, 1)$ is in the range of this linear transformation.
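If you would like to experiment, here is a minimal sketch of the machine written as a Python function (the name machine is made up purely for this illustration):

```python
# A minimal sketch of the "machine": it takes an input point (x, y)
# and returns the output point (X, Y) = (x + y, x - y).
def machine(x, y):
    X = x + y
    Y = x - y
    return (X, Y)

print(machine(3, -1))  # expected output: (2, 4), as claimed above
print(machine(3, 2))   # (5, 1), so (5, 1) is indeed in the range
```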
Linear transformations have lots of other applications, too, as we shall see.
Modelling spaces
We appear to live in a three-dimensional world, or four dimensions if you include time, as we often do in discussing modern physics. But we often go beyond our actual space and time to use various modelling spaces to describe aspects of our experience. For example, if a car is driving along a straight highway, we could describe the position of the car as a function of time, and even graph this position function. We could do the same for the velocity of the car by plotting a graph of its velocity function.
The actual space along which the car moves is the (essentially) one-dimensional road. By essentially, we mean that if it is understood that the car stays on the road, then it takes only one number to specify the position of the car on the road. For example, most highways have kilometre or mile markers, and you could just quote this number to describe where on the road you are located if you had to call for an emergency vehicle.
The position-time graph is not, therefore, the actual space along which the car moves. Or is it? Well, it is a slice of the spacetime in which the car exists. If not literally, at least it is a representation of this spacetime. What about the velocity-time graph? This is certainly not the space in which the car actually moves; it is a bit more abstract. And there are situations where it is useful to bundle the position and velocity together, by plotting the velocity of the car vs. its position. This is called the phase space for the car’s motion. Time is implicit in this representation; you can imagine an animation where the dot on the phase-space graph moves about as time passes. The phase space is not the actual space in which the car moves, but the motion of the dot in phase space gives us information about the car’s position and velocity in time.
If we switch our attention from the car moving on a straight road to a bird flying in space, then we would need three numbers to represent the position of the bird (its position vector), and another three numbers to represent its velocity (its velocity vector). Together, we could keep track of these six numbers in a six-dimensional phase space. It’s clear that the six-dimensional phase space is an abstract modelling space, not the real, three-dimensional space that the bird lives in.
(Technically, it is the position and momentum, not position and velocity, that are represented in phase space, but the idea is the same.)
If we had a flock of $n$ birds, then we could keep track of all of their positions and velocities in a $6n$-dimensional phase space. Again, nobody will confuse this abstract, $6n$-dimensional modelling space with the actual space that the birds fly around in. But the $6n$-dimensional phase space, thought of as an abstract modelling space, is useful in studying the flock of birds.
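As a small illustration of the bookkeeping involved, here is a minimal sketch (in Python with NumPy; the numbers are random placeholders, purely for illustration) of how the positions and velocities of $n$ birds can be packed into a single point of the $6n$-dimensional phase space:

```python
# A minimal sketch: pack n birds' positions and velocities into one
# 6n-dimensional vector (the data here are random placeholders).
import numpy as np

n = 4                                # a hypothetical flock of 4 birds
positions = np.random.rand(n, 3)     # (x, y, z) for each bird
velocities = np.random.rand(n, 3)    # (vx, vy, vz) for each bird

# One point in the 6n-dimensional phase space:
phase_point = np.concatenate([positions.ravel(), velocities.ravel()])
print(phase_point.shape)             # (24,) since 6 * 4 = 24
```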
Therefore, although we live in a three-dimensional space, it’s helpful to study higher-dimensional spaces, as these are used frequently in physical models of real processes, as the previous paragraphs suggest. We’ll discuss another such example in the context of quantum mechanics later in this post.
Vectors
Points in an $n$-dimensional space can be specified by $n$ numbers. These numbers can be conveniently listed in a row or column. Such rows or columns of numbers can be called vectors. (In computer science, such lists are typically represented as arrays.)
From a more abstract perspective, the points in space themselves can be called vectors, and then once a coordinate system is chosen for the space, vectors can be represented by rows or columns of numbers. In this perspective, the rows or columns of numbers, which depend on a selection of a coordinate system, are called coordinate vectors, or coordinate representations of vectors. In less abstract (and introductory) discussions, we are not careful to distinguish between a vector (i.e., a location in a space) and its representation as a row or column of numbers.
In introductions, vectors are often identified with arrows, and are said to represent quantities that have magnitude and direction, such as positions, velocities, forces, and so on. In the formal definition of a vector, an abstract approach is adopted, and a vector is defined based on its properties. Some of these properties are that you can add two vectors to obtain another vector, and that you can multiply a vector by a number to obtain another vector. (We’ll discuss the other properties in our linear algebra course.) In the more abstract approach, we don’t necessarily visualize vectors as arrows. Rather, we consider vectors to be any mathematical objects that satisfy all the properties that define vectors. In this perspective, functions can be considered to be vectors; after all, you can add two functions to obtain another function, and you can multiply a function by a number to obtain another function. Similarly, geometric transformations such as rotations can be considered to be vectors.
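To make the “functions as vectors” idea tangible, here is a minimal sketch in Python (the particular functions chosen are arbitrary): adding two functions gives another function, and scaling a function by a number gives another function.

```python
# A minimal sketch: functions behave like vectors in that they can be
# added together and multiplied by numbers.
def f(x):
    return x ** 2

def g(x):
    return 3 * x + 1

def add(u, v):
    """Return the function u + v."""
    return lambda x: u(x) + v(x)

def scale(c, u):
    """Return the function c * u."""
    return lambda x: c * u(x)

h = add(f, g)      # h(x) = x**2 + 3*x + 1
k = scale(2.0, f)  # k(x) = 2*x**2

print(h(2))  # 4 + 7 = 11
print(k(3))  # 2 * 9 = 18.0
```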
The advantage of the abstract approach to vectors is that once you develop the machinery of linear algebra in an abstract setting, you can then apply its results, tools, and techniques in all the specific, concrete settings that are useful in applications. We’ll see nice examples of this approach later when we come to solving linear differential equations that describe simple harmonic oscillations, a very important fundamental model in physics and engineering. The same benefits accompany the abstract approach throughout mathematics, and this is why mathematics courses are often taught abstractly. But for a beginner, the abstract approach is typically confusing. Therefore, an excellent practical approach to teaching is to work concretely at first, with plenty of examples and specific calculations, and to move to an abstract approach only after students have grappled with enough concrete examples and calculations to absorb the basic concepts. This is the approach we adopt in our courses.
Vector spaces
We have already argued that abstract modelling spaces are worthy of study, as they are so widely applicable. Depending on the specific application, spaces with additional structure are considered. What do we mean by additional structure? Well, consider the usual $xy$-plane that you studied in high school, thought of as a space of points. If you consider the points as vectors, then you have added additional structure to the space, for now we can do something with the points beyond just considering them. We can add points together, and we can multiply points by numbers. This may seem odd, but if you identify points as vectors, then it may not seem as odd, particularly if you have studied vectors already. In this perspective, we can consider a point on the plane as represented by a dot, or by an arrow pointing from the origin of the coordinate system to the dot. With this particular additional structure, the $xy$-plane is a two-dimensional vector space.
You can visualize a three-dimensional vector space by considering the room that you are in. You can consider each location in the room as a point in the space. Similarly, you can choose a point in the room as the origin of a coordinate system, and visualize each location in space as a vector from the origin to the location. Can you see the room bristling with vectors?
If you have learned about vectors already, then you may be familiar with the dot product operation, also called scalar product, or inner product. This product allows one to “multiply” two vectors together to obtain a number. This kind of multiplication has interesting geometric and physical interpretations. An inner product is an example of additional structure that can be included with a vector space; the result is called an inner product space. Additional structures can be included, as you will see if you study the subject more deeply. Each additional structure is useful in a certain kind of application.
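To make the dot product concrete: for two vectors in the plane it is computed as
$$(a_1, a_2) \cdot (b_1, b_2) = a_1 b_1 + a_2 b_2$$
so, for instance, $(1, 2) \cdot (3, 4) = 3 + 8 = 11$. Geometrically, the dot product equals the product of the lengths of the two vectors and the cosine of the angle between them, which is the kind of interpretation alluded to above.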
Vector spaces with dimensions up to three can be visualized, but we can also consider vector spaces of higher dimension. We work with higher-dimensional vector spaces symbolically, using algebra, and this allows us to apply them quantitatively.
Basis of a vector space
In the $xy$-plane, there are two special vectors that form what is called the standard basis for the plane. Each of these vectors has length $1$ unit, and so each is called a unit vector. The first vector, often symbolized by ${\bf i}$, starts at the origin and points in the positive $x$-direction. The second vector, often symbolized by ${\bf j}$, starts at the origin and points in the positive $y$-direction. It is a fact that each point in the $xy$-plane can be expressed as a linear combination of these two special vectors. (A linear combination of two vectors is a certain multiple of the first vector added to a certain multiple of the second vector.) That is, each location in the $xy$-plane can be reached by first moving a certain distance parallel to the $x$-axis, and then moving a certain distance parallel to the $y$-axis. For instance,
$$(4.3, -1.7) = 4.3{\bf i} + (-1.7){\bf j}$$
which can also be expressed as
$$(4.3, -1.7) = 4.3{\bf i} - 1.7{\bf j}$$
The fact discussed in the previous paragraph can be summarized by saying that the vectors ${\bf i}$ and ${\bf j}$ form a basis for the $xy$-plane. It is a fact that any two non-zero vectors that are not parallel (that is, neither is a multiple of the other) also form a basis for the $xy$-plane. You might give this some thought to convince yourself that this is so; drawing some diagrams is bound to be helpful. For example, if your basis vectors are $(1, 1)$ and $(1, -1)$, then it is possible to write the vector $(4.3, -1.7)$ as a linear combination of these two basis vectors:
$$(4.3, -1.7) = 1.3(1, 1) + 3(1, -1)$$
This is just one example, however. To convince yourself that the vectors $(1, 1)$ and $(1, -1)$ really do form a basis for the $xy$-plane, you would have to prove that no matter which vector in the plane you choose, it is always possible to express it as a linear combination of the two basis vectors. That is, given an arbitrary vector $(a, b)$ in the $xy$-plane, you would have to show that it is always possible to determine numbers $x$ and $y$ such that the equation
$$(a, b) = x(1, 1) + y(1, -1)$$
has a solution. This can be written equivalently as
$$(a, b) = (x, x) + (y, -y)$$
which is also equivalent to
$$(a, b) = (x + y, x - y)$$
Our task is equivalent to proving that no matter what the values of $a$ and $b$ are, the system of linear equations
$$x + y = a$$
$$x - y = b$$
has a solution. This brings us full circle to the problem we started out with at the beginning of this post. Can you solve this system of equations? If so, check that your solution is also valid for the special case considered a few paragraphs ago.
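In case you would like to check your work, here is one route to the solution: adding the two equations gives $2x = a + b$, and subtracting the second from the first gives $2y = a - b$, so
$$x = \frac{a + b}{2}, \qquad y = \frac{a - b}{2}$$
This works no matter what $a$ and $b$ are, which is exactly what was needed. In the special case $a = 4.3$ and $b = -1.7$, it gives $x = 1.3$ and $y = 3$, matching the linear combination written earlier.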
Atoms and molecules
The idea of a basis in the context of vector spaces is reminiscent of the concept of atoms in chemistry. Consider the enormous number of possible chemical molecules; all of these can be constructed using a small number (about 100 or so) of different species of atoms. To cope with the enormous complexity of molecules, we first learn about the properties of the relatively small number of atomic building blocks, and this helps us to tackle the properties of molecules. Similarly, in elementary particle physics, there is a bewilderingly large number of possible particles, but there is a smaller number of building blocks out of which they can be constructed. Protons, neutrons, and the many other types of hadrons can all be constructed out of a few types of quarks.
The previous paragraph illustrates one aspect of reductionism, which has been a very fruitful tool in science in general. A similar thinking tool is very useful in mathematics, where one continually tries to cope with very complex situations by expressing them in terms of combinations of simpler ones.
For example, consider two blocks that are free to slide along a straight line on a frictionless surface. The two blocks are each connected to side walls with a spring, and the two blocks are also connected to each other with a spring. If you displace the blocks and then release them, they will begin to move. What are the possible motions? Well, this is a very complex situation, and there are infinitely many possible motions, depending on how you initially displace the blocks. You can model the motion of each block by a linear differential equation that you can write with the help of Newton’s second law of motion. These two differential equations are said to be coupled because the positions of both blocks appear in both equations. Understanding the possible motions of the blocks requires the solution of this system of two linear differential equations. This is partly a calculus problem (the equations are differential equations, not algebraic equations), but it is also partly a linear algebra problem. When we get to the point that we can understand how to solve this problem, we will have gone a considerable way along the journey of learning about calculus, learning about linear algebra, learning how the two work together, and learning how they illuminate physical and other kinds of applied problems. But here is a spoiler to the story: It turns out that once you determine two “basic” solutions to the system of differential equations, they form a basis (in the linear algebra sense) for the entire infinite set of solutions to the system. As we shall see …
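For readers who want a preview, here is one common way the two-block model can be set up (assuming, purely for simplicity, that the masses are equal, that all three springs have the same stiffness $k$, and that $x_1$ and $x_2$ denote the displacements of the blocks from their equilibrium positions):
$$m\frac{d^{2}x_1}{dt^{2}} = -kx_1 + k(x_2 - x_1)$$
$$m\frac{d^{2}x_2}{dt^{2}} = -kx_2 - k(x_2 - x_1)$$
Each equation contains both $x_1$ and $x_2$, which is exactly what makes the system coupled.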
One might guess that the two-block situation described above is very artificial and not of much practical use, but that is a limited point of view. In fact, once you get the hang of how to solve this somewhat artificial problem, you will be able to take the next step and tackle more practical problems based on the same ideas. For example, what is a crystalline solid object? We can think of such an object as a three-dimensional array of atoms connected by inter-atomic forces. We can model such an object as a three-dimensional array of blocks, all interconnected by springs. And this is typically how we learn to solve complex problems in mathematics, physics, engineering, and computer science. We begin with relatively simple problems that nevertheless introduce the important concepts that will prove useful. Then we gradually move to more and more sophisticated problems, requiring more complex mathematical models, that approximate reality more and more closely. Eventually, if you continue with this learning process for a sufficient time, you will reach the forefront of research.
Quantum mechanics
A crystalline solid object can be modelled by an array of coupled harmonic oscillators, as we discussed in the previous paragraphs. The best physics theory we have available for studying objects at an atomic or subatomic level is quantum mechanics. In quantal descriptions of matter, we model an atomic or subatomic system by a certain kind of vector (called the system’s state vector) in a certain kind of vector space. The state vector of the system is so called because one can calculate everything about the system that it is possible to calculate (as far as we currently know) by manipulating the state vector in various ways. The state vector for a system satisfies a partial differential equation called the Schrödinger equation. Another name for the state vector is the wave function. From the calculus perspective, it makes sense to call it a wave function, because the solution of a differential equation is a function. But from the linear algebra perspective, it makes sense to call it a state vector, because it is a vector in a certain kind of vector space.
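For reference, the time-dependent Schrödinger equation can be written compactly as
$$i\hbar\frac{\partial \Psi}{\partial t} = \hat{H}\Psi$$
where $\Psi$ is the state vector (or wave function) and $\hat{H}$ is the Hamiltonian operator of the system. Notice that this equation is linear: the principle of superposition holds, which is one reason linear algebra is so central to quantum mechanics.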
If you want to understand modern physics deeply, then you will eventually have to learn a certain amount of quantum mechanics. Materials science is a very active area of research at the moment, and so physicists, chemists, engineers, and biologists will all be interested in learning something about quantum mechanics. Even if you are a computer science student, or a mathematics student, your studies may be enriched by learning something about quantum mechanics. In computer science, quantum computing is a growing field. In mathematics, quantum mechanics has stimulated research in a number of sub-fields, such as functional analysis.
Linear algebra is a critical prerequisite for learning about quantum mechanics. But the applications of linear algebra are vast, and even if you decide not to learn more about quantum mechanics, there are numerous applications of linear algebra in science, engineering, computer science and computational science, economics, and so on. Learning about linear algebra and calculus will give you a firm mathematical foundation for numerous university programs.