Let me pull out an old differential equations textbook that I learned from in college, and let's turn to this funny little exercise in here that asks the reader to compute e to the power a t, where a, we're told, is going to be a matrix, and the insinuation seems to be that the result will also be a matrix.
It then offers several examples for what you might plug in for a.
Now, taken out of context, putting a matrix into an exponent like this probably seems like total nonsense, but what it refers to is an extremely beautiful operation, and the reason it shows up in this book is that it's useful.
It's used to solve a very important class of differential equations.
In turn, given that the universe is often written in the language of differential equations, you see this pop up in physics all the time too, especially in quantum mechanics, where matrix exponents are littered all over the place.
They play a particularly prominent role.
This has a lot to do with Schrodinger's equation, which we'll touch on a bit later, and it may also help in understanding your romantic relationships, but again, all in due time.
A big part of the reason I want to cover this topic is that there is an extremely nice way to visualize what matrix exponents are actually doing using flow that not a lot of people seem to talk about, but for the bulk of this chapter, let's start by laying out what exactly the operation is, and see if we can get a feel for what kinds of problems it helps us to solve.
The first thing you should know is that this is not some bizarre way to multiply the constant e by itself multiple times.
You would be right to call that nonsense.
The actual definition is related to a certain infinite polynomial for describing real number powers of e, what we call its Taylor series.
For example, if I took the number 2 and plugged it into this polynomial, then as you add more and more terms, each of which looks like some power of 2 divided by some factorial, the sum approaches a number near 7.389, and this number is precisely e times e.
If you increment this input by one, then somewhat miraculously, no matter where you started from, the effect on the output is always to multiply it by another factor of e.
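If you want to play with this yourself, here's a minimal Python sketch of those partial sums, purely as an illustration, watching them close in on e squared:

```python
import math

# A minimal sketch: partial sums of the series 1 + x + x^2/2! + x^3/3! + ...
# evaluated at x = 2, watching them approach e^2, roughly 7.389.
x = 2.0
partial_sum = 0.0
for n in range(15):
    partial_sum += x**n / math.factorial(n)
    print(n, partial_sum)

print("e^2 =", math.e**2)
```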
For reasons that you're going to see in a bit, mathematicians became interested in plugging all kinds of things into this polynomial, things like complex numbers, and for our purposes today, matrices, even when those objects do not immediately make sense as exponents.
What some authors do is give this infinite polynomial the name exp when you plug in more exotic inputs.
It's a gentle nod to the connection that this has to exponential functions in the case of real numbers, even though obviously these inputs don't make sense as exponents.
However, an equally common convention is to give a much less gentle nod to the connection and just abbreviate the whole thing as e to the power of whatever object you're plugging in, whether that's a complex number or a matrix or all sorts of more exotic objects.
So while this equation is a theorem for real numbers, it's a definition for more exotic inputs.
Cynically, you could call this a blatant abuse of notation.
More charitably, you might view it as an example of the beautiful cycle between discovery and invention in math.
In either case, plugging in a matrix even to a polynomial might seem a little strange, so let's be clear on what we mean here.
The matrix has to have the same number of rows and columns.
That way you can multiply it by itself according to the usual rules of matrix multiplication.
This is what we mean by squaring it.
Similarly, if you were to take that result and then multiply it by the original matrix again, this is what we mean by cubing the matrix.
If you carry on like this, you can take any whole number power of a matrix, it's perfectly sensible.
In this context, powers still mean exactly what you would expect, repeated multiplication.
Each term in this polynomial is scaled by one divided by some factorial, and with matrices, all that means is that you multiply each component by that number.
Likewise, it always makes sense to add together two matrices, this is something you again do term by term.
The astute among you might ask how sensible it is to take this out to infinity, which would be a great question, one that I'm largely going to postpone the answer to, but I can show you one pretty fun example here now.
Take this 2x2 matrix that has negative pi and pi sitting off its diagonal entries.
Let's see what the sum gives.
The first term is the identity matrix, this is actually what we mean by definition when we raise a matrix to the zeroth power.
Then we add the matrix itself, which gives us the off-diagonal pi terms, then add half of the matrix squared, and continuing on, I'll have the computer keep adding more and more terms, each of which requires taking one more matrix product to get the new power, and then adding it to a running tally.
And as it keeps going, it seems to be approaching a stable value, which is around negative one times the identity matrix.
In this sense, we say the infinite sum equals that negative identity.
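Here's a rough numerical sketch of that same running tally, using NumPy: repeatedly multiply to get the next power, scale by the factorial, add it on, and watch the sum settle near negative one times the identity.

```python
import numpy as np

# Running tally of I + M + M^2/2! + M^3/3! + ... for the matrix with
# -pi and pi off its diagonal; the partial sums settle near -1 times
# the identity matrix.
M = np.array([[0.0, -np.pi],
              [np.pi, 0.0]])

total = np.zeros((2, 2))
power = np.eye(2)        # M to the zeroth power is the identity, by definition
factorial = 1.0
for n in range(40):
    total += power / factorial
    power = power @ M    # one more matrix product gives the next power
    factorial *= n + 1   # next factorial

print(np.round(total, 6))  # approximately [[-1, 0], [0, -1]]
```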
By the end of this video, my hope is that this particular fact comes to make total sense to you.
For any of you familiar with Euler's famous identity, this is essentially the matrix version of that.
It turns out that in general, no matter what matrix you start with, as you add more and more terms, you eventually approach some stable value, though sometimes it can take quite a while before you get there.
Just seeing the definition like this in isolation raises all kinds of questions, most notably, why would mathematicians and physicists be interested in torturing their poor matrices this way?
What problems are they trying to solve?
And if you're anything like me, a new operation is only satisfying when you have a clear view of what it's trying to do, some sense of how to predict the output based on the input before you actually crunch the numbers.
How on earth could you have predicted that the matrix with pi off the diagonals results in a negative identity matrix like this?
Often in math you should view the definition not as a starting point, but as a target.
Contrary to the structure of textbooks, mathematicians do not start by making definitions and then listing a lot of theorems and proving them and then showing some examples.
The process of discovering math typically goes the other way around.
They start by chewing on specific problems, and then generalizing those problems, then coming up with constructs that might be helpful in those general cases, and only then do you write down a new definition, or extend an old one.
As to what sorts of specific examples might motivate matrix exponents, two come to mind.
One involving relationships, and the other quantum mechanics.
Let's start with relationships.
Suppose we have two lovers, let's call them Romeo and Juliet, and let's let x represent Juliet's love for Romeo, and y represent his love for her, both of which are going to be values that change with time.
This is an example we actually touched on in chapter 1, based on a Steven Strogatz article, but it's okay if you didn't see that.
The way their relationship works is that the rate at which Juliet's love for Romeo changes, the derivative of this value, is equal to negative one times Romeo's love for her.
So in other words, when Romeo is expressing cool disinterest, that's when Juliet's feelings actually increase, whereas if he becomes too infatuated, her interest will start to fade.
Romeo, on the other hand, is the opposite.
The rate of change of his love is equal to Juliet's love, so while Juliet is mad at him, his affections tend to decrease, whereas if she loves him, that's when his feelings grow.
Of course, neither one of these numbers is holding still.
As Romeo's love increases in response to Juliet, her equation continues to apply and drives her love down.
Both of these equations always apply, from each infinitesimal point in time to the next, so every slight change to one value immediately influences the rate of change of the other.
This is a system of differential equations.
It's a puzzle, where your challenge is to find explicit functions for x of t and y of t that make both of these expressions true.
Now, as systems of differential equations go, this one is on the simpler side, enough so that many calculus students could probably just guess at an answer.
But keep in mind, it's not enough to find some pair of functions that makes this true.
If you want to actually predict where Romeo and Juliet end up after some starting point, you have to make sure that your functions match the initial set of conditions at time t equals 0.
More to the point, our actual goal today is to systematically solve more general versions of this equation, without guessing and checking, and it's that question that leads us to matrix exponents.
Very often when you have multiple changing values like this, it's helpful to package them together as coordinates of a single point in a higher dimensional space.
So for Romeo and Juliet, think of their relationship as a point in a 2D space, the x-coordinate capturing Juliet's feelings, and the y-coordinate capturing Romeo's.
Sometimes it's helpful to picture this as an arrow from the origin, other times just as a point.
All that really matters is that it encodes two numbers, and moving forward we'll be writing that as a column vector.
And of course, this is all a function of time.
You might picture the rate of change of this state, the thing that packages together the derivative of x and the derivative of y, as a kind of velocity vector in this state space, something that tugs at our point in some direction and with some magnitude that indicates how quickly it's changing.
Remember, the rule here is that the rate of change of x is negative y, and the rate of change of y is x.
Set up as vectors like this, we could rewrite the right hand side of this equation as a product of this matrix with the original vector xy.
The top row encodes Juliet's rule, and the bottom row encodes Romeo's rule.
So what we have here is a differential equation telling us that the rate of change of some vector is equal to a certain matrix times itself.
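Written out symbolically, that vector equation is:

\[
\frac{d}{dt}\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
=
\begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}
\begin{bmatrix} x(t) \\ y(t) \end{bmatrix}
\]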
In a moment we'll talk about how matrix exponentiation solves this kind of equation, but before that let me show you a simpler way that we can solve this particular system, one that uses pure geometry, and it helps set the stage for visualizing matrix exponents a bit later.
This matrix from our system is a 90 degree rotation matrix.
For any of you rusty on how to think about matrices as transformations, there's a video all about it on this channel, a series really.
The basic idea is that when you multiply a matrix by the vector 1 0, it pulls out the first column, and similarly if you multiply it by 0 1, that pulls out the second column.
What this means is that when you look at a matrix, you can read its columns as telling you what it does to these two vectors, known as the basis vectors.
The way it acts on any other vector is a result of scaling and adding the results for these two basis vectors according to that vector's coordinates.
So looking back at the matrix from our system, notice how from its columns we can tell it takes the first basis vector to 0 1, and the second to negative 1 0, hence why I'm calling it the 90 degree rotation matrix.
What it means for our equation is that it's saying wherever Romeo and Juliet are in this space, their rate of change has to look like a 90 degree rotation of this position vector.
The only way velocity can permanently be perpendicular to position like this is when you rotate around the origin in circular motion, never growing or shrinking because the rate of change has no component in the direction of the position.
More specifically, since the length of this velocity vector equals the length of the position vector, then for each unit of time, the distance that this covers is equal to one radius's worth of arc length along that circle.
In other words, it rotates at one radian per unit time, so in particular it would take 2 pi units of time to make a full revolution.
If you want to describe this kind of rotation with a formula, we can use a more general rotation matrix, which looks like this.
Again, we can read it in terms of the columns.
Notice how the first column tells us that it takes that first basis vector to cos t sin t, and the second column tells us that it takes the second basis vector to negative sin t cos t, both of which are consistent with rotating by t radians.
So, to solve the system, if you want to predict where Romeo and Juliet end up after t units of time, you can multiply this matrix by their initial state.
The active viewers among you might also enjoy taking a moment to pause and confirm that the explicit formulas you get out of this for x of t and y of t really do satisfy the system of differential equations that we started with.
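If you'd rather have a machine do that check, here's one way to confirm it symbolically, a small sketch using SymPy:

```python
import sympy as sp

t, x0, y0 = sp.symbols('t x0 y0')

# The rotation-by-t matrix applied to an initial state (x0, y0).
x = x0*sp.cos(t) - y0*sp.sin(t)
y = x0*sp.sin(t) + y0*sp.cos(t)

# The system demands dx/dt = -y and dy/dt = x; both differences simplify to 0.
print(sp.simplify(sp.diff(x, t) + y))
print(sp.simplify(sp.diff(y, t) - x))
```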
The mathematician in you might wonder if it's possible to solve not just this specific system, but equations like it for any other matrix, no matter its coefficients.
To ask this question is to set yourself up to rediscover matrix exponents.
The main goal for today is for you to understand how this equation lets you intuitively picture the operation which we write as e raised to a matrix, and on the flip side, how being able to compute matrix exponents lets you explicitly solve this equation.
A much less whimsical example is Schrodinger's famous equation, which is the fundamental equation describing how systems in quantum mechanics change over time.
It looks pretty intimidating, and I mean it's quantum mechanics so of course it will, but it's actually not that different from the Romeo-Juliet setup.
This symbol here refers to a certain vector.
It's a vector that packages together all the information you might care about in a system, like the various particles' positions and momenta.
It's analogous to our simpler 2D vector that encoded all the information about Romeo and Juliet.
The equation says that the rate at which this state vector changes looks like a certain matrix times itself.
There are a number of things that make Schrodinger's equation notably more complicated, but in the back of your mind you might think of it as a target point that you and I can build up to, with simpler examples like Romeo and Juliet offering more friendly stepping stones along the way.
Actually the simplest example, which is tied to ordinary real number powers of e, is the one-dimensional case.
This is when you have a single changing value, and its rate of change equals some constant times itself.
So the bigger the value, the faster it grows.
Most people are more comfortable visualizing this with a graph, where the higher the value of the graph, the steeper its slope, resulting in this ever-steepening upward curve.
Just keep in mind that when we get to higher dimensional variants, graphs are a lot less helpful.
This is a highly important equation in its own right.
It's a very powerful concept when the rate of change of a value is proportional to the value itself.
This is the equation governing things like compound interest, or the early stages of population growth before the effects of limited resources kick in, or the early stages of an epidemic while most of the population is susceptible.
Calculus students all learn about how the derivative of e to the rt is r times itself.
In other words, this self-reinforcing growth phenomenon is the same thing as exponential growth, and e to the rt solves this equation.
Actually, a better way to think about it is that there are many different solutions to this equation, one for each initial condition, something like an initial investment size or an initial population, which we'll just call x0.
Notice, by the way, how the higher the value for x0, the higher the initial slope of the resulting solution, which should make complete sense given the equation.
The function e to the rt is just the solution when the initial condition is 1, but if you multiply by any other initial condition, you get a new function which still satisfies this property.
It still has a derivative which is r times itself, but this time it starts at x0 since e to the 0 is 1.
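In symbols, the whole point is one application of the chain rule:

\[
x(t) = x_0 e^{rt}
\;\Longrightarrow\;
\frac{dx}{dt} = r \, x_0 e^{rt} = r \, x(t),
\qquad
x(0) = x_0 e^{0} = x_0.
\]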
This is worth highlighting before we generalize to more dimensions.
Do not think of the exponential part as being a solution in and of itself.
Think of it as something that acts on an initial condition in order to give a solution.
You see, up in the two-dimensional case, when we have a changing vector whose rate of change is constrained to be some matrix times itself, what the solution looks like is also an exponential term acting on a given initial condition, but the exponential part in that case will produce a matrix that changes with time, and the initial condition is a vector.
In fact, you should think of the definition of matrix exponentiation as being heavily motivated by making sure that this fact is true.
For example, if we look back at the system that popped up with Romeo and Juliet, the claim now is that solutions look like e raised to this 0, negative 1, 1, 0 matrix all times time multiplied by some initial condition.
But we've already seen the solution in this case, we know it looks like a rotation matrix times the initial condition.
So let's take a moment to roll up our sleeves and compute the exponential term using the definition that I mentioned at the start, and see if it lines up.
Remember, writing e to the power of a matrix is a shorthand, a shorthand for plugging it in to this long infinite polynomial, the Taylor series for e to the x.
I know it might seem pretty complicated to do this, but trust me, it's very satisfying how this particular one turns out.
If you actually sit down and you compute successive powers of this matrix, what you'd notice is that they fall into a cycling pattern every four iterations.
This should make sense given that we know it's a 90 degree rotation matrix.
So when you add together all infinitely many matrices term by term, each term in the result looks like a polynomial in t with some nice cycling pattern in its coefficients, all of them scaled by the relevant factorial term.
Those of you who are savvy with Taylor series might be able to recognize that each one of these components is the Taylor series for either sine or cosine, though in that top right corner's case it's actually negative sine.
So what we get from the computation is exactly the rotation matrix we had from before.
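If you'd like a sanity check without grinding through the series by hand, SciPy's expm routine computes matrix exponentials numerically, and a quick sketch confirms the result matches the rotation matrix:

```python
import numpy as np
from scipy.linalg import expm

# Spot check: e^(Mt) for the 90-degree rotation generator should equal the
# rotation-by-t matrix [[cos t, -sin t], [sin t, cos t]].
M = np.array([[0.0, -1.0],
              [1.0,  0.0]])

for t in (0.5, 1.0, np.pi / 2):
    print(np.allclose(expm(M * t),
                      [[np.cos(t), -np.sin(t)],
                       [np.sin(t),  np.cos(t)]]))  # True each time
```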
To me, this is extremely beautiful.
We have two completely different ways of reasoning about the same system, and they give us the same answer.
It's reassuring that they do, but it's wild just how different the mode of thought is when you're chugging through this polynomial versus when you're geometrically reasoning about what a velocity perpendicular to a position must imply.
Hopefully the fact that these line up inspires a little confidence in the claim that matrix exponents really do solve systems like this.
This explains the computation we saw at the start, by the way, with the matrix that had negative pi and pi off the diagonals, producing the negative identity.
This expression is exponentiating a 90 degree rotation matrix times pi, which is another way to describe what the Romeo-Juliet setup does after pi units of time.
As we now know, that has the effect of rotating everything 180 degrees in this state space, which is the same as multiplying by negative 1.
Also, for any of you familiar with imaginary number exponents, this particular example is ringing a ton of bells.
It is 100% analogous.
In fact, we could have framed the entire example where Romeo and Juliet's feelings were packaged into a complex number, and the rate of change of that complex number would have been i times itself, since multiplication by i also acts like a 90 degree rotation.
The same exact line of reasoning, both analytic and geometric, would have led to this whole idea that e to the power i t describes rotation.
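For reference, the complex-number version of that same statement is Euler's formula, with e to the i pi equals negative one as the counterpart of our negative identity matrix:

\[
e^{it} = \cos t + i \sin t, \qquad e^{i\pi} = -1.
\]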
These are actually two of many different examples throughout math and physics where you find yourself exponentiating some object which acts as a 90 degree rotation times time.
It shows up with quaternions or many of the matrices that pop up in quantum mechanics.
In all of these cases, we have this really neat general idea that if you take some operation that rotates 90 degrees in some plane, often it's a plane in some high dimensional space that we can't visualize, then what we get by exponentiating that operation times time is something that generates all other rotations in that same plane.
One of the more complicated variations on this same theme is Schrodinger's equation.
It's not just that this has the derivative of a state equals some matrix times that state form.
The nature of the relevant matrix here is such that the equation also describes a kind of rotation, though in many applications of Schrodinger's equation it'll be a rotation in a kind of function space.
It's a little more involved though because typically there's a combination of many different rotations.
It takes time to really dig into this equation and I would love to do that in a later chapter, but right now I cannot help but at least allude to the fact that this imaginary unit i that sits so impishly in such a fundamental equation for all of the universe is playing basically the same role as the matrix from our Romeo-Juliet example.
What this i communicates is that the rate of change of a certain state is, in a sense, perpendicular to that state, and hence that the way things have to evolve over time will involve a kind of oscillation.
But matrix exponentiation can do so much more than just rotation.
You can always visualize these sorts of differential equations using a vector field.
The idea is that this equation tells us the velocity of a state is entirely determined by its position, so what we do is go to every point in the space and draw a little vector indicating what the velocity of a state must be if it passes through that point.
For our type of equation, this means that we go to each point v in space and we attach the vector m times v.
To intuitively understand how any given initial condition will evolve, you let it flow along this field with a velocity always matching whatever vector it's sitting on at any given point in time.
So if the claim is that solutions to this equation look like e to the m t times some initial condition, it means you can visualize what the matrix e to the m t does by letting every possible initial condition flow along this field for t units of time.
The transition from start to finish is described by whatever matrix pops out from the computation for e to the m t.
In our main example with the 90 degree rotation matrix, the vector field looks like this, and as we saw e to the m t describes rotation in that case, which lines up with flow along this field.
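Here's a small numerical sketch of that claim: flow an initial condition along the field by integrating the differential equation, and compare where it lands with applying e to the m t directly, using SciPy's solve_ivp and expm.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.linalg import expm

# Let an initial condition flow along the field v -> Mv for t units of time,
# then compare with applying the matrix e^(Mt) to that same initial condition.
M = np.array([[0.0, -1.0],
              [1.0,  0.0]])
v0 = np.array([2.0, 1.0])
t_final = 3.0

flowed = solve_ivp(lambda t, v: M @ v, (0.0, t_final), v0,
                   rtol=1e-9, atol=1e-12).y[:, -1]
print(flowed)
print(expm(M * t_final) @ v0)  # agrees with the flowed endpoint
```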
As another example, the more Shakespearean Romeo and Juliet might have equations that look a little more like this, where Juliet's rule is symmetric with Romeo's, and both of them are inclined to get carried away in response to one another's feelings.
Again, the way the vector field you're looking at has been defined is to go to each point v in space and attach the vector m times v.
This is the pictorial way of saying that the rate of change of a state must always equal m times itself.
But for this example, flow along the field looks a lot different from how it did before.
If Romeo and Juliet start off anywhere in this upper right half of the plane, their feelings will feed off of each other and they both tend towards infinity.
If they're in the other half of the plane, well, let's just say that they stay more true to their Capulet and Montague family traditions.
So even before you try calculating the exponential of this particular matrix, you can already have an intuitive sense for what the answer should look like.
The resulting matrix should describe the transition from time 0 to time t, which if you look at the field seems to indicate that it will squish along one diagonal while stretching along another, getting more extreme as t gets larger.
Of course, all of this is presuming that e to the m t times an initial condition actually solves these systems.
This is one of those facts that's easiest to believe when you just work it out yourself.
But I'll run through a quick rough sketch.
Write out the full polynomial that defines e to the m t and multiply by some initial condition vector on the right.
And then take the derivative of this with respect to t.
Because the matrix m is constant, this just means applying the power rule to each one of the terms.
And that power rule really nicely cancels out with the factorial terms.
So what we're left with is an expression that looks almost identical to what we had before, except that each term has an extra m hanging on to it, but this can be factored out to the left.
So the derivative of the expression is m times the original expression, and hence it solves the equation.
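For reference, here is that rough sketch written out:

\[
\frac{d}{dt}\left(I + Mt + \frac{M^2 t^2}{2!} + \frac{M^3 t^3}{3!} + \cdots\right)\mathbf{v}_0
= \left(M + M^2 t + \frac{M^3 t^2}{2!} + \cdots\right)\mathbf{v}_0
= M \, e^{Mt} \, \mathbf{v}_0.
\]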
This actually sweeps under the rug some details required for rigor, mostly centered around the question of whether or not this thing actually converges, but it does give the main idea.
In the next chapter I would like to talk more about the properties that this operation has, most notably its relationship with eigenvectors and eigenvalues, which leads us to more concrete ways of thinking about how you actually carry out this computation, which otherwise seems insane.
Also, time permitting, it might be fun to talk about what it means to raise e to the power of the derivative operator.
Thank you.