2021-08-29 00:25:17 +00:00

```@meta
CurrentModule = BeefBLUP
```

# How to Calculate EPDs

Not to exclude our Australian comrades or our dairy friends, this guide could
alternatively be called

- How to Calculate Estimated Breeding Values (EBVs)
- How to Calculate Predicted Transmitting Abilities (PTAs)
- How to Calculate Expected Progeny Differences (EPDs)

Since I'm mostly talking to American beef producers, though, we'll stick with
EPDs for most of this discussion.

Estimated Breeding Values (EBVs), which are more often halved and published as
Expected Progeny Differences (EPDs) or Predicted Transmitting Abilities (PTAs)
in the United States, are generally found using Charles Henderson's linear
mixed-model equations. Great, you say, what is that? I'm glad you asked...

## The mathematical model

Every genetics textbook starts with the following equation

```math
P = G + E
```

Where:

- ``P`` = phenotype
- ``G`` = genotype (think: breeding value)
- ``E`` = environmental factors

Now, we can't identify _every_ environmental factor that affects phenotype, but
we can identify some of them, so let's replace ``E`` with some specific, known
factors. A good place to start is the "contemporary group" listings for the
trait of interest in the
[BIF Guidelines](https://beefimprovement.org/wp-content/uploads/2018/03/BIFGuidelinesFinal_updated0318.pdf),
though for the purposes of this example, I'm only going to consider sex and
birth year.

```math
P = G + E_{year} + E_{sex}
```

Where:

- ``E_n`` is the effect of ``n`` on the phenotype

Now let's say I want to find the yearling weight breeding value (``G``) of my
favorite herd bull. I compile his stats, and then plug them into the equation
and solve for ``G``, right? Let's try that.

### Calf Records

| ID  | Birth Year | Sex    | YW (kg) |
|:--- |:---------- |:------ |:------- |
| 1   | 1990       | Male   | 354     |

```math
354 \ \textup{kg} = G_1 + E_{1990} + E_{male}
```

Hmm. I just realized I don't know any of those ``E`` values. Come to think of it,
I remember from math class that I will need as many equations as I have
unknowns, so I will add equations for other animals that I have records for.

### Calf Records

| ID   | Birth Year | Sex    | YW (kg) |
|:---  |:---------- |:------ |:------- |
| 1    | 1990       | Male   | 354     |
| 2    | 1990       | Female | 251     |
| 3    | 1991       | Male   | 327     |
| 4    | 1991       | Female | 328     |
| 5    | 1991       | Male   | 301     |
| 6    | 1991       | Female | 270     |
| 7    | 1992       | Male   | 330     |

```math
\begin{aligned}
251 \ \textup{kg} &= G_2 + E_{1990} + E_{female} \\
327 \ \textup{kg} &= G_3 + E_{1991} + E_{male} \\
328 \ \textup{kg} &= G_4 + E_{1991} + E_{female} \\
301 \ \textup{kg} &= G_5 + E_{1991} + E_{male} \\
270 \ \textup{kg} &= G_6 + E_{1991} + E_{female} \\
330 \ \textup{kg} &= G_7 + E_{1992} + E_{male}
\end{aligned}
```

Drat! Every animal I added brings more variables into the system than it
eliminates! In fact, since each animal brings in _at least_ one new term
(``G_n``), I will never be able to write enough equations to solve for
``G`` numerically. I will have to use a different approach.
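
To make the bookkeeping concrete, here is a small sketch in plain Python (an illustration only, not part of `beefblup`) that counts the distinct unknowns as records are added. The helper name `count_unknowns` is made up for this example.

```python
# Calf records from the table above: (ID, birth year, sex, yearling weight in kg).
records = [
    (1, 1990, "Male", 354),
    (2, 1990, "Female", 251),
    (3, 1991, "Male", 327),
    (4, 1991, "Female", 328),
    (5, 1991, "Male", 301),
    (6, 1991, "Female", 270),
    (7, 1992, "Male", 330),
]


def count_unknowns(rows):
    """Count the distinct unknowns in P = G + E_year + E_sex, one equation per row."""
    genotypes = {f"G_{animal_id}" for animal_id, _, _, _ in rows}
    years = {f"E_{year}" for _, year, _, _ in rows}
    sexes = {f"E_{sex.lower()}" for _, _, sex, _ in rows}
    return len(genotypes | years | sexes)


# Add one animal at a time and compare equations against unknowns.
for n in range(1, len(records) + 1):
    print(f"{n} equations, {count_unknowns(records[:n])} unknowns")
```

With all seven records there are 12 unknowns (seven ``G`` terms, three year effects, and two sex effects) but only 7 equations, and the gap never closes.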

## The statistical model: the setup

Since I can never solve for ``G`` directly, I will have to find some way to
estimate it. I can switch to a statistical model and estimate ``G`` that way. The
caveat with a statistical model is that there will be some level of error, but
so long as we know and can control the level of error, that will be better than
not knowing ``G`` at all.

Since we're switching into a statistical space, we should also switch the
variables we're using. I'll rewrite the first equation as

```math
y = b + u + e
```

Where:

- ``y`` = Phenotype
- ``b`` = Environment
- ``u`` = Genotype
- ``e`` = Error

It's not as easy as simply substituting ``b`` for every ``E`` that we had above,
however. The reason is that we must assume that
environment is a **fixed effect** and that genotype is a **random effect**. I'll
go over why that is later, but for now, understand that we need to transform the
environment terms and genotype terms separately.

We'll start with the environment terms.

## The statistical model: environment as fixed effects

To properly transform the equations, I will have to introduce a
``b_{mean}`` term in each animal's equation. This is part of the fixed
effect statistical assumption, and it will let us obtain a solution.

Here are the transformed equations:

```math
\begin{aligned}
354 \ \textup{kg} &= u_1 + b_{mean} + b_{1990} + b_{male} + e_1 \\
251 \ \textup{kg} &= u_2 + b_{mean} + b_{1990} + b_{female} + e_2 \\
327 \ \textup{kg} &= u_3 + b_{mean} + b_{1991} + b_{male} + e_3 \\
328 \ \textup{kg} &= u_4 + b_{mean} + b_{1991} + b_{female} + e_4 \\
301 \ \textup{kg} &= u_5 + b_{mean} + b_{1991} + b_{male} + e_5 \\
270 \ \textup{kg} &= u_6 + b_{mean} + b_{1991} + b_{female} + e_6 \\
330 \ \textup{kg} &= u_7 + b_{mean} + b_{1992} + b_{male} + e_7
\end{aligned}
```

Statistical methods work best in matrix form, so I'm going to convert the set of
equations above to a single matrix equation that means the exact same thing.

```math
\begin{bmatrix}
354 \ \textup{kg} \\
251 \ \textup{kg} \\
327 \ \textup{kg} \\
328 \ \textup{kg} \\
301 \ \textup{kg} \\
270 \ \textup{kg} \\
330 \ \textup{kg}
\end{bmatrix}
=
\begin{bmatrix}
u_1 \\
u_2 \\
u_3 \\
u_4 \\
u_5 \\
u_6 \\
u_7
\end{bmatrix}
+
b_{mean}
+
\begin{bmatrix}
b_{1990} \\
b_{1990} \\
b_{1991} \\
b_{1991} \\
b_{1991} \\
b_{1991} \\
b_{1992}
\end{bmatrix}
+
\begin{bmatrix}
b_{male} \\
b_{female} \\
b_{male} \\
b_{female} \\
b_{male} \\
b_{female} \\
b_{male}
\end{bmatrix}
+
\begin{bmatrix}
e_1 \\
e_2 \\
e_3 \\
e_4 \\
e_5 \\
e_6 \\
e_7
\end{bmatrix}
```

That's a nice equation, but now my hand is getting tired writing all those ``b``
terms over and over again, so I'm going to use
[matrix multiplication](https://www.khanacademy.org/math/precalculus/x9e81a4f98389efdf:matrices/x9e81a4f98389efdf:multiplying-matrices-by-matrices/v/matrix-multiplication-intro)
(one dot product per row) to condense this down.

```math
\begin{bmatrix}
354 \ \textup{kg} \\
251 \ \textup{kg} \\
327 \ \textup{kg} \\
328 \ \textup{kg} \\
301 \ \textup{kg} \\
270 \ \textup{kg} \\
330 \ \textup{kg}
\end{bmatrix}
=
\begin{bmatrix}
u_1 \\
u_2 \\
u_3 \\
u_4 \\
u_5 \\
u_6 \\
u_7
\end{bmatrix}
+
\begin{bmatrix}
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 \\
1 & 0 & 0 & 1 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
b_{mean} \\
b_{1990} \\
b_{1991} \\
b_{1992} \\
b_{male} \\
b_{female}
\end{bmatrix}
+
\begin{bmatrix}
e_1 \\
e_2 \\
e_3 \\
e_4 \\
e_5 \\
e_6 \\
e_7
\end{bmatrix}
```
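
To see that the condensed form says the same thing as before, it can help to multiply out a single row. As a quick check using only the symbols already defined, the first row of the matrix of zeros and ones times the column of ``b`` terms gives back animal 1's fixed effects:

```math
\begin{bmatrix}
1 & 1 & 0 & 0 & 1 & 0
\end{bmatrix}
\begin{bmatrix}
b_{mean} \\
b_{1990} \\
b_{1991} \\
b_{1992} \\
b_{male} \\
b_{female}
\end{bmatrix}
= b_{mean} + b_{1990} + b_{male}
```

The ones pick out the terms that apply to the animal, and the zeros drop the rest.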

That matrix in the middle with all the zeros and ones is called the **incidence
matrix**, and essentially reads like a table with each row corresponding to an
animal, and each column corresponding to a fixed effect. For brevity, we'll just
call it ``X``, though. One indicates that the animal and effect go together,
and zero means they don't. For our records, we could write a table to go with
``X``, and it would look like this:

| Animal | mean | 1990 | 1991 | 1992 | male | female |
|:------ |:---- |:---- |:---- |:---- |:---- |:------ |
| 1      | yes  | yes  | no   | no   | yes  | no     |
| 2      | yes  | yes  | no   | no   | no   | yes    |
| 3      | yes  | no   | yes  | no   | yes  | no     |
| 4      | yes  | no   | yes  | no   | no   | yes    |
| 5      | yes  | no   | yes  | no   | yes  | no     |
| 6      | yes  | no   | yes  | no   | no   | yes    |
| 7      | yes  | no   | no   | yes  | yes  | no     |

Now that we have ``X``, we have the ability to start making changes to allow
us to solve for ``u``. Immediately, we see that ``X`` is **singular** (strictly
speaking, rank deficient: its columns are not independent of one another),
meaning the fixed effects can't be solved for directly. We kind of already knew
that, but now we can quantify it. We calculate the
[rank of ``X``](https://math.stackexchange.com/a/2080577),
and find that there is only enough information contained in it to solve for 4
variables, which means we need to eliminate two columns.
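
The rank can be checked by hand, but a short script makes it repeatable. The following is a sketch in plain Python using exact fractions (an illustration only, not how `beefblup` does it); the names `incidence_row` and `rank` are invented for this example.

```python
from fractions import Fraction

# (ID, birth year, sex) for each calf in the records table.
records = [
    (1, 1990, "Male"),
    (2, 1990, "Female"),
    (3, 1991, "Male"),
    (4, 1991, "Female"),
    (5, 1991, "Male"),
    (6, 1991, "Female"),
    (7, 1992, "Male"),
]
years = [1990, 1991, 1992]
sexes = ["Male", "Female"]


def incidence_row(year, sex):
    """One row of X: the mean, then one column per year, then one per sex."""
    return [1] + [int(year == y) for y in years] + [int(sex == s) for s in sexes]


X = [incidence_row(year, sex) for _, year, sex in records]


def rank(matrix):
    """Rank by Gaussian elimination over exact fractions (no rounding error)."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    found = 0  # number of pivots found so far
    for col in range(len(rows[0])):
        # Look for a row at or below the pivot position with a nonzero entry.
        pivot = next((r for r in range(found, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on earlier ones
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for r in range(len(rows)):
            if r != found and rows[r][col] != 0:
                factor = rows[r][col] / rows[found][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[found])]
        found += 1
    return found


print("columns:", len(X[0]), "rank:", rank(X))  # columns: 6 rank: 4
```

The two missing ranks come from the year columns summing to the mean column and the sex columns doing the same.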

There are several ways to effectively eliminate fixed effects in this type of
system, but one of the simplest and most common methods is to declare a
**base population**, and lump the fixed effects of animals within the base
population into the mean fixed effect. Note that it is possible to declare a
base population that has no animals in it, but that gives weird results. For
this example, we'll follow the convention built into `beefblup` and pick the
last occurring form of each variable.

### Base population

- **Year**: 1992
- **Sex**: Female

Now in order to use the base population, we simply drop the columns representing
conformity with the traits in the base population from ``X``. Our new
equation looks like

```math
\begin{bmatrix}
354 \ \textup{kg} \\
251 \ \textup{kg} \\
327 \ \textup{kg} \\
328 \ \textup{kg} \\
301 \ \textup{kg} \\
270 \ \textup{kg} \\
330 \ \textup{kg}
\end{bmatrix}
=
\begin{bmatrix}
u_1 \\
u_2 \\
u_3 \\
u_4 \\
u_5 \\
u_6 \\
u_7
\end{bmatrix}
+
\begin{bmatrix}
1 & 1 & 0 & 1 \\
1 & 1 & 0 & 0 \\
1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
1 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1
\end{bmatrix}
\begin{bmatrix}
b_{mean} \\
b_{1990} \\
b_{1991} \\
b_{male}
\end{bmatrix}
+
\begin{bmatrix}
e_1 \\
e_2 \\
e_3 \\
e_4 \\
e_5 \\
e_6 \\
e_7
\end{bmatrix}
```

And the table for humans to understand:

| Animal | mean | 1990 | 1991 | male |
|:------ |:---- |:---- |:---- |:---- |
| 1      | yes  | yes  | no   | yes  |
| 2      | yes  | yes  | no   | no   |
| 3      | yes  | no   | yes  | yes  |
| 4      | yes  | no   | yes  | no   |
| 5      | yes  | no   | yes  | yes  |
| 6      | yes  | no   | yes  | no   |
| 7      | yes  | no   | no   | yes  |

Even though each animal is said to participate in the mean, the result for the
mean will now actually be the average of the base population. Math is weird
sometimes.
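
One way to see why, sketched with the symbols already in use: write out only the fixed-effect part of the reduced model for a few kinds of animal. The last line describes a hypothetical 1992 female, which this example herd does not actually contain.

```math
\begin{aligned}
\textup{1990 female:} \quad & b_{mean} + b_{1990} \\
\textup{1991 male:} \quad & b_{mean} + b_{1991} + b_{male} \\
\textup{1992 female (base):} \quad & b_{mean}
\end{aligned}
```

An animal that matches the base population in every category has nothing left but ``b_{mean}``, so ``b_{mean}`` plays the role of the base population's average, and every other ``b`` term becomes a difference from that base.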

Double-checking, the rank of ``X`` is still 4, so we can solve for the average
of the base population, the effect of being born in 1990, the effect of
being born in 1991, and the effect of being male.

Whew! That was some transformation. We still haven't constrained this model
enough to solve it, though. Now on to the genotype.

## The statistical model: genotype as random effect

Remember I said above that genotype was a **random effect**? Statisticians say
"_a random effect is an effect that influences the variance and not the mean of
the observation in question._" I'm not sure exactly what that means or how that
is applicable to genotype, but it does let us add an additional constraint to
our model.

The basic gist of genetics is that organisms that are related to one another are
similar to one another. Based on a pedigree, we can even say how related to one
another animals are, and quantify that as the amount that the genotype terms
should be allowed to vary between related animals.

We'll need a pedigree for our animals:

### Calf Records

| ID  | Sire | Dam | Birth Year | Sex    | YW (kg) |
|:--- |:---- |:--- |:---------- |:------ |:------- |
| 1   | NA   | NA  | 1990       | Male   | 354     |
| 2   | NA   | NA  | 1990       | Female | 251     |
| 3   | 1    | NA  | 1991       | Male   | 327     |
| 4   | 1    | NA  | 1991       | Female | 328     |
| 5   | 1    | 2   | 1991       | Male   | 301     |
| 6   | NA   | 2   | 1991       | Female | 270     |
| 7   | NA   | NA  | 1992       | Male   | 330     |

Now, because cattle reproduce sexually, the genotype of one animal is halfway the
same as that of either parent (exception: inbreeding, see below). It should go
without saying that each animal's genotype is identical to that of itself. From
this we can then find the numerical multiplier for any relative (grandparent =
1/4, full sibling = 1/2, half sibling = 1/4, etc.). Let's write those values down
in a table.

| ID  | 1   | 2   | 3   | 4   | 5   | 6   | 7   |
|:--- |:--- |:--- |:--- |:--- |:--- |:--- |:--- |
| 1   | 1   | 0   | 1/2 | 1/2 | 1/2 | 0   | 0   |
| 2   | 0   | 1   | 0   | 0   | 1/2 | 1/2 | 0   |
| 3   | 1/2 | 0   | 1   | 1/4 | 1/4 | 0   | 0   |
| 4   | 1/2 | 0   | 1/4 | 1   | 1/4 | 0   | 0   |
| 5   | 1/2 | 1/2 | 1/4 | 1/4 | 1   | 1/4 | 0   |
| 6   | 0   | 1/2 | 0   | 0   | 1/4 | 1   | 0   |
| 7   | 0   | 0   | 0   | 0   | 0   | 0   | 1   |

Hmm. All those numbers look suspiciously like a matrix. Why don't I put them
into a matrix called ``A``?

```math
A =
\begin{bmatrix}
1 & 0 & \frac{1}{2} & \frac{1}{2} & \frac{1}{2} & 0 & 0 \\
0 & 1 & 0 & 0 & \frac{1}{2} & \frac{1}{2} & 0 \\
\frac{1}{2} & 0 & 1 & \frac{1}{4} & \frac{1}{4} & 0 & 0 \\
\frac{1}{2} & 0 & \frac{1}{4} & 1 & \frac{1}{4} & 0 & 0 \\
\frac{1}{2} & \frac{1}{2} & \frac{1}{4} & \frac{1}{4} & 1 & \frac{1}{4} & 0 \\
0 & \frac{1}{2} & 0 & 0 & \frac{1}{4} & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}
```
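
Filling in that table by hand gets tedious for a real herd. Below is a sketch of the standard "tabular method" for building a relationship matrix from a pedigree, written in plain Python with exact fractions. It is an illustration under stated assumptions rather than the `beefblup` source: the function name `relationship_matrix` is made up, unknown parents (`NA`) are treated as unrelated to everyone, and IDs are assumed to be ordered so that parents come before their offspring.

```python
from fractions import Fraction

# Pedigree from the table above: ID -> (sire, dam); None stands for "NA".
pedigree = {
    1: (None, None),
    2: (None, None),
    3: (1, None),
    4: (1, None),
    5: (1, 2),
    6: (None, 2),
    7: (None, None),
}


def relationship_matrix(ped):
    """Tabular method; assumes parents always have smaller IDs than offspring."""
    ids = sorted(ped)
    table = {i: {j: Fraction(0) for j in ids} for i in ids}

    def rel(i, j):
        # The relationship with an unknown parent is taken to be zero.
        return table[i][j] if i is not None and j is not None else Fraction(0)

    for pos, i in enumerate(ids):
        sire, dam = ped[i]
        # Off-diagonal: average of the relationships between j and i's parents.
        for j in ids[:pos]:
            table[i][j] = table[j][i] = (rel(j, sire) + rel(j, dam)) / 2
        # Diagonal: 1 plus half the relationship between the parents (inbreeding).
        table[i][i] = 1 + rel(sire, dam) / 2
    return [[table[i][j] for j in ids] for i in ids]


A = relationship_matrix(pedigree)
for row in A:
    print("  ".join(f"{str(value):>3}" for value in row))
```

For this pedigree the output matches the hand-written table, and because no animal here has related parents, every diagonal entry stays at 1.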

Now I'm going to take the matrix with all of the ``u`` values, and call it
``μ``. To quantify the idea of genetic relationship, I will then say that

```math
\textup{var}(μ) = A σ_μ^2
```

Where:

- ``A`` = the relationship matrix defined above
- ``σ_μ^2`` = the variance of all the genotypes

To fully constrain the system, I have to make two more assumptions: 1) that the
error term in each animal's equation is independent from all other error terms,
and 2) that the error term for each animal is independent from the value of the
genotype. I will call the matrix holding the ``e`` values ``ϵ`` and then say

```math
\textup{var}(ϵ) = I σ_ϵ^2
```

```math
\textup{cov}(μ, ϵ) = \textup{cov}(ϵ, μ) = 0
```

Substituting in the matrix names, our equation now looks like

```math
\begin{bmatrix}
354 \ \textup{kg} \\
251 \ \textup{kg} \\
327 \ \textup{kg} \\
328 \ \textup{kg} \\
301 \ \textup{kg} \\
270 \ \textup{kg} \\
330 \ \textup{kg}
\end{bmatrix}
= μ + X
\begin{bmatrix}
b_{mean} \\
b_{1990} \\
b_{1991} \\
b_{male}
\end{bmatrix}
+ ϵ
```

We are going to make three changes to this equation before we are ready to solve
it, but they are cosmetic details for this example.

1. Call the matrix on the left side of the equation ``Y`` (sometimes it's
   called the **matrix of observations**)
2. Multiply ``μ`` by an identity matrix called ``Z``. Multiplying by the
   identity matrix is the matrix form of multiplying by one, so nothing changes,
   but if we later want to find one animal's genetic effect on another animal's
   performance (e.g. a **maternal effects model**), we can alter ``Z`` to
   allow that
3. Call the matrix with all the ``b`` values ``β``.

With all these changes, we now have

```math
Y = Z μ + X β + ϵ
```

This is the canonical form of the mixed-model equation, and the form that
Charles Henderson used to first predict breeding values of livestock.

## Solving the equations

Henderson proved that the mixed-model equation can be solved by the following:

```math
\begin{bmatrix}
\hat{β} \\
\hat{μ}
\end{bmatrix}
=
\begin{bmatrix}
X'X & X'Z \\
Z'X & Z'Z+A^{-1}λ
\end{bmatrix}^{-1}
\begin{bmatrix}
X'Y \\
Z'Y
\end{bmatrix}
```

Where

- The variables with hats are the statistical estimates of their mixed-model
  counterparts
- The predicted value of ``μ`` is called the _Best Linear Unbiased
  Predictor_ or _BLUP_
- The estimated value of ``β`` is called the _Best Linear Unbiased Estimate_
  or _BLUE_
- ' is the transpose operator
- ``λ`` is a single real number that is a function of the heritability
  (``h^2``) for the trait being predicted. It can be left out in many cases
  (``λ = 1``, which corresponds to ``h^2 = 0.5``).
- ``λ = \frac{1-h^2}{h^2}``
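
To show how the pieces fit together, here is a rough sketch that assembles and solves Henderson's equations for the seven example animals. It is plain Python with exact fractions, not the `beefblup` implementation. The helpers `solve`, `matmul`, and `transpose` are made up for this example, and the heritability `h2 = 0.5` is an assumption chosen only so that ``λ = 1``; a real analysis would use a heritability appropriate to the trait.

```python
from fractions import Fraction


def solve(M, B):
    """Solve M * S = B for S by Gauss-Jordan elimination over exact fractions."""
    n = len(M)
    aug = [[Fraction(v) for v in M[r]] + [Fraction(v) for v in B[r]] for r in range(n)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(P, Q):
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(M):
    return [list(col) for col in zip(*M)]


half, quarter = Fraction(1, 2), Fraction(1, 4)
A = [  # relationship matrix from the pedigree
    [1, 0, half, half, half, 0, 0],
    [0, 1, 0, 0, half, half, 0],
    [half, 0, 1, quarter, quarter, 0, 0],
    [half, 0, quarter, 1, quarter, 0, 0],
    [half, half, quarter, quarter, 1, quarter, 0],
    [0, half, 0, 0, quarter, 1, 0],
    [0, 0, 0, 0, 0, 0, 1],
]
# Columns of X: mean, 1990, 1991, male (base population: 1992, female).
X = [[1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 1, 1], [1, 0, 1, 0],
     [1, 0, 1, 1], [1, 0, 1, 0], [1, 0, 0, 1]]
Y = [[354], [251], [327], [328], [301], [270], [330]]  # yearling weights, kg
n = len(Y)
identity = [[int(r == c) for c in range(n)] for r in range(n)]
Z = identity

h2 = Fraction(1, 2)   # ASSUMED heritability, picked so that lambda = 1
lam = (1 - h2) / h2

A_inv = solve(A, identity)
Xt, Zt = transpose(X), transpose(Z)
ZtZ = matmul(Zt, Z)
lower_right = [[ZtZ[r][c] + lam * A_inv[r][c] for c in range(n)] for r in range(n)]
# Assemble [[X'X, X'Z], [Z'X, Z'Z + A^-1 * lambda]] and the right-hand side.
lhs = [a + b for a, b in zip(matmul(Xt, X), matmul(Xt, Z))]
lhs += [a + b for a, b in zip(matmul(Zt, X), lower_right)]
rhs = matmul(Xt, Y) + matmul(Zt, Y)

solution = [row[0] for row in solve(lhs, rhs)]
beta_hat, mu_hat = solution[:4], solution[4:]
for name, value in zip(["mean", "1990", "1991", "male"], beta_hat):
    print(f"b_{name}: {float(value):.2f} kg")
for animal, value in enumerate(mu_hat, start=1):
    print(f"u_{animal}: {float(value):.2f} kg")
```

The first four numbers printed are the fixed-effect estimates and the remaining seven are the predicted breeding values, one per animal, all in kilograms. Solving the linear system directly, rather than forming the inverse of the large block matrix, gives the same answer with less work.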

What happened to

## Footnotes

### Exception

An animal **can** share its genome with itself by a factor of more than one:
that's called inbreeding! We can account for this, and `beefblup` does as it
calculates ``A``. This is an area that actually merits a good deal of study:
see chapter 2 of _Linear Models for the Prediction of Animal Breeding Values_ by
Raphael A. Mrode (ISBN 978 1 78064 391 5).