<!-- omit in toc -->
# Upgrade AWX Operator and AWX

This guide describes the procedure for the following three types of AWX Operator upgrade.

- Upgrade from `0.14.0` or later (e.g. from `0.14.0` to `0.15.0`)
- Upgrade from `0.13.0` (e.g. from `0.13.0` to `0.14.0`)
- Upgrade from `0.12.0` or earlier (e.g. from `0.12.0` to `0.13.0`)
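
The choice among the three paths above can be sketched as a small shell helper. This is only an illustration: `version_le` and `upgrade_path` are hypothetical names that do not exist in this repository, and the comparison assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): print which section of
# this guide applies to the AWX Operator version that is currently installed.

# version_le A B: succeed when version A is lower than or equal to version B.
# Assumes "sort -V" (version sort) is available, as in GNU coreutils.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# upgrade_path CURRENT_VERSION
upgrade_path() {
  local current="$1"
  if version_le "0.14.0" "${current}"; then
    echo "0.14.0-or-later"
  elif version_le "0.13.0" "${current}"; then
    echo "0.13.0"
  else
    echo "0.12.0-or-earlier"
  fi
}

upgrade_path "0.12.0"   # prints: 0.12.0-or-earlier
upgrade_path "0.13.0"   # prints: 0.13.0
upgrade_path "0.15.0"   # prints: 0.14.0-or-later
```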

Note that once you upgrade AWX Operator, your AWX will also be upgraded automatically to the version bundled with the upgraded AWX Operator as shown in [📝Version Mapping for AWX Operator and AWX](version-mapping.md).

[There is an `image_version` parameter for the AWX resource that changes which image is used](https://ansible.readthedocs.io/projects/awx-operator/en/latest/user-guide/advanced-configuration/deploying-a-specific-version-of-awx.html), but it appears that using a version of AWX other than the one bundled with the AWX Operator [is currently not supported](https://ansible.readthedocs.io/projects/awx-operator/en/latest/user-guide/advanced-configuration/deploying-a-specific-version-of-awx.html). In other words, if you want to upgrade AWX, you need to plan to upgrade AWX Operator first.

<!-- omit in toc -->
## Table of Contents

- [✅ Take a backup of the old AWX instance](#-take-a-backup-of-the-old-awx-instance)
- [📝 Upgrade from `0.14.0` or later (e.g. from `0.14.0` to `0.15.0`)](#-upgrade-from-0140-or-later-eg-from-0140-to-0150)
  - [📝 Procedure](#-procedure)
- [📝 Upgrade from `0.13.0` (e.g. from `0.13.0` to `0.14.0`)](#-upgrade-from-0130-eg-from-0130-to-0140)
  - [📝 Procedure](#-procedure-1)
- [📝 Upgrade from `0.12.0` or earlier (e.g. from `0.12.0` to `0.13.0`)](#-upgrade-from-0120-or-earlier-eg-from-0120-to-0130)
- [❓ Troubleshooting](#-troubleshooting)
  - [New Pod gets stuck in `Pending` state](#new-pod-gets-stuck-in-pending-state)

## ✅ Take a backup of the old AWX instance

Before performing the upgrade, make sure that you have a backup of your old AWX.

Refer to [📝README: Backing up using AWX Operator](../README.md#backing-up-using-awx-operator) to take a backup using AWX Operator.

## 📝 Upgrade from `0.14.0` or later (e.g. from `0.14.0` to `0.15.0`)

If you are using AWX Operator `0.14.0` or later and want to upgrade to a newer version, the upgrade is basically done by deploying the new version of AWX Operator to the same namespace where the old AWX Operator is running.

> [!WARNING]
> If you are planning to upgrade AWX Operator **from `2.12.2` or earlier to `2.13.1` or later**, note that the bundled PostgreSQL version changes to 15, so the following additional tasks are required.
>
> ```bash
> # Required only when upgrading from 2.12.2 or earlier to 2.13.1 or later
> sudo mkdir -p /data/postgres-15/data
> sudo chmod 700 /data/postgres-15/data
> sudo chown 26:0 /data/postgres-15/data
> cat <<EOF > pv-postgres-15.yaml
> ---
> apiVersion: v1
> kind: PersistentVolume
> metadata:
>   name: awx-postgres-15-volume
> spec:
>   accessModes:
>     - ReadWriteOnce
>   persistentVolumeReclaimPolicy: Retain
>   capacity:
>     storage: 8Gi
>   storageClassName: awx-postgres-volume
>   hostPath:
>     path: /data/postgres-15
> EOF
> kubectl apply -f pv-postgres-15.yaml
> ```

> [!WARNING]
> If you are planning to upgrade AWX Operator **from `2.0.0` to `2.0.1` or later**, note that [the `extra_volumes` and `web_extra_volume_mounts` in `base/awx.yaml` for `2.0.0`, added as a workaround for a specific issue](https://github.com/kurokobo/awx-on-k3s/blob/2.0.0/base/awx.yaml#L42-L51), cause the upgrade to fail.
>
> To avoid this, follow these steps before upgrading AWX Operator. Steps 1 and 2 can also be achieved by `kubectl -n awx edit awx awx`.
>
> 1. Remove the definition of the volume and the volume mount named `awx-projects-web` from `extra_volumes` and `web_extra_volume_mounts` in your `base/awx.yaml`.
>    - If there are no other volumes or volume mounts, you can remove the whole `extra_volumes` and `web_extra_volume_mounts`.
> 2. Apply the modified `base/awx.yaml` with `kubectl apply -k base`.
> 3. Wait for the deployment of AWX to complete.
>
> Once your AWX has been deployed without volume `awx-projects-web`, your AWX can be safely upgraded. Proceed to [the next step](#-procedure).

> [!WARNING]
> If you are planning to upgrade AWX Operator **from `0.25.0` or earlier to `0.26.0` or later**, note that the bundled PostgreSQL version changes to 13, so the following additional tasks are required.
>
> ```bash
> # Required only when upgrading from 0.25.0 or earlier to 0.26.0 or later
> sudo mkdir -p /data/postgres-13
> sudo chmod 700 /data/postgres-13
> cat <<EOF > pv-postgres-13.yaml
> ---
> apiVersion: v1
> kind: PersistentVolume
> metadata:
>   name: awx-postgres-13-volume
> spec:
>   accessModes:
>     - ReadWriteOnce
>   persistentVolumeReclaimPolicy: Retain
>   capacity:
>     storage: 8Gi
>   storageClassName: awx-postgres-volume
>   hostPath:
>     path: /data/postgres-13
> EOF
> kubectl apply -f pv-postgres-13.yaml
> ```
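
The version boundaries in the three warnings above can be checked mechanically. The following is a minimal sketch, assuming a `sort` with `-V` support (GNU coreutils); `version_le`, `crosses`, and `extra_tasks` are hypothetical names, and the helper only restates the boundaries given in the warnings. It does not judge whether a single jump across several boundaries is advisable.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list the extra preparation described in the warnings
# above for an upgrade from version FROM to version TO.

# version_le A B: succeed when version A is lower than or equal to version B.
version_le() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# crosses FROM TO LAST_OLD FIRST_NEW:
# succeed when FROM <= LAST_OLD and TO >= FIRST_NEW.
crosses() {
  version_le "$1" "$3" && version_le "$4" "$2"
}

# extra_tasks FROM TO: print one line per required extra task (maybe none).
extra_tasks() {
  local from="$1" to="$2"
  if crosses "${from}" "${to}" "0.25.0" "0.26.0"; then
    echo "create PV awx-postgres-13-volume"
  fi
  if [ "${from}" = "2.0.0" ] && version_le "2.0.1" "${to}"; then
    echo "remove awx-projects-web from extra_volumes and web_extra_volume_mounts"
  fi
  if crosses "${from}" "${to}" "2.12.2" "2.13.1"; then
    echo "create PV awx-postgres-15-volume"
  fi
}

extra_tasks "2.12.2" "2.13.1"   # prints: create PV awx-postgres-15-volume
extra_tasks "2.13.1" "2.14.0"   # prints nothing
```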

### 📝 Procedure

To upgrade your AWX Operator, perform the following steps. If your target version is earlier than 2.3.0, switch this page to the desired tag instead of the `main` branch.

```bash
# Prepare required files
cd ~
git clone https://github.com/kurokobo/awx-on-k3s.git
cd awx-on-k3s
git checkout 2.3.0  # Checkout the version to upgrade to

# Deploy AWX Operator
kubectl apply -k operator
```

This upgrades the AWX Operator first; after that, AWX is upgraded as well.

To monitor the progress of the deployment, check the logs of `deployments/awx-operator-controller-manager`:

```bash
kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
```

When the deployment completes successfully, the logs end with:

```txt
$ kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
...
----- Ansible Task Status Event StdOut (awx.ansible.com/v1beta1, Kind=AWX, awx/awx) -----
PLAY RECAP *********************************************************************
localhost                  : ok=56   changed=0    unreachable=0    failed=0    skipped=35   rescued=0    ignored=0
```
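
Instead of reading the log by eye, the final `PLAY RECAP` can be inspected with a small filter. This is a sketch under stated assumptions: `recap_status` is a hypothetical name, and it assumes the recap host line has the `unreachable=N` and `failed=N` fields shown in the sample above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read operator logs on stdin and report the state of
# the last PLAY RECAP found in them.
#   success    : last recap has unreachable=0 and failed=0
#   failed     : last recap reports unreachable or failed tasks
#   incomplete : no recap has been logged yet
recap_status() {
  awk '
    /^PLAY RECAP/         { in_recap = 1; next }
    in_recap && /failed=/ { last = $0; in_recap = 0 }
    END {
      if (last == "") {
        print "incomplete"
      } else if (last ~ /unreachable=0( |$)/ && last ~ /failed=0( |$)/) {
        print "success"
      } else {
        print "failed"
      }
    }
  '
}

# Example with the recap lines from the sample output above.
printf '%s\n' \
  'PLAY RECAP *********************************************************************' \
  'localhost                  : ok=56   changed=0    unreachable=0    failed=0    skipped=35   rescued=0    ignored=0' \
  | recap_status   # prints: success

# Against a live cluster, omit -f so that the pipeline terminates:
#   kubectl -n awx logs deployments/awx-operator-controller-manager -c awx-manager | recap_status
```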

> [!NOTE]
> If your AWX Operator has been upgraded from `2.12.2` or earlier to `2.13.1` or later, the old PV for PostgreSQL 13 can be removed, since the new AWX runs with the new PV for PostgreSQL 15.
>
> ```bash
> # Recommended only when upgraded from 2.12.2 or earlier to 2.13.1 or later
> kubectl delete pv awx-postgres-13-volume
> sudo rm -rf /data/postgres-13
> ```

> [!NOTE]
> If your AWX Operator has been upgraded from `0.25.0` or earlier to `0.26.0` or later, the old PV for PostgreSQL 12 can be removed, since the new AWX runs with the new PV for PostgreSQL 13.
>
> ```bash
> # Recommended only when upgraded from 0.25.0 or earlier to 0.26.0 or later
> kubectl delete pv awx-postgres-volume
> sudo rm -rf /data/postgres
> ```

## 📝 Upgrade from `0.13.0` (e.g. from `0.13.0` to `0.14.0`)

If you are using AWX Operator `0.13.0` and want to upgrade to a newer version, you should be aware of the big changes made to AWX Operator in `0.14.0`. [As described in the documentation](https://ansible.readthedocs.io/projects/awx-operator/en/latest/upgrade/upgrading.html#v0140), in `0.14.0`, AWX Operator changed from cluster scope to namespace scope. It also moved to Operator SDK `1.x`.

This means that upgrading from `0.13.0` to `0.14.0` or later requires a bit of finesse, such as cleaning up the old AWX Operator. **If you are using `0.12.0` or earlier and want to upgrade to `0.14.0` or later, I recommend that you [upgrade to `0.13.0` first](#-upgrade-from-0120-or-earlier-eg-from-0120-to-0130) and then come back here to avoid unintended issues.**

As an example, this guide upgrades from `0.13.0` to `0.14.0`. AWX Operator `0.13.0` or earlier resides in the `default` namespace by default, and the related AWX instance resides in the `awx` namespace, as described in this repository. After the upgrade, everything related to AWX Operator `0.14.0` will reside in the `awx` namespace.

| Phase            | AWX Operator                    | AWX Instance                |
| ---------------- | ------------------------------- | --------------------------- |
| Before Upgrade   | `0.13.0` in `default` namespace | `19.3.0` in `awx` namespace |
| After Upgrade    | `0.14.0` in `awx` namespace     | `19.4.0` in `awx` namespace |

### 📝 Procedure

To upgrade AWX Operator, first remove the old AWX Operator that is running in the `default` namespace. In addition, remove the Service Account, Cluster Role, and Cluster Role Binding that the old AWX Operator requires.

```bash
kubectl -n default delete deployment awx-operator
kubectl -n default delete serviceaccount awx-operator
kubectl -n default delete clusterrolebinding awx-operator
kubectl -n default delete clusterrole awx-operator
```

Since we removed only the old AWX Operator, the old CRDs still exist. Therefore, the old `awx` resource, that is, the old AWX instance, is still running in the `awx` namespace.

Finally, deploy the new AWX Operator to the `awx` namespace.

```bash
# Prepare required files
cd ~
git clone https://github.com/ansible/awx-operator.git
cd awx-operator
git checkout 0.14.0  # Checkout the version to upgrade to

# Deploy AWX Operator
export NAMESPACE=awx  # Specify the namespace where the old AWX instance exists
make deploy
```

This will update the CRDs in the cluster and create the required Service Account, Roles, etc. in the `awx` namespace. Also, AWX Operator will start working. Once AWX Operator is up and running, it will start rolling out a new version of the AWX instance automatically based on the old `awx` resource definition.

To monitor the progress of the deployment, check the logs of `deployments/awx-operator-controller-manager`:

```bash
kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
```

When the deployment completes successfully, the logs end with:

```txt
$ kubectl -n awx logs -f deployments/awx-operator-controller-manager -c awx-manager
...
----- Ansible Task Status Event StdOut (awx.ansible.com/v1beta1, Kind=AWX, awx/awx) -----
PLAY RECAP *********************************************************************
localhost                  : ok=56   changed=0    unreachable=0    failed=0    skipped=35   rescued=0    ignored=0
```

## 📝 Upgrade from `0.12.0` or earlier (e.g. from `0.12.0` to `0.13.0`)

If you are using `0.12.0` or earlier and want to upgrade to a newer version, simply deploy the new version of AWX Operator. This procedure applies to upgrades up to `0.13.0`. **If you want to upgrade to `0.14.0` or later, I recommend that you first upgrade to `0.13.0` by following this procedure and then [upgrade to `0.14.0` or later](#-upgrade-from-0130-eg-from-0130-to-0140).**

```bash
# Specify the version to upgrade to in the URL
kubectl apply -f https://raw.githubusercontent.com/ansible/awx-operator/0.13.0/deploy/awx-operator.yaml
```

This upgrades the AWX Operator first; after that, AWX is upgraded as well.

To monitor the progress of the deployment, check the logs of `deployment/awx-operator`:

```bash
kubectl logs -f deployment/awx-operator
```

When the deployment completes successfully, the logs end with:

```txt
$ kubectl logs -f deployment/awx-operator
...
--------------------------- Ansible Task Status Event StdOut  -----------------
PLAY RECAP *********************************************************************
localhost                  : ok=54   changed=0    unreachable=0    failed=0    skipped=37   rescued=0    ignored=0 
-------------------------------------------------------------------------------
```

## ❓ Troubleshooting

Some hints for when you get stuck during the upgrade.

### New Pod gets stuck in `Pending` state

If the K3s node does not have enough free resources to deploy a new AWX instance, the new Pod for AWX gets stuck in `Pending` state.

```bash
$ kubectl -n awx get pods
NAME                                               READY   STATUS    RESTARTS   AGE
awx-7d74496d7d-d66dw                               4/4     Running   0          19d
awx-84d5c45999-55gb4                               0/4     Pending   0          10s     👈👈👈
```
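
To pick the names of `Pending` Pods out of a listing like the one above, the `STATUS` column can be filtered. A minimal sketch, assuming the default column layout of `kubectl get pods` (name first, status third); `pending_pods` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "kubectl get pods" output on stdin and print the
# names of the Pods whose STATUS column is "Pending".
pending_pods() {
  # NR > 1 skips the header line; $1 is NAME and $3 is STATUS.
  awk 'NR > 1 && $3 == "Pending" { print $1 }'
}

# Example with the listing shown above.
pending_pods <<'EOF'
NAME                                               READY   STATUS    RESTARTS   AGE
awx-7d74496d7d-d66dw                               4/4     Running   0          19d
awx-84d5c45999-55gb4                               0/4     Pending   0          10s
EOF
# prints: awx-84d5c45999-55gb4

# Against a live cluster:
#   kubectl -n awx get pods | pending_pods
# kubectl can also filter on its own:
#   kubectl -n awx get pods --field-selector=status.phase=Pending
```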

Try running `kubectl -n awx describe pod <Pod Name>` and check the `Events` section at the end for the cause.

```bash
$ kubectl -n awx describe pod awx-84d5c45999-55gb4
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  106s  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.     👈👈👈
  Warning  FailedScheduling  105s  default-scheduler  0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.     👈👈👈
```

This means that the node does not have enough CPU or memory resources to start the Pod.

During the AWX upgrade, a rollout of the Deployment resource will be performed and temporarily two AWX Pods will be running. This means that the required Resource Requests for CPU and memory will be doubled.
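
The doubling can be checked with simple arithmetic before starting the upgrade. The sketch below uses made-up numbers and a hypothetical function name, `rollout_fits`; the real values would have to be read from your own node and Pod specifications.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check whether two AWX Pods fit on the node at once.
# rollout_fits ALLOCATABLE OTHERS AWX_REQUEST
#   ALLOCATABLE : allocatable amount on the node
#   OTHERS      : amount requested by all non-AWX Pods
#   AWX_REQUEST : amount requested by one AWX Pod
# All values must use the same unit (for example millicores or MiB).
rollout_fits() {
  local allocatable="$1" others="$2" awx_request="$3"
  # During the rollout the old and the new AWX Pod both hold their requests.
  local needed=$(( others + 2 * awx_request ))
  if [ "${needed}" -le "${allocatable}" ]; then
    echo "fits"
  else
    echo "does not fit"
  fi
}

# Illustrative numbers only (millicores of CPU).
rollout_fits 4000 500 1500   # 500 + 2 * 1500 = 3500, prints: fits
rollout_fits 4000 500 2000   # 500 + 2 * 2000 = 4500, prints: does not fit
```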

For this reason, if we do not have enough free resources on our K3s node, we can manually delete the old AWX instance beforehand in order to free up resources. Note that the rollout history will be lost with this step.

```bash
$ kubectl -n awx delete deployment awx
deployment.apps "awx" deleted

$ kubectl -n awx get all
NAME                    READY   STATUS    RESTARTS   AGE
pod/awx-postgres-13-0   1/1     Running   0          8m57s

NAME                   TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/awx-postgres   ClusterIP   None            <none>        5432/TCP   8m57s
service/awx-service    ClusterIP   10.43.248.150   <none>        80/TCP     8m51s

NAME                            READY   AGE
statefulset.apps/awx-postgres   1/1     8m58s
```

Make sure to delete the `deployment` resource, not the `awx` resource. If we accidentally delete the `awx` resource or any Secrets, we will not be able to upgrade successfully.

After a few minutes of waiting, our AWX Operator will successfully launch the new Deployment and the Pod for AWX.