Admin tutorial

This page is a guided walkthrough of common administration activities and features:

  • View health of VMs
  • Access PostgreSQL servers inside Docker containers
  • Demonstrate high availability of Dingo PostgreSQL cluster
  • Destroy all VMs and recover from archives
  • Directly connect to etcd database
  • Directly interact with broker API
  • Locate the cell running leader node of a cluster
  • Scale up and out to create more capacity
  • Move a database to different VMs
  • Find syslog logs for various scenarios

The section “Destroy all VMs and recover from archives” is especially important. It is highly recommended that all administrators practise it at least once so they are familiar and confident with the disaster recovery procedure.

Assumptions

It is assumed that you are continuing this tutorial after completing the User tutorial, and that a single music-db service instance is running.

If not, then:

cf create-service dingo-postgresql cluster music-db
cf service music-db

Re-run cf service music-db until the Dingo PostgreSQL database cluster has successfully started running.
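
Rather than re-running the command by hand, you can poll it every 15 seconds (assuming the watch utility is available on your workstation):

watch -n 15 cf service music-db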

View health of VMs

As an administrator of Pivotal OpsManager, you can SSH into its VM and administer the running Dingo PostgreSQL deployment using the bosh CLI.

$ ssh ubuntu@OPS-MANAGER-IP
Password: ***********

Once inside, target the BOSH director and login:

bosh --ca-cert /var/tempest/workspaces/default/root_ca_certificate target OPS-MANAGER-DIRECTOR-IP
bosh login

To find the Dingo PostgreSQL deployment name:

bosh deployments

The output will look similar to:

+---------------------------------------+---------------------------------+-
| Name                                  | Release(s)                      |
+---------------------------------------+---------------------------------+-
| cf-cfe388ed8196d69ce46c               | cf/235.5.50                     |
|                                       | ...                             |
+---------------------------------------+---------------------------------+-
| dingo-postgresql-a7b288b79dc61da71d3c | broker-registrar/2.0.0          |
|                                       | dingo-postgresql/0.10.0         |
|                                       | etcd/66                         |
|                                       | simple-remote-syslog/2.2.1      |
+---------------------------------------+---------------------------------+-

To poll the health/vitals of Dingo PostgreSQL, use the deployment name found above (the output is wide, so a small font and wide terminal window help):

bosh vms dingo-postgresql-a7b288b79dc61da71d3c --vitals

The right-hand side of the output shows RAM utilization (Memory Usage) and persistent disk utilization (Persistent Disk Usage):

+--------------------------+---------+-------------+-----------------------+------+------+------+--------------+------------+------------+------------+------------+
| VM                       | State   | IPs         |         Load          | CPU  | CPU  | CPU  | Memory Usage | Swap Usage | System     | Ephemeral  | Persistent |
|                          |         |             | (avg01, avg05, avg15) | User | Sys  | Wait |              |            | Disk Usage | Disk Usage | Disk Usage |
+--------------------------+---------+-------------+-----------------------+------+------+------+--------------+------------+------------+------------+------------+
| cell_z1-partition-9ae... | running | 10.213.0.62 | 0.00, 0.03, 0.05      | 0.9% | 0.6% | 0.0% | 3% (463.0M)  | 0% (0B)    | 44%        | 0%         | 1%         |
| cell_z1-partition-9ae... | running | 10.213.0.63 | 0.06, 0.05, 0.05      | 1.0% | 0.5% | 0.0% | 3% (464.3M)  | 0% (0B)    | 44%        | 0%         | 1%         |
| cell_z2-partition-9ae... | running | 10.213.0.64 | 0.00, 0.02, 0.05      | 0.3% | 0.3% | 0.1% | 3% (448.5M)  | 0% (0B)    | 44%        | 0%         | 1%         |
| cell_z2-partition-9ae... | running | 10.213.0.65 | 0.00, 0.01, 0.05      | 0.2% | 0.2% | 0.0% | 3% (463.5M)  | 0% (0B)    | 44%        | 0%         | 1%         |
| etcd-partition-9ae6a1... | running | 10.213.0.61 | 0.05, 0.04, 0.05      | 0.0% | 0.0% | 0.3% | 5% (106.9M)  | 0% (0B)    | 44%        | 1%         | 3%         |
| router-partition-9ae6... | running | 10.213.0.66 | 0.03, 0.05, 0.05      | 0.5% | 0.6% | 0.0% | 6% (127.1M)  | 0% (0B)    | 44%        | 3%         | 0%         |
+--------------------------+---------+-------------+-----------------------+------+------+------+--------------+------------+------------+------------+------------+

You are primarily focused on the health of the cell VMs. This is where PostgreSQL servers are running inside Docker containers.

A convenience command to look up the deployment name and store it in a $name variable:

name=$(bosh deployments | grep dingo-postgresql- | awk '{print $2}')
bosh vms $name

Access PostgreSQL servers inside Docker containers

Use the bosh CLI to SSH into one of the cell VMs:

bosh ssh cell_z1-partition-9ae6a1dc1415bea56862/0

Once inside, set up a _docker alias that references the local Docker daemon:

alias _docker="/var/vcap/packages/docker/bin/docker --host unix:///var/vcap/sys/run/docker/docker.sock"

To see the list of Docker containers running:

_docker ps

The example output below shows a single Docker container running a PostgreSQL server (one half of a highly available cluster):

CONTAINER ID        IMAGE                                  COMMAND                  CREATED             STATUS              PORTS                                              NAMES
19b6bc04a2fe        dingotiles/dingo-postgresql95:0.10.0   "dumb-init -c /script"   5 days ago          Up 5 days           0.0.0.0:32770->5432/tcp, 0.0.0.0:32771->8008/tcp   cf-51ac1400-3a73-452b-b531-db50f2084da1
88395eabc4c5        cfcommunity/registrator:latest         "/bin/registrator -ho"   5 days ago          Up 5 days                                                              registrator

The cfcommunity/registrator:latest container is an internal system component used by Dingo PostgreSQL for containers to discover their host information.
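
If you are curious, you can tail registrator's logs to watch it register and deregister containers (the log format is an internal detail and may vary between versions):

_docker logs --tail 20 registrator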

In the example above, the container 19b6bc04a2fe has name cf-51ac1400-3a73-452b-b531-db50f2084da1.

To open a shell inside the container, use either the name or ID for a container. The following are interchangeable:

_docker exec -ti 19b6bc04a2fe bash
_docker exec -ti cf-51ac1400-3a73-452b-b531-db50f2084da1 bash

Once inside the Docker container, to view a hierarchical list of processes running:

root@19b6bc04a2fe:~# ps axwf
  PID  COMMAND
 3377  bash
 3405   \_ ps axwf
    1  dumb-init -c /scripts/run.sh
   10  sudo PATH=/usr/lib/postgresql/9.5/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin -E -u postgres /scripts/postgres
   73   \_ /bin/bash /scripts/postgres/start_pg.sh
   87       \_ python /patroni.py /patroni/postgres.yml
   88       \_ /bin/bash /scripts/postgres/start_pg.sh
   92       |   \_ sed -u s/^/T-3228-patroni> /
  100       \_ /bin/bash /scripts/postgres/reinitialize_when_stalled.sh
 3404       |   \_ sleep 2
  101       \_ /bin/bash /scripts/postgres/self_advertize.sh
 3372           \_ sleep 6
   94  /bin/bash /scripts/postgres/regular_backup.sh
 3219   \_ sleep 30
   95  /bin/bash /scripts/postgres/regular_backup.sh
  102   \_ sed -u s/^/T-3228-backup> /
   98  /bin/bash /scripts/postgres/regular_backup.sh
 3359   \_ sleep 5
   99  /bin/bash /scripts/postgres/regular_backup.sh
  107   \_ sed -u s/^/T-3228-backup> /
20913  /usr/lib/postgresql/9.5/bin/postgres -D /data/postgres0 --listen_addresses=0.0.0.0 --port=5432 --hot_standby=on --listen_addresses=
20964   \_ postgres: checkpointer process
20965   \_ postgres: writer process
20966   \_ postgres: stats collector process
20968   \_ postgres: postgres postgres 127.0.0.1(50826) idle
21018   \_ postgres: wal sender process appuser 10.213.0.64(50234) streaming 3/EE000060
21277   \_ postgres: wal writer process
21278   \_ postgres: autovacuum launcher process
21279   \_ postgres: archiver process   last was 0000000C00000003000000ED

At the end of the listing are the postgres process and its child processes. The Docker container does not invoke postgres directly; rather, a system called Patroni is responsible for initializing, configuring, and reconfiguring PostgreSQL.
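
Patroni also exposes a small REST API; in the _docker ps output above, container port 8008 is the Patroni port. Assuming this version of Patroni reports its status on GET /, you can ask the node for its own view of its role from inside the container:

curl -s localhost:8008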

Demonstrate high availability of Dingo PostgreSQL cluster

While still on the same cell VM as above (exit the Docker container shell first if you are still inside it), we can emulate the loss of the cell by stopping all of its processes, including the Docker containers.

Change to root user and ask monit to stop everything:

sudo su -
monit stop all
monit summary
ps axwf

All the Docker containers running PostgreSQL on this VM have now stopped (as have all the supporting subsystems).

If you refresh the application, it will eventually resume working after the replica container (on another VM) assumes the mantle of leader.

At any time, you can ask the cluster’s leader for its list of replicas:

psql $uri -c "select * from pg_stat_replication;"
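
The leader is the only node that is not in recovery, so a complementary check is pg_is_in_recovery(), which should return f when $uri is being routed to the current leader:

psql $uri -c "select pg_is_in_recovery();"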

To restart the subsystems and Docker containers on the VM:

monit restart all

Eventually the Docker containers will recreate their PostgreSQL servers as healthy replicas. To confirm that the former leader has rejoined its cluster as a replica:

psql $uri -c "select * from pg_stat_replication;"

Destroy all VMs and recover from archives

This is a very important section. It is highly recommended that all administrators practise it at least once so they are familiar and confident with the disaster recovery procedure.

There will be downtime (every VM that is part of Dingo PostgreSQL will be destroyed and will not exist for a while), but within a few minutes the system will be running again and all clusters will have been recreated from their archives.

Destroy the BOSH deployment and all of its VMs:

bosh login
bosh deployment /var/tempest/workspaces/default/deployments/dingo-postgresql-*.yml
name=$(bosh deployments | grep dingo-postgresql- | awk '{print $2}')
echo $name
bosh -n delete deployment $name

This will delete every VM and every persistent disk (technically the disks are orphaned).

At this moment, Cloud Foundry and users’ applications still believe that all the database service instances exist:

cf curl /v2/service_instances
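
For example, to list just the names of the service instances that Cloud Foundry still expects (assuming jq is available, as it is used later in this tutorial):

cf curl /v2/service_instances | jq -r '.resources[].entity.name'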

To recreate the cluster and recover all expected service instances:

bosh -n deploy
bosh run errand disaster-recovery

The disaster-recovery errand fetches the expected list of service instances from Pivotal Elastic Runtime/Cloud Foundry and recreates each one from its latest archive.

For each item found in cf curl /v2/service_instances, a new database cluster is created. The original credentials, node count and size are all retrieved from the clusterdata object store bucket.

As the database leader starts up, it recreates itself from the base backup and subsequent WAL segments.
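
Once the errand completes, you can confirm that the recreated music-db cluster is healthy and replicating again using the same checks as earlier in this tutorial:

cf service music-db
psql $uri -c "select * from pg_stat_replication;"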

Directly connect to etcd database

Dingo PostgreSQL stores its cluster state in an etcd database. To determine an IP address for accessing it:

etcd_host=$(bosh vms $name | grep etcd | head -n 1 | awk '{print $11}')
curl ${etcd_host}:4001/v2/keys
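
The listing above only returns the first level of keys. To drill into the full key hierarchy (the exact layout is an internal implementation detail and may change between releases), use the etcd v2 API's recursive listing:

curl -s "${etcd_host}:4001/v2/keys/?recursive=true" | jq .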

Directly interact with broker API

Access the service broker API (which includes an admin API just for you) via port 8889 on a router VM.

First, look up the basic auth username/password from the deployment manifest:

grep servicebroker: -a4 /var/tempest/workspaces/default/deployments/dingo-postgresql-*.yml

Pull out the values and set into environment variables:

broker_username=<USERNAME>
broker_password=<PASSWORD>
broker_host=$(bosh vms $name | grep router | head -n 1 | awk '{print $11}')
broker_uri="http://${broker_username}:${broker_password}@${broker_host}:8889"
curl ${broker_uri}/v2/catalog

Locate the cell running leader node of a cluster

In addition to the standard Cloud Foundry Service Broker API (GET /v2/catalog, PUT /v2/service_instances/:id, etc.), we have a growing admin API:

curl ${broker_uri}/admin/cells

id=$(cf service music-db --guid)
curl ${broker_uri}/admin/service_instances/$id | jq .

This JSON result includes admin credentials for the PostgreSQL database, the location of each node (a PostgreSQL server running in a container) and its role (leader or replica), and the current provisioning status:

{
  "nodes": [
    {
      "role": "master",
      "state": "running",
      "cell_guid": "10.213.0.62",
      "node_id": "1b98bc5a-d56c-4c8f-b89c-e4900d8bbc92"
    },
    {
      "role": "replica",
      "state": "running",
      "cell_guid": "10.213.0.65",
      "node_id": "fe477a10-b5d0-4e3b-92e3-1bdc77ab3787"
    }
  ],
  "service_instance_name": "",
  "info": {
    "last_message": "Scheduling Completed",
    "completed_steps": 4,
    "steps": 4,
    "status": "success"
  },
  "allocated_port": 30003,
  "instance_id": "badca97f-25c8-4d53-8c39-d541e0acf2e4",
  "service_id": "beb5973c-e1b2-11e5-a736-c7c0b526363d",
  "plan_id": "1545e30e-6dc3-11e5-826a-6c4008a663f0",
  "organization_guid": "46293a7b-22d4-475f-bcb4-c6fd07971ea9",
  "space_guid": "36ca64ea-f496-4458-bc1a-c426e125c763",
  "admin_credentials": {
    "password": "PASSWORD",
    "username": "pgadmin"
  },
  "superuser_credentials": {
    "password": "PASSWORD",
    "username": "postgres"
  },
  "app_credentials": {
    "password": "PASSWORD",
    "username": "appuser"
  }
}

This endpoint is very handy for determining which cell/VM is running the leader node. The following returns the IP address of the cell VM that is running the leader node of a cluster:

curl ${broker_uri}/admin/service_instances/$id | jq -r ".nodes[] | select(.role == \"master\") | .cell_guid"

To look up the BOSH job name using this IP address:

cell=$(curl ${broker_uri}/admin/service_instances/$id | jq -r ".nodes[] | select(.role == \"master\") | .cell_guid")
bosh vms $name | grep $cell | awk '{print $2}'

This can then be passed to bosh ssh.
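
For example (assuming, as in the bosh vms output above, that the VM column contains the full job name and index, such as cell_z1-partition-.../0):

job=$(bosh vms $name | grep $cell | awk '{print $2}')
bosh ssh $job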

Scale up and out to create more capacity

Move a database to different VMs

Find syslog logs for various scenarios