Debian ships the PostgreSQL Global Development Group (PGDG) APT repository activation script in the postgresql-common package.
apt install gnupg postgresql-common apt-transport-https lsb-release wget
sh /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh
At this point it is probably a good idea to set the pg_createcluster default options so that new clusters do not pick up the environment locale.
mkdir /etc/postgresql-common/createcluster.d/
echo "initdb_options = '--encoding=UTF8 --locale=C'" > /etc/postgresql-common/createcluster.d/no-locale.conf
Optionally you can also disable automatic main cluster creation
echo "create_main_cluster = false" > /etc/postgresql-common/createcluster.d/no-main-cluster.conf
And then install the latest PostgreSQL
sudo apt install postgresql
Always set the locale to --locale=C and the encoding to --encoding=UTF8.
Using en_US.UTF-8 or some other real locale instead of C will change string sorting order, do some other "fun things", and might break your indexes when the next glibc update hits.
PostgreSQL 17 has a built-in C.UTF-8 locale that has better string handling semantics than the plain C locale.
https://pganalyze.com/blog/5mins-postgres-17-builtin-c-utf8-locale
https://www.pgevents.ca/events/pgconfdev2024/schedule/session/95-collations-from-a-to-z/
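If you want to try it, the built-in collation provider is chosen at initdb time. A minimal sketch, assuming PostgreSQL 17+ and the same data directory as the examples below (check initdb --help on your version):
initdb -D /var/lib/postgres/data --encoding=UTF8 --locale-provider=builtin --builtin-locale=C.UTF-8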
Under Debian/Ubuntu you should use the pg_createcluster command, which looks something like this.
First stop and drop the existing cluster created by apt:
sudo pg_ctlcluster 16 main stop
sudo pg_dropcluster 16 main
And then
pg_createcluster 16 main -- --encoding=UTF8 --locale=C
A normal initdb command should look something like this:
initdb -D /var/lib/postgres/data --encoding=UTF8 --locale=C
To set predefined auth methods similar to pg_createcluster:
initdb -D /var/lib/postgres/data --encoding=UTF8 --locale=C --auth-host=scram-sha-256 --auth-local=peer --pwprompt
Authentication options are configured in pg_hba.conf. pg_createcluster will set the default unix domain socket (local) authentication type to peer and TCP/IP host auth to scram-sha-256 password.
On other systems you can set those authentication parameters with the following initdb arguments. This will also ask you for a password for the DB superuser.
--auth-host=scram-sha-256 --auth-local=peer --pwprompt
Legacy systems might require md5 auth instead of scram-sha-256: https://info.crunchydata.com/blog/how-to-upgrade-postgresql-passwords-to-scram
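The usual upgrade path described in the post above is roughly: switch password_encryption to scram-sha-256, have each user set their password again so it gets re-hashed, then swap md5 for scram-sha-256 in pg_hba.conf. A rough sketch, assuming you are connected as a superuser:
psql -c "ALTER SYSTEM SET password_encryption = 'scram-sha-256';"
psql -c "SELECT pg_reload_conf();"
# then, inside psql, have each user re-set their password so it is stored as a SCRAM hash:
#   \password some_user    (some_user is a placeholder)
# finally replace md5 with scram-sha-256 in pg_hba.conf and reload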
Add this line to pg_hba.conf and reload postgres:
# TYPE DATABASE USER ADDRESS METHOD
host all all samenet scram-sha-256
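Reloading can be done with the Debian cluster tools or from SQL, for example (assuming the version 16 cluster from above):
sudo pg_ctlcluster 16 main reload
# or, as a superuser:
psql -c "SELECT pg_reload_conf();"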
More notes are documented here https://vadosware.io/post/everything-ive-seen-on-optimizing-postgres-on-zfs-on-linux/
Basic ZFS dataset options
atime = off # Recording access time on file open is stupid
relatime = on # Writing access time on file write is fine because we are writing anyway
compression = lz4 # It's faster to compress/decompress on the CPU than it is to wait for the data from HDD/SSD
recordsize = 8K # Postgres-native 8k will give a horrible compression ratio, the default 128k also works
primarycache = metadata # maybe?
Each postgres database cluster gets its own dataset; the WAL should also be on its own separate dataset for easy snapshotting, but need to verify that (see the sketch below).
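A sketch of such a layout with the options listed above; the pool name (tank), dataset names and mountpoints are made up, and the 16/main path assumes the Debian pg_createcluster setup from earlier:
# data directory dataset
zfs create -o atime=off -o relatime=on -o compression=lz4 -o recordsize=8K -o mountpoint=/var/lib/postgresql tank/pgdata
# separate dataset for WAL so it can be snapshotted on its own
zfs create -o mountpoint=/var/lib/postgresql/16/main/pg_wal tank/pgdata/wal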
Config generator https://pgtune.leopard.in.ua/#/
And also for ZFS
full_page_writes = false
This disables double writes for data because ZFS is always consistent; roughly a 2x insert performance boost.
NEVER disable full_page_writes on non-Copy-on-Write filesystems (ext4, xfs) as it will cause corruption on power loss due to page tearing.
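One way to set it, as a sketch (you can also just edit postgresql.conf); full_page_writes only needs a config reload, not a restart:
psql -c "ALTER SYSTEM SET full_page_writes = off;"
psql -c "SELECT pg_reload_conf();"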
First, log into the postgres container by using
machinectl shell postgres@db
(Ensure that you are the user postgres)
pwgen 16
createuser --echo --pwprompt user
createdb --echo --owner=user name
(Ensure that you are the user postgres or another superuser)
CREATE DATABASE database;
CREATE USER user WITH ENCRYPTED PASSWORD 'password';
GRANT ALL PRIVILEGES ON DATABASE database TO user;
You can also pipe this to psql -v ON_ERROR_STOP=1.
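For example, fed straight into psql with a heredoc (database, user and password names are placeholders):
psql -v ON_ERROR_STOP=1 <<'SQL'
CREATE DATABASE mydatabase;
CREATE USER myuser WITH ENCRYPTED PASSWORD 'secret';
GRANT ALL PRIVILEGES ON DATABASE mydatabase TO myuser;
SQL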
Use pg_dump
pg_dump -Fc synapse > synapse-$(date -I).dump
Where -Fc selects the flexible custom format and synapse is the database you want to back up.
NB: by default pg_dump uses unix sockets and peer auth, so it tries to authenticate as your current user.
Longer form that uses tcp for connection
pg_dump --host localhost --username synapse --password -Fc synapse > synapse-$(date -I).dump
Use pg_restore
Short version using current user and unix socket
pg_restore --dbname synapse < synapse.dump
Longer version using tcp
pg_restore --host localhost --username synapse --password --dbname synapse < synapse.dump
psql quassel -c "select pg_size_pretty( pg_database_size('quassel') );"
To make the psql pager not wrap long lines:
LESS=S psql
To turn the psql pager off, add this to ~/.psqlrc
\pset pager off
To toggle psql output from horizontal to vertical:
\x
Useful for local administration
createuser --pwprompt --superuser arti
Or
CREATE USER arti SUPERUSER ENCRYPTED PASSWORD 'salakala';
In psql you can list current permissions with \dp and default permissions with \ddp
If your tables are not in the default public schema then you also have to give permission to access your schema:
GRANT USAGE ON SCHEMA my_schema TO my_ro_user;
The default public schema should already have this usage grant for all users.
Default privileges are applied based on who creates the tables in the database.
ALTER DEFAULT PRIVILEGES FOR ROLE user_who_creates_the_tables IN SCHEMA public GRANT SELECT ON TABLES TO my_ro_user;
You can list current default privileges in psql with \ddp
GRANT SELECT ON ALL TABLES IN SCHEMA public TO my_ro_user;
libc locale upgrade: https://wiki.postgresql.org/wiki/Locale_data_changes
For each database run
REINDEX DATABASE CONCURRENTLY ____;
ALTER DATABASE ____ REFRESH COLLATION VERSION;
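A sketch for looping over every database from the shell; it assumes peer auth as the postgres user and plain database names, so check it before running on anything important:
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE NOT datistemplate;"); do
    psql -d "$db" -c "REINDEX DATABASE CONCURRENTLY $db;"
    psql -d "$db" -c "ALTER DATABASE $db REFRESH COLLATION VERSION;"
done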
Pretty doable option:
PostgreSQL's built-in text search is fine for < 100k documents. From ~100k up to a few million docs you need to optimise your queries harder. It's built in, quite easy to use and lightweight, so it is a good idea to start with the built-in search.
Use CTEs to do initial document filtering before ranking: filtering based on indexes is cheap, ranking is expensive. Maybe use several layers of CTE-based filtering with more expensive ranking at each step, as in the sketch below.
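A sketch of that filter-then-rank shape, run through psql; the database name, the docs table, its tsvector column body_tsv and the GIN index on it are made-up assumptions:
psql mydatabase <<'SQL'
-- cheap part: index-backed filtering in a CTE
WITH candidates AS (
    SELECT id, title, body_tsv
    FROM docs
    WHERE body_tsv @@ websearch_to_tsquery('english', 'postgres zfs tuning')
)
-- expensive part: rank only the filtered candidates
SELECT id, title,
       ts_rank(body_tsv, websearch_to_tsquery('english', 'postgres zfs tuning')) AS rank
FROM candidates
ORDER BY rank DESC
LIMIT 20;
SQL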
After 100k documents monitor query times and start planning for a PostgreSQL + ElasticSearch/Meilisearch hybrid architecture.