Django ORM

This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the python category.

Last Updated: 2024-10-12

Django ORM Scrap Book

Data Modeling

How to create a user model using the built-in tools?

from django.contrib.auth.models import AbstractUser

class User(AbstractUser):
    pass # remove this if you want to add custom code to extend the model

How to create a data migration?

Django automatically creates migrations for you. Just edit your model then run to generate the migrations

python manage.py makemigrations # or `make migrate` if using our built-in tools - creates and runs migrations in one
step

To actually run the migrations, do

python manage.py migrate # or `make migrate` if using our built-in tools.

How to rollback a migration?

If you are currently on migration 0011someothertext, you can go back to 0010different_text by just referencing the number with.

python manage.py migrate my_app 0010

If you want to get rid of the 0011 migration for good, delete it after running this command.

How to squash existing migrations and make maintenance simple

Given that I have an app named core and migrations numbered 0001 to 0044 that I'd like to squash together, I can run this:

python manage.py squashmigrations core 0001 0044

RunPython commands within the migrations will have issues, but in most cases you can delete these from the generated squashed migration.

(I ran into an edge case that required me to manually delete indexes from the generated migration and then create a follow-up migration adding indexes - hopefully this is a transient issue.)

Once you are happy and have tested thoroughly (e.g. setting up a dev db from scratch), you can delete the old migrations.

How to merge migrations when clashes occur due to parallel development

# assuming the venv is working
python manage.py makemigrations --merge <my_app>

How to mark a migration as already run even if it hasn't been run?

This is useful for when the generated migrations are broken and you get errors about tables or indexes already existing

python manage.py migrate --fake

How to add a db index on a single column?

started_at = models.DateTimeField(db_index=True)

How to add a default for a column?

country = models.CharField(
  default="de"
)

How to add an image or file field?

First create a top-level method (outside of any model) for giving you a file path - ideally one with some uniqueness checks built it to prevent clobbering when users have distinct files with the same name.

# models.py
def venue_directory_path(instance, filename):
    # File will be uploaded to MEDIA_ROOT/venue/<uuid>/<filename>
    return "venue/{0}/{1}".format(uuid.uuid1(), filename)

Now reference this in the upload_to property of the field

class BlogPost(models.model):
    photo_file = models.ImageField(
        _("photo"), null=True, blank=True, upload_to=venue_directory_path
    )

Danger: If you change this function name, the reference to it within existing migrations will break. If the previous migrations are deployed, squash the existing migrations and change there. Otherwise, change the references in all migrations to the new function name.

How to limit a column to a set of choices

Pass in an array of tuples (a tuple is (x, y)), where the x is what is saved in the db and y is what the user sees (e.g. in the admin area)

country = models.CharField(
  default="de"
  choices=[("de", "Germany")]
)

How to make a column unique?

name = models.CharField(
  unique=True,
)

How to enforce uniqueness across multiple columns

Add a constraint in Meta

class Product(models.Model):
    class Meta:
        constraints = [
            models.UniqueConstraint(fields=['name', 'country'], name='Unique city name per country')
        ]

How to set num digits and precision for decimal type?

latitude = models.DecimalField(
    max_digits=18, decimal_places=10
)

How to control how the model appears in the admin area or the console?

class User():
  # Just show the individual username
  def __str__(self):
    return self.username

How to add timestamps like created_at to every model

Use an abstract class and have other models inherit from it.

class TimeStampedModel(models.Model):
    created_at = models.DateTimeField(_("created_at"), auto_now_add=True)
    updated_at = models.DateTimeField(_("updated_at"), auto_now=True)

    class Meta:
        abstract = True


class Product(TimeStampedModel):
  pass

How to convert a legacy DB into models and migrations

Django lets you take an existing DB and convert it into a models.py file

python manage.py inspectdb

How to get around an issue when an automatically generated migration that converts a field between string and foreign key types fails?

Example error: "django.db.utils.DataError: invalid input syntax for type bigint"

Get Django to create two migration files such that the field is deleted and created afresh instead of altered (which does not work)

How to create a migration that updates existing records

First create an empty migration for the app you want to migrate. Our standard app is called core, so we would do this:

python manage.py makemigrations core --empty

Then within this migration create a two-arg function that gets passed to the RunPython operation

from django.db import migrations


def update_license_info(apps, schema_editor):
    Venue = apps.get_model("core", "Venue")
    Experience = apps.get_model("core", "Experience")
    for row in Venue.objects.all():
        row.license = "cc0"
        row.save()
    for row in Experience.objects.all():
        row.license = "cc0"
        row.save()


class Migration(migrations.Migration):

    dependencies = [
        ("core", "0036_auto_20210731_1737"),
    ]

    operations = [
        migrations.RunPython(update_license_info),
    ]

How to use DB views?

class TempUser(models.Model):
  first_name = models.CharField(max_length=100)

  class Meta:
    # Key part: set `managed` to False. We also need to specify the `db_table`
    managed = False
    db_table = "temp_user"
  def update(self, instance, validated_data):
      instance.__dict__.update(**validated_data)
      instance.save()

How to get bidirectional relationships?

Say you define Map#users and want to get user.maps. How do you do so, given the model looks like this?

class Map(db.Model):
    users = models.ManyToManyField(User)

Answer: Django automatically defines it as x_set - so here map_set

If you want to customize the name to be something shorter, e.g. maps (without the _set suffix), then use related_name

class Map(db.Model):
    users = models.ManyToManyField(User, related_name='maps')

If, OTOH, you want Django to ignore a related_name altogether, this is possible too. You might need this to avoid a clash when migrating

class Map(db.Model):
    # Simply set the `related_name` as `+`
    users = models.ManyToManyField(User, related_name='+)

How to reference a model that is not yet defined?

Wrap it in a string instead of leaving it bare

class BlogPost(BaseModel):
  publisher = models.ForeignKey(
      "User",
      verbose_name="publisher",

class User(BaseModel):
  pass

Note that this works in settings.py paths too

DEFAULT_AUTO_FIELD = django.db.models.AutoField # fails
DEFAULT_AUTO_FIELD = "django.db.models.AutoField" # works

How to do one to many relationships?

from django.db import models

class Blog(models.Model):
    pass

class Entry(models.Model):
    blog = models.ForeignKey(Blog, null=True)

How to do many to many relationships?

class Topping(models.Model):
    pass

class Pizza(models.Model):
    toppings = models.ManyToManyField(Topping)

How to do one-to-one relationships?

Use a OneToOneField instead of ForeignKey

from django.db import models

class Place(models.Model):
    name = models.CharField(max_length=50)
    address = models.CharField(max_length=80)

    def __str__(self):
        return "%s the place" % self.name

class Restaurant(models.Model):
    place = models.OneToOneField(
        Place,
        on_delete=models.CASCADE,
        primary_key=True,
    )

How to model self-referential relationships

Reference self as the argument in a ForeignKey field

boss = models.ForeignKey('self')

How to cascade on deletion?

from django.db import models

class Blog(models.Model):
    pass

class Entry(models.Model):
    blog = models.ForeignKey(Blog, on_delete=models.CASCADE)

How to simulate hasmanythrough (a convenience in other frameworks)

Example: A city has many venues, each of which has many experiences in turn. We want to elegantly capture that city has many experiences due to this relationship.

The trick is to call Model.objects afresh instead of defining new fields. Note also, the use of self in the argument to filter

class City:

    def experiences(self):
        return Experience.objects.filter(venue__city=self)

Validation

How to add validators

The easiest are built-in validators, which you pass as the validators array to a field definition

Example: validate value between 1 and 10

from django.core import validators

class Experience
  rating = models.IntegerField(
      null=False,
      default=5,
      validators=[validators.MinValueValidator(1),
                  validators.MaxValueValidator(10)])

Alternatively, you can write your own validations by making functions that raise ValidationError and then passing this in to the validators key of a field


import datetime

from django.core.exceptions import ValidationError

def validate_not_in_past(value):
    now = datetime.datetime.now()
    if value < now:
        raise ValidationError('Datetime must not be in the past')

class Experience
    ends_at = models.DateTimeField(validators=[validate_not_in_past],
                                   blank=True,
                                   null=True)

Some built in validators

FileExtensionValidator(["jpg", "png"])
URLValidator()
URLValidator(['https'])
EmailValidator()
RegexValidator(
    r'^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$',
    '{} is not a valid color, it must be either #123 or #123456'.format(color),
)
validate_ipv4_address

How to trigger validations

Validation does not happen with create or save methods - instead it only happens in form object or when you explicitly call instance.full_clean() .

How to limit values in DB to a predefined set of choices

# This choices list is a list of 2-part lists - LHS is what is in DB, RHS what is shown to user
CATEGORY_CHOICES = [[x, x] for x in ("nearby", "special_offer", "happening_soon")]

category = models.CharField(
    "category",
    blank=False,
    choices=CATEGORY_CHOICES,
    max_length=120,
)

Querying

How to see SQL generated by a queryset

Thanks to reference

Just call str() on the query property

>>> queryset = Event.objects.some_filter().query
>>> str(queryset)
SELECT "events_event"."id", "events_event"."epic_id",
    "events_event"."details", "events_event"."years_ago"
    FROM "events_event

How to find an item with its id equal to 1 or name = x

id = 1
User.objects.get(pk=id)
name = "x"
User.objects.get(name=name)

How to do where clauses

The format is filter, with argument field_name then __ then some operation, like startswith or gte

q1 = Entry.objects.filter(headline__startswith="What")
q2 = Entry.objects.filter(pub_date__gte=datetime.date.today())
q3 = Entry.objects.filter(name="abc")

How to do LIKE clause

Entry.objects.get(headline__contains='Lennon') # or icontains for case insensitive

Translate into

SELECT ... WHERE headline LIKE '%Lennon%';

How to work with ranges

Pass in a tuple to a field__range argument to filter

Person.objects.filter(age__range=(10, 20))

How to find rows that have duplicates in a certain column

The overall strategy is to first prepare a list of duplicated values for the field you want to de-dupe.

To do this, you annotate the data (add a field) with a computed field that counts (using Count) the number of entries with a given value and then filter such that the count must be over 1.

from django.db.models import Max, Sum

duplicates = User.objects.values(
  'first_name'
).annotate(name_count=Count('first_name')).filter(name_count__gt=1)

User.objects.filter(first_name__in=[item['first_name'] for item in duplicates])

How to search for nested key/value pairs in JSON columns?

Use __contains

Order.objects.filter(data__contains={"InvoiceNumber": {"FirstName": "Daan"}})

How to get distinct records

from django.db.models import Max, Sum
distinct = User.objects.values(
  'first_name'
).annotate(
  name_count=Count('first_name')
).filter(name_count=1)

How to get counts for each value of a column

(User.objects.values('is_active') .annotate(total=Count('id')))

This transforms into

SELECT
    is_active,
    COUNT(id) AS total
FROM
    auth_user
GROUP BY
    is_active

The order is important: failing to call values before annotate will not produce aggregate results.

To group by multiple fields, give multiple args to values

User.objects
.values('is_active', 'is_staff')
.annotate(total=Count('id')))

Moreover, you can even group by an expression

(User.objects
.values('date_joined__year')
.annotate(total=Count('id')))

which is

SELECT
    EXTRACT('year' FROM date_joined),
    COUNT(id) AS total
FROM
    auth_user
GROUP BY
    EXTRACT('year' FROM date_joined)

How to count both a column and its related column at once

Be sure to use distinct=True with Count

Given 2 experiences and one venue, this gives the wrong value

City.objects.annotate(vcount=Count("venues")).annotate(ecount=Count("venues__experiences")).values('vcount', 'ecount')
<CityQuerySet [{'vcount': 2, 'ecount': 2}]>

Fix by adding distinct=True

City.objects.annotate(vcount=Count("venues", distinct=True)).annotate(ecount=Count("venues__experiences", distinct=True)).values('vcount', 'ecount')
<CityQuerySet [{'vcount': 1, 'ecount': 2}]>

How to count distinct occurrences?

Use distinct=True as an arg to Count in annotate

(User.objects
.values('is_active')
.annotate(
    total=Count('id'),
    unique_names=Count('last_name', distinct=True),
))

How to query for records that have at least one associated record of type x

Just add select_related

# Performs an inner join
Student.objects.select_related('attendance')

How to avoid N+1 problems when accessing foreign-key/reverse related attributes?

By default Django does not grab these objects when you a query. Change that behavior with select_related('entity_name')

queryset = Person.objects.all().select_related('father')  # Foreign key object is included in query and cached
for person in queryset:
    person.father  # Hits the cache instead of the database

How to query for records with no associated record types

A venue can have many experiences

To get all venues with no experiences, do this:

Venue.objects.filter(experiences__isnull=True)

How to query get items where associated record matches a condition

Structure the filter in terms of some condition on another table. Here, we get author objects returned, who have associated books wtih bestseller set to True.

best_authors = Author.objects.filter(books__bestseller=True)

How to do aggregations like max/avg/sum etc

Call aggregate with the corresponding class

from django.db.models import Max, Sum
User.objects.all().aggregate(Min('id'))
User.objects.all().aggregate(Sum('id'))

How to filter for related records that match a complex condition

Annotate with a Case, When and wrap with Sum then filter on the result:

class ExperienceQuerySet(QuerySet):
  def in_future(self):
      return self.annotate(
          future_occurrences_count=Sum(
              Case(
                  When(occurrences__ends_at__date__gt=datetime.datetime.now(), then=1)
              ),
          )
      ).filter(future_occurrences_count__gt=0)

How to use db-specific arbitrary functions

from django.db.models import F, Func

# 1. Create a class that inherits from Func
class LevenshteinLikeZeus(Func):
# 2. Give it a function value
  function='levenshtein'

# 3. Add the results of calling the function to your data using `annotate`. To actually call
# the function on your data, pass the column to the newly created class. Lastly, you can
# filter on the results of the annotation.
Hero.objects.annotate(like_zeus=LevenshteinLikeZeus(F("name"))).filter(like_zeus__lt=2)

What does the Q thing do?

Think of it as a convenient help for combining queries together in complex ways.

# Here is a simple use of Q, where it just mimics the behavior of `filter`, even taking the same argument.

from django.db.models import Q
qs = User.objects.filter(Q(first_name__startswith='R'))

# The equivalent with `filter`
qs = User.objects.filter(first_name__startswith='R')

What is F()? Why use it? What are the gotchas?

An F() object represents either:

Why do we use F() objects? It makes it possible to refer to model field values and perform database operations using them without actually having to pull them out of the database into Python memory. Source

For example, with code like this, the operation happens entirely in the database. Python never knows about the value of 'stories_filed' below.

reporter = Reporters.objects.get(name='Tintin')
reporter.stories_filed = F('stories_filed') + 1
reporter.save()

If instead, we had report.stories_filed = reporter.stories_filed + 1 in multiple python threads, we risk race conditions. The F() object avoids this.

A big gotcha with F() objects is that they are applied on each save

reporter = Reporters.objects.get(name='Tintin')
reporter.stories_filed = F('stories_filed') + 1
reporter.save()

reporter.name = 'Tintin Jr.'
reporter.save()

# Here stories_filter will be incremented TWICE

To avoid this, reload the model object from db e.g. with refresh_from_db()

And as mentioned elsewhere, this can allow you to update many items at once, each with different values:

Reporter.objects.all().update(stories_filed=F('stories_filed') + 1)

How to execute raw SQL?

for p in Person.objects.raw('SELECT * FROM myapp_person'):
  print(p)

# Or parameterized
Person.objects.raw('SELECT * FROM myapp_person WHERE last_name = %s', [lname])

How to display the SQL generated by the ORM?

call str() on the .query method of your queryset

e.g.

str(City.objects.all().query)

How to run an OR query?

# option 1: use `|` between multiple `Q` values given as the single argument to `filter`
from django.db.models import Q
qs = User.objects.filter(Q(first_name__startswith='R')|Q(last_name__startswith='D'))

# option 2: use `|` between two Model.objects.filter() chanis
qs = User.objects.filter(first_name__startswith='R') | User.objects.filter(last_name__startswith='D')

How to run an AND query

# Option 1: Pass both conditions in as _subsequent_ arguments to `filter`
User.objects.filter(
  first_name__startswith='R',
  last_name__startswith='D'
)

# Option 2: Put `&` between two or more `Q` items passed as a single expression to `filter`

# See OR query above for analogue
from django.db.models import Q
queryset_3 = User.objects.filter(
 Q(first_name__startswith='R') &
 Q(last_name__startswith='D')
)

How to run a NOT query

# Option 1: Use `exclude` instead of `fitler`
User.objects.exclude(first_name__startswith='R')

# Option 2: Precede `Q` with `~`
User.objects.filter(~Q(id__lt=5))

How to combine AND, OR and NOT queries into one?

Use multiple Q arguments to filter for AND, preceding with ~ for negation, and using parentheses and | to OR between options

today_and_yesterday_greater_than_1000_without_BRK = (
    TickerPrice.objects.filter(
        models.Q(price__gt=1000),
        ~models.Q(ticker__symbol__startswith="BRK"),
        (models.Q(close_date=today) | models.Q(close_date=yesterday)),
    )
)

How to limit your results to only contain certain columns?

# Option 1: Use `values_list` and provide the list of columns
name_and_gender = Hero.objects.all().values_list(
 "name", "gender"
)

# A common usecase for this feature is `union`s
name_and_gender.union(
  Villain.objects.all().values_list(
  "name", "gender"
  )
)

# Option 2: Use `only`. Note how we add a filter earlier in the chain here.
Hero.objects.filter(
  country__startswith='R'
).only("name", "gender")

How to do aggregation in the database?

Use the aggregate function together with something like Max, then access the attribute name FIELDNAME__AGGREGATENAME

max_age = Person.objects.all().aggregate(Max('age'))['age__max']

How to limit results?

Use the python slicing functionality - e.g. this gets the first two elements:

Ticket.objects.all()[:2]

How to run a subquery?

A subquey is a query nested inside another query.

e.g. Get all UserParent objects that have a User (child) associated with them

from django.db.models import Subquery
# Pass in a `Subquery` object as part of a filter argument `y__in`
UserParent.objects.filter(user_id__in=Subquery(users.values('id')))

How to run a correlated subquery?

A correlated subquery is one a subquery uses values from the outer query.

The following comes from this source.

You have a posts table and posts have many columns. Imagine you wanted this query that returned post.id, and then newestcommenteremail for each post (i.e. two columns, with one column from posts, and one from a subquery run on all the comments of a particular post). The SQL would be:

SELECT "post"."id", (
    SELECT U0."email"
    FROM "comment" U0
    WHERE U0."post_id" = ("post"."id")
    ORDER BY U0."created_at" DESC LIMIT 1
) AS "newest_commenter_email" FROM "post"

The way to achieve with with Django is using OuterRef and SubQuery

from django.db.models import OuterRef, Subquery

newest_comment = Comment.objects.filter(post=OuterRef('pk')).order_by('-created_at')
Post.objects.annotate(newest_commenter_email=Subquery(newest_comment.values('email')[:1]))

How to dynamically compare two columns in a query

e.g. What if I want all records where the first and last name are equal?

from django.db.models import F
# The trick is to reference the column with a `F`
User.objects.filter(last_name=F("first_name"))

What if I want the first letter to be the same for two columns? Here I would use Substr in an earlier annotation round and the use F on one of these annotated fields.

User.objects.annotate(first=Substr("first_name", 1, 1), last=Substr("last_name", 1, 1)).filter(first=F("last"))

As another example, here is how to increase the age column in every row by 1:

Person.objects.update(age=F('age') + 1)

How to find records where a field could either be null or empty string?

See the section on running OR queries -- this is basically the same thing.

no_files_objects = MyModel.objects.filter(
  Q(file='')|Q(file=None)
)

How to make add some default filter to all queries

Define a Manager object in managers.py. It should have a method get_queryset that has a filter added on at the end. For example:

from django.db import models


class PubliclyViewableManager(models.Manager):
    def get_queryset(self):
        return (
            super()
            .get_queryset()
            .filter(deleted_at__isnull=True)
        )

Now in your model, assign this manager to the objects attribute to override it:

class BlogPost(BaseModel):
    objects = managers.PubliclyViewableManager()

    # OR: if you want to keep the original objects intact and apply a new filter to a new
    # property, do something like this:
    # publicly_viewable_objects = managers.PubliclyViewableManager()

How to add a bunch of database query methods that return sets of records?

Create a QuerySet and define the filter methods there

# query_sets.py
class SentPushNotificationsQuerySet(QuerySet):
    def nearby(self):
        return self.filter(notification_category="nearby")

    def special_offer(self):
        return self.filter(notification_category="special_offer")

    def happening_soon(self):
        return self.filter(notification_category="happening_soon")

    def today(self):
        return self.filter(created_at__date=datetime.datetime.now())

Then include this QuerySet in your model

class BlogPost(BaseModel)
    objects = query_sets.SentPushNotificationsQuerySet.as_manager()

Now call with BlogPost.objects.today() for example

How to deal with large queries without crashing your RAM

Use iterator()

# Save memory by not caching anything
for person in Person.objects.iterator():
    # Some logic

Ordering

How to order

# Call order_by('{column}'). Defaults to ascending order
User.objects.all().order_by('username')
# Add a minus in front of the column name for descending order
User.objects.all().order_by('-username')

How to make ordering case insensitive

By default Django will order by your database's default. To be more specific, use Lower or Upper to first convert a field into the same case (thereby making comparisons case insensitive)

from django.db.models.functions import Lower
User.objects.all().order_by(Lower('username'))

How to order by a field in an associated table

The key is to use the tablename__column format for the argument to order_by

Hero.objects.all().order_by(
  'category__name'
)

How to control behavior WRT nulls when ordering

Via arguments of nulls_last or nulls_first fed to asc or desc function on F

from django.db.models import F
MyModel.objects.all().order_by(F('price').desc(nulls_last=True))

How to find nth record in an ordering efficiently?

Use slices - it calls limit/offset behind the scenes

user = User.objects.order_by('-last_login')[1]

Updating Data

How to persist or remove many-to-many associations

Use {model.association}.add or {model.association}.remove

user.tweets.add(tweet)
user.tweets.remove(tweet)

How to update many rows in a way that dynamically depends on each row's value

Generally, you call objects.update to update many DB rows. The dynamic reference comes via F

from django.db.models import F
Person.objects.update(age=F('age')*1.5)

How to add a lifecycle method that updates upon save?

class Hero(models.Model):
  # If your fields depend on a model from a 3rd party app, which you do not control, use can use signals instead.
  # But otherwise, overriding `save` is probably more explicit.
  def save(self, *args, **kwargs):
    if not self.pk: # NB: This `if not self.pk`bit  is a way to check if something is created already or not.
      Category.objects.filter(pk=self.category_id).update(hero_count=F('hero_count')+1)
    super().save(*args, **kwargs)

How to check whether a column has changed on saving (so as to avoid needless expensive operations)

Example: we only want to call create_thumbnail when absolutely necessary.

The strategy is to get a reference to the original data via a query, then save the new data, then compare old to new, using that result to decide whether to call the expensive operation or not.

def save(self, *args, **kwargs):  # pylint: disable=signature-differs
    if self.pk is not None: # This means "if the record is being updated (vs. created from scratch)
        original_data = Experience.objects.get(pk=self.pk)
        super().save(*args, **kwargs)  # Call the "real" save() method.
        # Only create a thumbnail when the main image has changed.
        if original_data.photo_file != self.photo_file:
            self.create_thumbnail()
    else:
        super().save(*args, **kwargs)  # Call the "real" save() method.
        self.create_thumbnail()

Data cleaning

How to remove trailing and leading white space from all char and text fields?

Define a clean method (this is called automatically in forms)

class BaseModel(models.model):
    def clean(self):
        # Trim preceding and trailing white-spaces
        self.trim_preceding_and_trailing_white_spaces()

    def trim_preceding_and_trailing_white_spaces(self):
        for field in self._meta.fields:
            if isinstance(field, (models.CharField, models.TextField)):
                setattr(self, field.name, getattr(self, field.name).strip())

Now override save to call this automatically

 def save(self, **kwargs):
        self.clean()
        return super(BaseModel, self).save(**kwargs)

Error handling

How to react to unique constraint issues

from django.db import IntegrityError
except IntegrityError as e:
    if "UNIQUE constraint" in error.args[0]:
        # do something...

ORM Resources