This is part of the Semicolon&Sons Code Diary - consisting of lessons learned on the job. You're in the python category.
Last Updated: 2024-10-12
from django.contrib.auth.models import AbstractUser
class User(AbstractUser):
pass # remove this if you want to add custom code to extend the model
Django automatically creates migrations for you. Just edit your model then run to generate the migrations
python manage.py makemigrations # or `make migrate` if using our built-in tools - creates and runs migrations in one
step
To actually run the migrations, do
python manage.py migrate # or `make migrate` if using our built-in tools.
If you are currently on migration 0011someothertext, you can go back to 0010different_text by just referencing the number with.
python manage.py migrate my_app 0010
If you want to get rid of the 0011 migration for good, delete it after running this command.
Given that I have an app named core
and migrations numbered 0001 to 0044 that I'd like to squash
together, I can run this:
python manage.py squashmigrations core 0001 0044
RunPython
commands within the migrations will have issues, but in most cases you can delete
these from the generated squashed migration.
(I ran into an edge case that required me to manually delete indexes from the generated migration and then create a follow-up migration adding indexes - hopefully this is a transient issue.)
Once you are happy and have tested thoroughly (e.g. setting up a dev db from scratch), you can delete the old migrations.
# assuming the venv is working
python manage.py makemigrations --merge <my_app>
This is useful for when the generated migrations are broken and you get errors about tables or indexes already existing
python manage.py migrate --fake
started_at = models.DateTimeField(db_index=True)
country = models.CharField(
default="de"
)
First create a top-level method (outside of any model) for giving you a file path - ideally one with some uniqueness checks built it to prevent clobbering when users have distinct files with the same name.
# models.py
def venue_directory_path(instance, filename):
# File will be uploaded to MEDIA_ROOT/venue/<uuid>/<filename>
return "venue/{0}/{1}".format(uuid.uuid1(), filename)
Now reference this in the upload_to
property of the field
class BlogPost(models.model):
photo_file = models.ImageField(
_("photo"), null=True, blank=True, upload_to=venue_directory_path
)
Danger: If you change this function name, the reference to it within existing migrations will break. If the previous migrations are deployed, squash the existing migrations and change there. Otherwise, change the references in all migrations to the new function name.
Pass in an array of tuples (a tuple is (x, y)
), where the x
is what is saved in the db and y
is what the user sees (e.g. in the admin area)
country = models.CharField(
default="de"
choices=[("de", "Germany")]
)
name = models.CharField(
unique=True,
)
Add a constraint in Meta
class Product(models.Model):
class Meta:
constraints = [
models.UniqueConstraint(fields=['name', 'country'], name='Unique city name per country')
]
latitude = models.DecimalField(
max_digits=18, decimal_places=10
)
class User():
# Just show the individual username
def __str__(self):
return self.username
Use an abstract class and have other models inherit from it.
class TimeStampedModel(models.Model):
created_at = models.DateTimeField(_("created_at"), auto_now_add=True)
updated_at = models.DateTimeField(_("updated_at"), auto_now=True)
class Meta:
abstract = True
class Product(TimeStampedModel):
pass
Django lets you take an existing DB and convert it into a models.py
file
python manage.py inspectdb
Example error: "django.db.utils.DataError: invalid input syntax for type bigint"
Get Django to create two migration files such that the field is deleted and created afresh instead of altered (which does not work)
models.py
.First create an empty migration for the app you want to migrate. Our standard app is called core
, so we would do this:
python manage.py makemigrations core --empty
Then within this migration create a two-arg function that gets passed to the RunPython
operation
from django.db import migrations
def update_license_info(apps, schema_editor):
Venue = apps.get_model("core", "Venue")
Experience = apps.get_model("core", "Experience")
for row in Venue.objects.all():
row.license = "cc0"
row.save()
for row in Experience.objects.all():
row.license = "cc0"
row.save()
class Migration(migrations.Migration):
dependencies = [
("core", "0036_auto_20210731_1737"),
]
operations = [
migrations.RunPython(update_license_info),
]
class TempUser(models.Model):
first_name = models.CharField(max_length=100)
class Meta:
# Key part: set `managed` to False. We also need to specify the `db_table`
managed = False
db_table = "temp_user"
def update(self, instance, validated_data):
instance.__dict__.update(**validated_data)
instance.save()
Say you define Map#users
and want to get user.maps
. How do you do so, given the model looks like this?
class Map(db.Model):
users = models.ManyToManyField(User)
Answer: Django automatically defines it as x_set
- so here map_set
If you want to customize the name to be something shorter, e.g. maps
(without the _set
suffix), then use
related_name
class Map(db.Model):
users = models.ManyToManyField(User, related_name='maps')
If, OTOH, you want Django to ignore a related_name
altogether, this is possible too. You might need this to avoid a clash when migrating
class Map(db.Model):
# Simply set the `related_name` as `+`
users = models.ManyToManyField(User, related_name='+)
Wrap it in a string instead of leaving it bare
class BlogPost(BaseModel):
publisher = models.ForeignKey(
"User",
verbose_name="publisher",
class User(BaseModel):
pass
Note that this works in settings.py paths too
DEFAULT_AUTO_FIELD = django.db.models.AutoField # fails
DEFAULT_AUTO_FIELD = "django.db.models.AutoField" # works
from django.db import models
class Blog(models.Model):
pass
class Entry(models.Model):
blog = models.ForeignKey(Blog, null=True)
class Topping(models.Model):
pass
class Pizza(models.Model):
toppings = models.ManyToManyField(Topping)
Use a OneToOneField
instead of ForeignKey
from django.db import models
class Place(models.Model):
name = models.CharField(max_length=50)
address = models.CharField(max_length=80)
def __str__(self):
return "%s the place" % self.name
class Restaurant(models.Model):
place = models.OneToOneField(
Place,
on_delete=models.CASCADE,
primary_key=True,
)
Reference self
as the argument in a ForeignKey
field
boss = models.ForeignKey('self')
from django.db import models
class Blog(models.Model):
pass
class Entry(models.Model):
blog = models.ForeignKey(Blog, on_delete=models.CASCADE)
Example: A city
has many venues
, each of which has many experiences
in turn. We
want to elegantly capture that city
has many experiences
due to this relationship.
The trick is to call Model.objects
afresh instead of defining new fields. Note also, the use of self
in the argument
to filter
class City:
def experiences(self):
return Experience.objects.filter(venue__city=self)
The easiest are built-in validators, which you pass as the validators
array to a field definition
Example: validate value between 1 and 10
from django.core import validators
class Experience
rating = models.IntegerField(
null=False,
default=5,
validators=[validators.MinValueValidator(1),
validators.MaxValueValidator(10)])
Alternatively, you can write your own validations by making functions that raise ValidationError
and
then passing this in to the validators
key of a field
import datetime
from django.core.exceptions import ValidationError
def validate_not_in_past(value):
now = datetime.datetime.now()
if value < now:
raise ValidationError('Datetime must not be in the past')
class Experience
ends_at = models.DateTimeField(validators=[validate_not_in_past],
blank=True,
null=True)
Some built in validators
FileExtensionValidator(["jpg", "png"])
URLValidator()
URLValidator(['https'])
EmailValidator()
RegexValidator(
r'^#([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$',
'{} is not a valid color, it must be either #123 or #123456'.format(color),
)
validate_ipv4_address
Validation does not happen with create
or save
methods - instead it only happens in form object or when you
explicitly call instance.full_clean()
.
# This choices list is a list of 2-part lists - LHS is what is in DB, RHS what is shown to user
CATEGORY_CHOICES = [[x, x] for x in ("nearby", "special_offer", "happening_soon")]
category = models.CharField(
"category",
blank=False,
choices=CATEGORY_CHOICES,
max_length=120,
)
Thanks to reference
Just call str()
on the query
property
>>> queryset = Event.objects.some_filter().query
>>> str(queryset)
SELECT "events_event"."id", "events_event"."epic_id",
"events_event"."details", "events_event"."years_ago"
FROM "events_event
x
id = 1
User.objects.get(pk=id)
name = "x"
User.objects.get(name=name)
The format is filter
, with argument field_name
then __
then some operation, like startswith
or gte
q1 = Entry.objects.filter(headline__startswith="What")
q2 = Entry.objects.filter(pub_date__gte=datetime.date.today())
q3 = Entry.objects.filter(name="abc")
Entry.objects.get(headline__contains='Lennon') # or icontains for case insensitive
Translate into
SELECT ... WHERE headline LIKE '%Lennon%';
Pass in a tuple to a field__range
argument to filter
Person.objects.filter(age__range=(10, 20))
The overall strategy is to first prepare a list of duplicated values for the field you want to de-dupe.
To do this, you annotate
the data (add a field) with a computed field that counts (using Count
) the number of entries with a given value and then filter such that the count must be over 1.
from django.db.models import Max, Sum
duplicates = User.objects.values(
'first_name'
).annotate(name_count=Count('first_name')).filter(name_count__gt=1)
User.objects.filter(first_name__in=[item['first_name'] for item in duplicates])
Use __contains
Order.objects.filter(data__contains={"InvoiceNumber": {"FirstName": "Daan"}})
from django.db.models import Max, Sum
distinct = User.objects.values(
'first_name'
).annotate(
name_count=Count('first_name')
).filter(name_count=1)
(User.objects.values('is_active') .annotate(total=Count('id')))
This transforms into
SELECT
is_active,
COUNT(id) AS total
FROM
auth_user
GROUP BY
is_active
The order is important: failing to call values before annotate will not produce aggregate results.
To group by multiple fields, give multiple args to values
User.objects
.values('is_active', 'is_staff')
.annotate(total=Count('id')))
Moreover, you can even group by an expression
(User.objects
.values('date_joined__year')
.annotate(total=Count('id')))
which is
SELECT
EXTRACT('year' FROM date_joined),
COUNT(id) AS total
FROM
auth_user
GROUP BY
EXTRACT('year' FROM date_joined)
Be sure to use distinct=True
with Count
Given 2 experiences and one venue, this gives the wrong value
City.objects.annotate(vcount=Count("venues")).annotate(ecount=Count("venues__experiences")).values('vcount', 'ecount')
<CityQuerySet [{'vcount': 2, 'ecount': 2}]>
Fix by adding distinct=True
City.objects.annotate(vcount=Count("venues", distinct=True)).annotate(ecount=Count("venues__experiences", distinct=True)).values('vcount', 'ecount')
<CityQuerySet [{'vcount': 1, 'ecount': 2}]>
Use distinct=True
as an arg to Count
in annotate
(User.objects
.values('is_active')
.annotate(
total=Count('id'),
unique_names=Count('last_name', distinct=True),
))
Just add select_related
# Performs an inner join
Student.objects.select_related('attendance')
By default Django does not grab these objects when you a query. Change that behavior with
select_related('entity_name')
queryset = Person.objects.all().select_related('father') # Foreign key object is included in query and cached
for person in queryset:
person.father # Hits the cache instead of the database
A venue can have many experiences
To get all venues with no experiences, do this:
Venue.objects.filter(experiences__isnull=True)
Structure the filter
in terms of some condition on another table. Here, we get author objects returned,
who have associated books wtih bestseller
set to True
.
best_authors = Author.objects.filter(books__bestseller=True)
Call aggregate with the corresponding class
from django.db.models import Max, Sum
User.objects.all().aggregate(Min('id'))
User.objects.all().aggregate(Sum('id'))
Annotate with a Case, When and wrap with Sum then filter on the result:
class ExperienceQuerySet(QuerySet):
def in_future(self):
return self.annotate(
future_occurrences_count=Sum(
Case(
When(occurrences__ends_at__date__gt=datetime.datetime.now(), then=1)
),
)
).filter(future_occurrences_count__gt=0)
from django.db.models import F, Func
# 1. Create a class that inherits from Func
class LevenshteinLikeZeus(Func):
# 2. Give it a function value
function='levenshtein'
# 3. Add the results of calling the function to your data using `annotate`. To actually call
# the function on your data, pass the column to the newly created class. Lastly, you can
# filter on the results of the annotation.
Hero.objects.annotate(like_zeus=LevenshteinLikeZeus(F("name"))).filter(like_zeus__lt=2)
Q
thing do?Think of it as a convenient help for combining queries together in complex ways.
# Here is a simple use of Q, where it just mimics the behavior of `filter`, even taking the same argument.
from django.db.models import Q
qs = User.objects.filter(Q(first_name__startswith='R'))
# The equivalent with `filter`
qs = User.objects.filter(first_name__startswith='R')
F()
? Why use it? What are the gotchas?An F() object represents either:
chairs_needed=F('num_employees') - F('num_chairs'))
Why do we use F() objects? It makes it possible to refer to model field values and perform database operations using them without actually having to pull them out of the database into Python memory. Source
For example, with code like this, the operation happens entirely in the database. Python never knows about the value of 'stories_filed' below.
reporter = Reporters.objects.get(name='Tintin')
reporter.stories_filed = F('stories_filed') + 1
reporter.save()
If instead, we had report.stories_filed = reporter.stories_filed + 1
in multiple python threads, we risk race
conditions. The F()
object avoids this.
A big gotcha with F() objects is that they are applied on each save
reporter = Reporters.objects.get(name='Tintin')
reporter.stories_filed = F('stories_filed') + 1
reporter.save()
reporter.name = 'Tintin Jr.'
reporter.save()
# Here stories_filter will be incremented TWICE
To avoid this, reload the model object from db e.g. with refresh_from_db()
And as mentioned elsewhere, this can allow you to update many items at once, each with different values:
Reporter.objects.all().update(stories_filed=F('stories_filed') + 1)
for p in Person.objects.raw('SELECT * FROM myapp_person'):
print(p)
# Or parameterized
Person.objects.raw('SELECT * FROM myapp_person WHERE last_name = %s', [lname])
call str()
on the .query
method of your queryset
e.g.
str(City.objects.all().query)
# option 1: use `|` between multiple `Q` values given as the single argument to `filter`
from django.db.models import Q
qs = User.objects.filter(Q(first_name__startswith='R')|Q(last_name__startswith='D'))
# option 2: use `|` between two Model.objects.filter() chanis
qs = User.objects.filter(first_name__startswith='R') | User.objects.filter(last_name__startswith='D')
# Option 1: Pass both conditions in as _subsequent_ arguments to `filter`
User.objects.filter(
first_name__startswith='R',
last_name__startswith='D'
)
# Option 2: Put `&` between two or more `Q` items passed as a single expression to `filter`
# See OR query above for analogue
from django.db.models import Q
queryset_3 = User.objects.filter(
Q(first_name__startswith='R') &
Q(last_name__startswith='D')
)
# Option 1: Use `exclude` instead of `fitler`
User.objects.exclude(first_name__startswith='R')
# Option 2: Precede `Q` with `~`
User.objects.filter(~Q(id__lt=5))
Use multiple Q
arguments to filter
for AND, preceding with ~
for negation, and using parentheses and |
to OR
between options
today_and_yesterday_greater_than_1000_without_BRK = (
TickerPrice.objects.filter(
models.Q(price__gt=1000),
~models.Q(ticker__symbol__startswith="BRK"),
(models.Q(close_date=today) | models.Q(close_date=yesterday)),
)
)
# Option 1: Use `values_list` and provide the list of columns
name_and_gender = Hero.objects.all().values_list(
"name", "gender"
)
# A common usecase for this feature is `union`s
name_and_gender.union(
Villain.objects.all().values_list(
"name", "gender"
)
)
# Option 2: Use `only`. Note how we add a filter earlier in the chain here.
Hero.objects.filter(
country__startswith='R'
).only("name", "gender")
Use the aggregate
function together with something like Max
, then access the attribute name
FIELDNAME__AGGREGATENAME
max_age = Person.objects.all().aggregate(Max('age'))['age__max']
Use the python slicing functionality - e.g. this gets the first two elements:
Ticket.objects.all()[:2]
A subquey is a query nested inside another query.
e.g. Get all UserParent objects that have a User (child) associated with them
from django.db.models import Subquery
# Pass in a `Subquery` object as part of a filter argument `y__in`
UserParent.objects.filter(user_id__in=Subquery(users.values('id')))
A correlated subquery is one a subquery uses values from the outer query.
The following comes from this source.
You have a posts table and posts have many columns. Imagine you wanted this query that returned post.id, and then newestcommenteremail for each post (i.e. two columns, with one column from posts, and one from a subquery run on all the comments of a particular post). The SQL would be:
SELECT "post"."id", (
SELECT U0."email"
FROM "comment" U0
WHERE U0."post_id" = ("post"."id")
ORDER BY U0."created_at" DESC LIMIT 1
) AS "newest_commenter_email" FROM "post"
The way to achieve with with Django is using OuterRef
and SubQuery
from django.db.models import OuterRef, Subquery
newest_comment = Comment.objects.filter(post=OuterRef('pk')).order_by('-created_at')
Post.objects.annotate(newest_commenter_email=Subquery(newest_comment.values('email')[:1]))
e.g. What if I want all records where the first and last name are equal?
from django.db.models import F
# The trick is to reference the column with a `F`
User.objects.filter(last_name=F("first_name"))
What if I want the first letter to be the same for two columns? Here I would use Substr
in an earlier annotation round
and the use F
on one of these annotated fields.
User.objects.annotate(first=Substr("first_name", 1, 1), last=Substr("last_name", 1, 1)).filter(first=F("last"))
As another example, here is how to increase the age column in every row by 1:
Person.objects.update(age=F('age') + 1)
See the section on running OR queries -- this is basically the same thing.
no_files_objects = MyModel.objects.filter(
Q(file='')|Q(file=None)
)
Define a Manager
object in managers.py
. It should have a method get_queryset
that has a filter added on at the
end. For example:
from django.db import models
class PubliclyViewableManager(models.Manager):
def get_queryset(self):
return (
super()
.get_queryset()
.filter(deleted_at__isnull=True)
)
Now in your model, assign this manager to the objects
attribute to override it:
class BlogPost(BaseModel):
objects = managers.PubliclyViewableManager()
# OR: if you want to keep the original objects intact and apply a new filter to a new
# property, do something like this:
# publicly_viewable_objects = managers.PubliclyViewableManager()
Create a QuerySet
and define the filter methods there
# query_sets.py
class SentPushNotificationsQuerySet(QuerySet):
def nearby(self):
return self.filter(notification_category="nearby")
def special_offer(self):
return self.filter(notification_category="special_offer")
def happening_soon(self):
return self.filter(notification_category="happening_soon")
def today(self):
return self.filter(created_at__date=datetime.datetime.now())
Then include this QuerySet
in your model
class BlogPost(BaseModel)
objects = query_sets.SentPushNotificationsQuerySet.as_manager()
Now call with BlogPost.objects.today()
for example
Use iterator()
# Save memory by not caching anything
for person in Person.objects.iterator():
# Some logic
# Call order_by('{column}'). Defaults to ascending order
User.objects.all().order_by('username')
# Add a minus in front of the column name for descending order
User.objects.all().order_by('-username')
By default Django will order by your database's default. To be more specific, use Lower
or Upper
to first convert a
field into the same case (thereby making comparisons case insensitive)
from django.db.models.functions import Lower
User.objects.all().order_by(Lower('username'))
The key is to use the tablename__column
format for the argument to order_by
Hero.objects.all().order_by(
'category__name'
)
Via arguments of nulls_last
or nulls_first
fed to asc
or desc
function on F
from django.db.models import F
MyModel.objects.all().order_by(F('price').desc(nulls_last=True))
Use slices - it calls limit/offset behind the scenes
user = User.objects.order_by('-last_login')[1]
Use {model.association}.add
or {model.association}.remove
user.tweets.add(tweet)
user.tweets.remove(tweet)
Generally, you call objects.update
to update many DB rows. The dynamic reference comes via F
from django.db.models import F
Person.objects.update(age=F('age')*1.5)
class Hero(models.Model):
# If your fields depend on a model from a 3rd party app, which you do not control, use can use signals instead.
# But otherwise, overriding `save` is probably more explicit.
def save(self, *args, **kwargs):
if not self.pk: # NB: This `if not self.pk`bit is a way to check if something is created already or not.
Category.objects.filter(pk=self.category_id).update(hero_count=F('hero_count')+1)
super().save(*args, **kwargs)
Example: we only want to call create_thumbnail
when absolutely necessary.
The strategy is to get a reference to the original data via a query, then save the new data, then compare old to new, using that result to decide whether to call the expensive operation or not.
def save(self, *args, **kwargs): # pylint: disable=signature-differs
if self.pk is not None: # This means "if the record is being updated (vs. created from scratch)
original_data = Experience.objects.get(pk=self.pk)
super().save(*args, **kwargs) # Call the "real" save() method.
# Only create a thumbnail when the main image has changed.
if original_data.photo_file != self.photo_file:
self.create_thumbnail()
else:
super().save(*args, **kwargs) # Call the "real" save() method.
self.create_thumbnail()
Define a clean
method (this is called automatically in forms)
class BaseModel(models.model):
def clean(self):
# Trim preceding and trailing white-spaces
self.trim_preceding_and_trailing_white_spaces()
def trim_preceding_and_trailing_white_spaces(self):
for field in self._meta.fields:
if isinstance(field, (models.CharField, models.TextField)):
setattr(self, field.name, getattr(self, field.name).strip())
Now override save
to call this automatically
def save(self, **kwargs):
self.clean()
return super(BaseModel, self).save(**kwargs)
from django.db import IntegrityError
except IntegrityError as e:
if "UNIQUE constraint" in error.args[0]:
# do something...