feat: initial import from blackmesa project

Gabor Körber 2025-01-20 21:31:51 +01:00
parent 4e46bb90b0
commit e539aaf158
10 changed files with 588 additions and 3 deletions

14
.gitignore vendored Normal file

@ -0,0 +1,14 @@
# Python-generated files
__pycache__/
*.py[oc]
build/
dist/
wheels/
*.egg-info
# Virtual environments
.venv
# Specific to this repo
uv.lock
db.sqlite3


@ -1,3 +1,79 @@
# Django Records
Django Records provides a queryset extension that builds structured data directly from querysets, instead of Model instances.
An example is creating dataclass instances as entities for a clean-architecture setup, where model instances should not leak into the business layer and the entities are just read-only representations of the fetched data, especially if you plan, e.g., to use Django-agnostic repository facades.
The main targets are `dataclass` and pydantic, with a focus on building read-only representations of the data.
## The .records() queryset function
> Changed since 0.3: `records()` is no longer supposed to take the record target as an optional first parameter; use `record_into()` instead. The old form is still supported.
`records()` fetches data from the database with a `values()` call and uses the result to directly create a dataclass, or any other such structure ("a record").
It completely skips the instantiation of a Django model instance and comes with tools (called `Adjunct`s, which work similarly to `Q()` or `F()`) that make it easy to handle initialization data, so that you can automate even the production of immutable or nested data structures.
Out of the box, `records()` assumes you want to use a `dataclasses.dataclass` or a `pydantic.BaseModel`.
Usage:
```python
SomeModel.objects.filter(...).records('field1', 'field2', annotation=F(...), adjunct=Adjunct(...))
```
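To make the idea concrete, here is a minimal sketch of what "rows from `values()` go straight into a record" means. It runs without Django: the row dictionaries stand in for what a `values()` call yields, and `PersonRecord` and `rows_to_records` are hypothetical names used only for illustration, not part of the library.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PersonRecord:  # hypothetical, immutable record type
    id: int
    name: str


def rows_to_records(rows, record_cls):
    """Build one record per values()-style row, keeping only declared fields."""
    names = {f.name for f in fields(record_cls)}
    return [
        record_cls(**{key: value for key, value in row.items() if key in names})
        for row in rows
    ]


# Stand-in for the dictionaries a values() call would produce.
rows = [{"id": 1, "name": "Ada", "extra": "not a field, dropped"}]
records = rows_to_records(rows, PersonRecord)
```

No model instance is created at any point; the record is the only object built per row.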
## Defining the default target structure, or a custom one with .record_into()
> Unstable: record_into() might be renamed.
- You can define the target class you want to create directly on the queryset manager (or queryset):
```python
from dataclasses import dataclass

from django.db.models.manager import BaseManager
from django_records.records import RecordQuerySet


@dataclass
class Entity:
    id: int


MyManager = BaseManager.from_queryset(RecordQuerySet)
MyManager._default_record = Entity
...
```
- You can add it to your model (I have not found a standard way to extend `Meta` yet):
```python
class MyModel(models.Model):
    _default_record = Entity
```
- You could also set the `Handler` directly as `_default_record`. If you want to use a handler other than the default dataclasses handler, define `_record_handler` on the queryset.
- Finally, you can override this behaviour by explicitly chaining the target class into the queryset with `record_into()`, which takes either a `RecordHandler` object or any class that can be wrapped by one.
```python
SomeModel.objects.filter(...).record_into(TargetDataClass).records('field1', 'field2')
```
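The lookup order described in this section can be sketched as follows. This is a simplified stand-in, not the library code: `resolve_target` is a hypothetical helper, `SimpleNamespace` objects replace the real queryset and model, and `LookupError` stands in for the library's own exception.

```python
from types import SimpleNamespace


def resolve_target(queryset):
    """Return the first record target that is set, in override order."""
    candidates = (
        (queryset, "_record"),                # set explicitly via record_into()
        (queryset, "_default_record"),        # default on the queryset or manager
        (queryset.model, "_default_record"),  # default on the model
    )
    for owner, attribute in candidates:
        target = getattr(owner, attribute, None)
        if target is not None:
            return target
    raise LookupError("no record target configured")


class Entity: ...
class Override: ...


# Only the model default is set: it wins.
queryset = SimpleNamespace(model=SimpleNamespace(_default_record=Entity))
default_target = resolve_target(queryset)

# An explicit target (as record_into() would store it) takes precedence.
queryset._record = Override
explicit_target = resolve_target(queryset)
```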
## Adjuncts
Just as Django expressions can be used to annotate keys in the retrieved model data, Adjuncts can be used to bypass this mechanism and insert local data into your target class. You might want this if, e.g., the dataclass you create is _immutable_, or it has _required fields_ that need data at creation time but are not part of your database query.
> Adjuncts do not influence the underlying SQL, except that they can add keys to the `values()` call.
> The keys used in the kwargs to `records()` primarily represent the keys passed to the dataclass.
- `FixedValue` simply carries some data and inserts it into every record, e.g. `.records(data=FixedValue(1))` always sets the field `data` to 1.
- `MutValue` takes a callable as its argument, which is called with the row data when the field is set, e.g. `.records(data=MutValue(lambda entry: 'x' in entry))`.
- `MutValueNotNone` is the same as `MutValue`, but only applies the callable if the database value is not None (shortcut).
- `Ref` uses a different key to retrieve the data from the values, and may apply an Adjunct to it. This is probably the most used Adjunct in real-life examples.
- `Skip` allows you to skip a field. This is needed because `records()` would otherwise include all fields of a dataclass without knowing whether they are optional; it is also helpful if you rewrite the fields with a `PostProcess`.
- `PostProcess` lets you call a function as a callback at creation time. If the callback returns anything other than None, that value is used as the initializer for producing the object.
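The way these pieces combine per row can be illustrated with a small, Django-free sketch. Plain callables stand in for the Adjunct classes here, and `build_init_data`, `Person` and `drop_street_id` are hypothetical names chosen for the example; the real classes take additional arguments such as the model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:  # hypothetical immutable target with required fields
    id: int
    name: str
    street: str
    source: str


def build_init_data(row, field_resolvers, post_processors=()):
    """Apply per-field resolvers first, then whole-dictionary post-processors."""
    data = dict(row)
    for key, resolver in field_resolvers.items():
        data[key] = resolver(data)
    for processor in post_processors:
        processed = processor(data)
        if processed is not None:  # None means "keep the current dictionary"
            data = processed
    return data


row = {"id": 1, "name": "arthus", "street_id": 12}
resolvers = {
    "source": lambda data: "db",                           # role of FixedValue("db")
    "name": lambda data: data["name"].capitalize(),        # role of MutValue(callback)
    "street": lambda data: f"Street {data['street_id']}",  # role of Ref("street_id", callback)
}


def drop_street_id(data):
    """Role of PostProcess: return a new initialization dictionary."""
    return {key: value for key, value in data.items() if key != "street_id"}


person = Person(**build_init_data(row, resolvers, [drop_street_id]))
```

Because all values are known before `Person` is constructed, the frozen dataclass never needs to be mutated after creation.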
## Testing
### Built-In Tests
...
### Integration Test: Examples Project
The [celestial project](examples/celestials/README.md) in examples serves to demonstrate basic usage of records, as well as providing integration testing.

19
pyproject.toml Normal file

@ -0,0 +1,19 @@
[project]
name = "django-records"
version = "0.4.0"
description = "Django structured querysets"
readme = "README.md"
authors = [{ name = "Gabor Körber", email = "gab@g4b.org" }]
requires-python = ">=3.12"
dependencies = ["django>=4"]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[project.optional-dependencies]
pydantic = ["pydantic>=2"]
[tool.uv.workspace]
members = []
exclude = ["examples"]


@ -0,0 +1,2 @@
def hello() -> str:
    return "Hello from django-records!"


@ -0,0 +1,122 @@
from abc import ABC
from typing import Any, Callable


class Adjunct(ABC):
    """
    Base class that defines the Adjunct interface.
    An Adjunct is data which does not translate into SQL, but rather adds data programmatically.
    Adjuncts are filtered by records() and act as mutators of the retrieved database value
    before creating the target record.
    They can also provide a key to be included into the .values() call.
    """
    __slots__ = ['value']
    skip = False  # if skip is true, this adjunct will not be actually processed.
    resolves_field = True  # if resolves_field is true, this adjunct will be called for a single field with resolve()
    post_processing = False  # if post_processing is true, this adjunct will in the end be called with dbdata, and be able to manipulate the whole dictionary.

    def resolve(self, model, dbdata) -> Any | None:
        """
        resolve returns the field value for one entry.
        """
        raise NotImplementedError

    def post_process(self, model, dbdata) -> dict | None:
        """if post_processing is True, this needs to be implemented.
        has to return either a new dictionary to use in initialization of an object, or None.
        """
        raise NotImplementedError

    def values_field(self) -> str | tuple[str, str] | None:
        """
        return a field for the values operator.
        if you return None, it will not be added. (default)
        if you return a string, it will be added to values as args.
        if you return a tuple, it will be added to values as kwargs (key, value).
        """
        return None


class FixedValue(Adjunct):
    """always resolves to a fixed value."""

    def __init__(self, value=None):
        self.value = value

    def resolve(self, model, dbdata):
        return self.value


class MutValue(Adjunct):
    """adjunct value that returns a field value with a callback.
    currently supports only 1 parameter (dbdata).
    """

    def __init__(self, callback):
        self.callback = callback if callable(callback) else None

    def resolve(self, model, dbdata):
        # at this point i could check if callback needs 0-2 arguments and decide the call.
        if self.callback:
            return self.callback(dbdata)


class MutValueNotNone(MutValue):
    """MutValue that only calls the callback if the dbdata is not None
    (convenience function)
    """

    def resolve(self, model, dbdata):
        if dbdata is not None:
            return super().resolve(model, dbdata)


class Ref(Adjunct):
    """
    Adds this key to the .values() call, and processes it with an adjunct or callback.
    If no adjunct is defined, it simply redirects the value as-is.
    -> key can be a tuple of (keyname, expression) for annotations. # TODO: test this bold claim.
    """
    __slots__ = ['adjunct', 'key']

    def __init__(self, key, adjunct: Adjunct | Callable | None = None):
        match adjunct:
            # Plain class pattern: Adjunct defines no __match_args__, so a
            # positional sub-pattern like Adjunct(adj) would raise TypeError.
            case Adjunct():
                self.adjunct = adjunct
            case callback if callable(callback):
                self.adjunct = MutValueNotNone(callback)
            case _:
                self.adjunct = None
        self.key = key

    def resolve(self, model, dbdata):
        value = dbdata.get(self.key)
        if self.adjunct:
            return self.adjunct.resolve(model, value)
        else:
            return value

    def values_field(self):
        return self.key


class Skip(Adjunct):
    """Skips this key from being retrieved from the database or used in the dataclass instantiation."""
    skip = True
    resolves_field = False


class PostProcess(Adjunct):
    """calls a callback which can modify the whole initialization dictionary."""
    __slots__ = ['callback']
    resolves_field = False
    post_processing = True

    def __init__(self, callback):
        self.callback = callback

    def post_process(self, model, dbdata):
        if self.callback:
            return self.callback(dbdata)


@ -0,0 +1,8 @@
class DjangoRecordsException(Exception):
    ...


class RecordClassDefinitionError(DjangoRecordsException):
    ...


class RecordInstanceError(DjangoRecordsException):
    ...


@ -0,0 +1,78 @@
"""
RecordHandlers describe how a Record class should be built.
Each Record Type should define one.
Usually you just want to use one of the pre-built Handlers for dataclass, dict, or pydantic.
"""
from .errors import RecordClassDefinitionError


class RecordHandler:
    """
    Handler for a record type
    defines how a record can be created, and how to retrieve all field names, and the required ones.
    """
    __slots__ = ['klass']

    @classmethod
    def wrap(cls, klass):
        """internal factory function used to wrap a non-record handler into a record handler"""
        return cls(klass)

    def __init__(self, klass):
        self.klass = klass

    def create(self, **kwargs):
        """the actual creation of the underlying record type instance"""
        return self.klass(**kwargs)

    def get_field_names(self):
        """should return all field names of the wrapped record type."""
        return self.klass.__dict__.keys()

    @property
    def record(self):
        """property used to retrieve the wrapped class."""
        return self.klass

    @property
    def required_arguments(self):
        """property that subclasses can override to return only the required field names"""
        return self.get_field_names()


class RecordDict(RecordHandler):
    """RecordHandler that outputs a dictionary"""

    def __init__(self, klass=None):
        # it is not required to define dict, but you could do OrderedDict e.g.
        self.klass = klass or dict

    def get_field_names(self) -> list[str]:
        # dictionary has no required fields. any field is possible.
        return []


class RecordDataclass(RecordHandler):
    """handles dataclasses.dataclass derivatives"""

    def create(self, **kwargs):
        # clean field names to be only valid if they are on the dataclass.
        record_fields = self.get_field_names()
        kwargs = {k: v for k, v in kwargs.items() if k in record_fields}
        return self.klass(**kwargs)

    def get_field_names(self) -> list[str]:
        # returns all field names, even those which are not required.
        # for dataclasses:
        if hasattr(self.klass, '__dataclass_fields__'):
            return list(self.klass.__dataclass_fields__.keys())
        # for pydantic v2 BaseModel:
        if hasattr(self.klass, 'model_fields'):
            return list(self.klass.model_fields.keys())
        # for pydantic v1 BaseModel (deprecated alias in v2):
        if hasattr(self.klass, '__fields__'):
            return list(self.klass.__fields__.keys())
        raise RecordClassDefinitionError("Field Names not found.")


@ -0,0 +1,151 @@
import logging

from django.db.models import QuerySet, Model
from django.db.models.expressions import BaseExpression, Combinable
from django.db.models.manager import Manager
from django.db.models.query import ValuesIterable

from .adjuncts import Adjunct
from .handlers import RecordDataclass, RecordHandler
from .errors import RecordClassDefinitionError, RecordInstanceError

# __name__ already carries the package prefix, so it is used as-is.
logger = logging.getLogger(__name__)


class RecordIterable(ValuesIterable):
    """
    Iterable returned by records() that yields a record class for each row.
    Replaces the standard iterable of the queryset.
    """

    def __iter__(self):
        queryset: QuerySet = self.queryset
        model: Model = self.queryset.model
        query = queryset.query
        compiler = query.get_compiler(queryset.db)
        record_data = getattr(queryset, '_record_kwargs', {})
        record_handler = queryset._record
        # extra(select=...) cols are always at the start of the row.
        names = [
            *query.extra_select,
            *query.values_select,
            *query.annotation_select,
        ]
        indexes = range(len(names))
        for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size):
            dbdata = {names[i]: row[i] for i in indexes}
            # post-processors will be able to rewrite the whole dictionary.
            post_processors = []
            # we overwrite db data bluntly for now. actually we would provide callbacks the current dict.
            for k, v in record_data.items():
                if v.resolves_field:
                    dbdata[k] = v.resolve(model, dbdata)
                if v.post_processing:
                    post_processors.append(v)
            if post_processors:
                for processor in post_processors:
                    processed = processor.post_process(model, dbdata)
                    if processed is not None:
                        dbdata = processed
            try:
                record = record_handler.create(**dbdata)
            except Exception as e:
                raise RecordInstanceError("Error creating Record instance") from e
            yield record


class RecordQuerySetMixin:
    _record_handler = RecordDataclass

    def record_into(self, handler):
        self._record = handler
        return self

    def records(self, *args, **kwargs):
        """
        generates record objects
        Acts like values(), however:
        - if record type is not defined with record_into(), you have to define it on the queryset, or the model, with _default_record,
          otherwise it will raise a RecordClassDefinitionError.
        - keyword arguments of type "Adjunct" are used as deferred values, and resolved independently.
        - values() is called with every required_argument on the dataclass not handled by an Adjunct
        """
        if len(args) and not isinstance(args[0], str):
            # we assume this is our dataclass
            handler = args[0]
            args = args[1:]
            # @deprecate: we might remove this
            logger.warning("Defining the target class in args might be soon deprecated: %s", handler)
        else:
            handler = getattr(self, '_record', getattr(self, '_default_record', getattr(self.model, '_default_record', None)))
        if not handler:
            raise RecordClassDefinitionError("Trying records() on a Queryset without destination class.")
        if not isinstance(handler, RecordHandler):
            handler = self._record_handler.wrap(handler)
        all_keys = [*args, *kwargs.keys()]
        unhandled_keys = list(set(handler.required_arguments) - set(all_keys))
        args = [*args, *unhandled_keys]
        # rebuild keyword arguments for values, by filtering out our adjuncts
        new_kw = {}
        adjuncts = {}
        for k, v in kwargs.items():
            if isinstance(v, Adjunct):
                # skip allows an adjunct to completely ignore a key.
                if not v.skip:
                    adjuncts[k] = v
                # check if we have to add to values. adjuncts can define a field to add here.
                add_to_values = v.values_field()
                if isinstance(add_to_values, str) and add_to_values not in args:
                    args.append(add_to_values)
                elif isinstance(add_to_values, tuple):
                    new_kw[add_to_values[0]] = add_to_values[1]
            elif isinstance(v, BaseExpression) or isinstance(v, Combinable) or hasattr(v, 'resolve_expression'):
                new_kw[k] = v
            elif v is None:
                # ignore None
                pass
            else:
                # this will fail in values() for now, but i do not want to hijack future django functionality here.
                # however it would be just funky if we actually replace this with new_kw[k] = Val(v).
                new_kw[k] = v
        # copy ourself with values() and save the results on the cloned queryset values produces.
        try:
            values = self.values(*args, **new_kw)
        except Exception as e:
            raise RecordInstanceError("Error with calculated values") from e
        values._iterable_class = RecordIterable
        values._record_kwargs = adjuncts
        values._record = handler
        return values


class RecordQuerySet(RecordQuerySetMixin, QuerySet):
    # overwrite cloning. I would love to have a way to inject this into django directly (or use model.Meta)
    def _clone(self):
        c = super()._clone()
        for key in ['_record',  # saves the actual final handler until the iterator is consumed
                    '_record_kwargs',  # saves the actual kwargs to records until the iterator is consumed
                    '_record_handler',  # the default handler used to wrap target classes, by default dataclasses
                    '_default_record',  # the default target class for this particular model
                    ]:
            if hasattr(self, key):
                setattr(c, key, getattr(self, key))
        return c


# Alternative:
# class RecordManager(BaseManager.from_queryset(RecordQuerySet)):
#     pass
class RecordManager(RecordQuerySetMixin, Manager):
    def get_queryset(self):
        return RecordQuerySet(self.model, using=self._db)

115
src/django_records/tests.py Normal file

@ -0,0 +1,115 @@
from dataclasses import dataclass
from unittest import mock, TestCase

from django.db.models import F

from . import handlers
from .adjuncts import MutValue as Mut, FixedValue as Val, Skip, PostProcess, Ref
from .records import RecordIterable, RecordQuerySetMixin


@dataclass
class TestDataClass:
    id: int
    name: str
    age: int
    street: str
    parent: 'TestDataClass' = None


class TestRecords(TestCase):
    def test_records_basic(self):
        lam = lambda entry: entry.get('name')
        ref = lambda pk: f'referenced: {pk}'
        cb = lambda entry: {**entry, **{'new': 'field'}}
        MockedValues = mock.MagicMock()
        values_return = mock.MagicMock(return_value=[{'id': 1, 'name': 'Name', 'age': 18, 'street_id': 2, 'two': 'Two', 'one': 'One'}])
        MockedValues.return_value = values_return
        qs = RecordQuerySetMixin()
        qs.values = MockedValues
        result = qs.records(
            TestDataClass,
            'one',
            two=F('field'),
            full_name=Mut(lam),
            street=Ref('street_id', ref),
            ignored=None,
            fixed=Val(1),
            parent=Skip(),
            post_process=PostProcess(cb),
        )
        # what we expect in the values call is:
        expected_in_values = [
            'one',
            'two',
            'id',
            'name',
            'age',
            'street_id',
        ]
        not_expected_in_values = ['full_name', 'street', 'ignored', 'fixed', 'parent', 'post_process']
        args_list = list(MockedValues.call_args[0]) + list(MockedValues.call_args[1].keys())
        for exp in expected_in_values:
            self.assertIn(exp, args_list)
        for nex in not_expected_in_values:
            self.assertNotIn(nex, args_list)
        # check result having correct variables.
        self.assertIs(result._iterable_class, RecordIterable)
        self.assertIsInstance(result._record, handlers.RecordDataclass)
        self.assertIn('full_name', result._record_kwargs)
        self.assertIn('street', result._record_kwargs)
        self.assertIn('fixed', result._record_kwargs)
        self.assertIn('post_process', result._record_kwargs)
        self.assertNotIn('ignored', result._record_kwargs)
        # not expected: values() keywords in _record_kwargs.
        for nex in expected_in_values:
            self.assertNotIn(nex, result._record_kwargs)

    def test_records_iterator(self):
        root = TestDataClass(id=0, name="Root", age=0, street='', parent=None)

        def full_callback(data):
            data['parent'] = root
            return data

        class FakeQuerySet:
            class FakeQuery:
                extra_select = []
                values_select = ['id', 'name', 'street_id', 'one']
                annotation_select = []

                def get_compiler(self, db):
                    compiler = mock.MagicMock()
                    compiler.results_iter.return_value = [
                        [1, 'arthus', 12, 'One'],
                    ]
                    return compiler

            db = mock.MagicMock()
            model = mock.MagicMock()
            query = FakeQuery()
            _record = handlers.RecordDataclass.wrap(TestDataClass)
            _record_kwargs = {
                'street': Ref('street_id', lambda pk: f'Street {pk}'),
                'age': Val(18),
                'name': Mut(lambda entry: entry.get('name').capitalize()),
                'parent': PostProcess(full_callback),
            }

        iterable = RecordIterable(FakeQuerySet())
        entry = next(iter(iterable))
        self.assertEqual(entry.id, 1)
        self.assertEqual(entry.name, 'Arthus')
        self.assertEqual(entry.street, 'Street 12')
        self.assertEqual(entry.parent, root)


class AdjunctTests(TestCase):
    def test_ref_none(self):
        r = Ref('key', None)
        result = r.resolve(model=None, dbdata={'key': 'Value'})
        self.assertEqual(r.adjunct, None)
        self.assertEqual(result, "Value")