feat: initial import from blackmesa project

2025-01-20 21:31:51 +01:00 · 2025-01-20 21:31:51 +01:00 · e539aaf158
commit e539aaf158
parent 4e46bb90b0
10 changed files with 588 additions and 3 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -0,0 +1,14 @@
 # Python-generated files
 __pycache__/
 *.py[oc]
 build/
 dist/
 wheels/
 *.egg-info
 # Virtual environments
 .venv
 # Specific to this repo
 uv.lock
 db.sqlite3
--- a/README.md
+++ b/README.md
@@ -1,3 +1,79 @@
-# django_records
+# Django Records
-Django Records
+Django Records aims to provide a queryset extension that creates structured data directly from querysets, instead of model instances.
 For example, you can create dataclass instances as entities in a clean-architecture setup, where model instances should not reach the business layer and records are just read-only representations of the fetched data. This is especially useful if you plan to use Django-agnostic repository facades.
 The main targets are dataclasses and pydantic models, and building read-only representations of the data.
 ## The .records() queryset function
 > Changed since 0.3: `records()` is no longer supposed to take the record target as an optional first parameter; use `record_into()` instead. Passing it as the first parameter is still supported.
 `records()` fetches data from the database with a `values()` call, and uses it to directly create a dataclass or any other such structure ("a record").
 It skips instantiating Django model instances entirely, and comes with tools (called `Adjunct`s) that work similarly to `Q()` or `F()` and make initialization data easy to handle, so you can automate even the production of immutable or nested data structures.
 Out of the box, `records()` assumes you want to use `dataclasses.dataclass` or a `pydantic.BaseModel`.
 Usage:
 ```python
    SomeModel.objects.filter(...).records('field1', 'field2', annotation=F(...), adjunct=Adjunct(...))
 ```
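Besides the names you pass, `records()` also requests every field of the target dataclass that no keyword argument already covers. The plain-Python sketch below models only that selection step; `fields_for_values` is a hypothetical helper, not part of the package. Note that the package computes the missing fields with a set difference, so their order in the real `values()` call is not guaranteed.

```python
from dataclasses import dataclass, fields


@dataclass
class Entity:
    id: int
    name: str
    age: int = 0


def fields_for_values(record_class, *args, **kwargs):
    """Hypothetical helper: approximate which names records() passes to values().

    Positional names are kept as given; every dataclass field that is not
    already named positionally or as a keyword is appended.
    """
    covered = {*args, *kwargs}
    missing = [f.name for f in fields(record_class) if f.name not in covered]
    return [*args, *missing]


# 'name' is covered by a keyword (an annotation or an Adjunct), so it is not requested by name.
print(fields_for_values(Entity, 'id', name='<expression>'))  # ['id', 'age']
```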
 ## Defining a default target structure, or a custom one with .record_into()
 > Unstable: record_into() might be renamed.
 - You can define the target class you want to create directly on the Queryset Manager (or Queryset)
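  ```python
  from dataclasses import dataclass
  from django.db.models.manager import BaseManager
  from django_records.records import RecordQuerySet
  @dataclass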
  class Entity:
      id: int
  MyManager = BaseManager.from_queryset(RecordQuerySet)
  MyManager._default_record = Entity
  ...
  ```
 - You can add it to your model (I have not found a standard way to expand Meta yet)
 ```python
 from django.db import models
 class MyModel(models.Model):
    _default_record = Entity
 ```
 - You can also assign a `RecordHandler` instance directly as `_default_record`. Otherwise, if you want a handler other than the default dataclass handler, define `_record_handler` (a `RecordHandler` subclass) on the QuerySet.
 - Finally, you can override this behaviour by explicitly chaining the target class into the queryset with `record_into()`, which takes either a `RecordHandler` instance or any class that can be wrapped by one.
 ```python
    SomeModel.objects.filter(...).record_into(TargetDataClass).records('field1', 'field2')
 ```
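The lookup order described above can be approximated in plain Python. `resolve_target`, the stand-in error class and the `Fake*` classes are hypothetical, not part of the package; the real `records()` additionally accepts a (deprecated) target as its first positional argument and wraps plain classes with `_record_handler`, both of which this sketch leaves out.

```python
class RecordClassDefinitionError(Exception):
    """Stand-in for django_records.errors.RecordClassDefinitionError."""


def resolve_target(queryset, model):
    """Hypothetical helper approximating the lookup order of records():

    1. the target chained with record_into() (stored as `_record`),
    2. `_default_record` on the queryset or manager,
    3. `_default_record` on the model.
    """
    candidates = (
        (queryset, '_record'),
        (queryset, '_default_record'),
        (model, '_default_record'),
    )
    for owner, attr in candidates:
        target = getattr(owner, attr, None)
        if target is not None:
            return target
    raise RecordClassDefinitionError("Trying records() on a Queryset without destination class.")


class Entity: ...
class Summary: ...


class FakeModel:
    _default_record = Entity


class FakeQuerySet:
    pass


qs = FakeQuerySet()
print(resolve_target(qs, FakeModel).__name__)  # Entity
qs._record = Summary  # what record_into(Summary) stores
print(resolve_target(qs, FakeModel).__name__)  # Summary
```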
 ## Adjuncts
 Just as Django expressions can be used to annotate the retrieved rows with extra keys, Adjuncts can be used to bypass this mechanism and insert local data into your target class. You might want this if, for example, the dataclass you create is _immutable_, or has _required fields_ that need data at creation time although that data is not part of your database query.
 > Adjuncts do not influence the underlying SQL, except that they can add keys to the `values()` call.
 > The keyword names passed to `records()` primarily correspond to the keys passed to the dataclass.
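How `records()` separates its keyword arguments can be sketched as follows: each Adjunct's `values_field()` may return `None` (nothing added), a field name (added as a positional `values()` argument) or a `(key, expression)` tuple (added as a keyword argument), while Django expressions are passed to `values()` unchanged and `None` values are dropped. The classes and `split_kwargs` below are simplified stand-ins written for illustration, not imports from `django_records`.

```python
class Adjunct:
    """Minimal stand-in for django_records.adjuncts.Adjunct (illustration only)."""
    skip = False

    def values_field(self):
        return None


class Ref(Adjunct):
    """Stand-in: asks values() for `key`, which the adjunct later reads back."""
    def __init__(self, key):
        self.key = key

    def values_field(self):
        return self.key


class Skip(Adjunct):
    skip = True


def split_kwargs(args, kwargs):
    """Hypothetical helper mirroring how records() partitions its keyword arguments."""
    values_args, values_kwargs, adjuncts = list(args), {}, {}
    for key, value in kwargs.items():
        if isinstance(value, Adjunct):
            if not value.skip:
                adjuncts[key] = value
            extra = value.values_field()
            if isinstance(extra, str) and extra not in values_args:
                values_args.append(extra)  # a plain field name becomes a positional values() arg
            elif isinstance(extra, tuple):
                values_kwargs[extra[0]] = extra[1]  # (alias, expression) becomes a keyword arg
        elif value is not None:
            values_kwargs[key] = value  # expressions such as F() go to values() untouched
    return values_args, values_kwargs, adjuncts


values_args, values_kwargs, adjuncts = split_kwargs(
    ['id'], {'street': Ref('street_id'), 'parent': Skip(), 'total': 'F(...)', 'ignored': None}
)
print(values_args)       # ['id', 'street_id']
print(values_kwargs)     # {'total': 'F(...)'}
print(sorted(adjuncts))  # ['street']
```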
 - `FixedValue` simply carries some data and inserts it into every record. For example, `.records(data=FixedValue(1))` always sets the field `data` to 1.
 - `MutValue` takes a callable, which is called with the row data when the field is set on the record. For example, `.records(data=MutValue(lambda entry: 'x' in entry))`.
 - `MutValueNotNone` works like `MutValue`, but only calls the callable if the database value is not None (a shortcut).
 - `Ref` retrieves the data from `values()` under a different key, and may apply an Adjunct (or a callable) to it. It is probably the most commonly used Adjunct in practice.
 - `Skip` lets you skip a field. This is needed because `records()` requests every field of the dataclass without knowing whether it is optional, and it is helpful if you set the field in a `PostProcess`.
 - `PostProcess` calls a callback with the whole initialization dictionary before the record is created. If the callback returns anything other than None, the returned value is used as the initialization data for the record.
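For each fetched row, the Adjuncts above are applied in a fixed order: field-resolving Adjuncts first, then post-processors, and finally the record is built from only the keys the dataclass declares. The rough plain-Python sketch below uses ordinary callables in place of Adjunct objects; `build_record` is hypothetical, and unlike this sketch, a real `Ref` passes its callback only the referenced value rather than the whole row.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Person:
    id: int
    name: str
    street: str
    is_adult: bool = False


def build_record(record_class, row, field_resolvers, post_processors=()):
    """Hypothetical helper mirroring the per-row steps of records():

    1. each field resolver receives the row dict and sets one key,
    2. each post-processor may return a replacement dict (None keeps the current one),
    3. keys that are not fields of the dataclass are dropped before construction.
    """
    data = dict(row)
    for key, resolve in field_resolvers.items():
        data[key] = resolve(data)
    for process in post_processors:
        processed = process(data)
        if processed is not None:
            data = processed
    names = {f.name for f in fields(record_class)}
    return record_class(**{k: v for k, v in data.items() if k in names})


row = {'id': 1, 'name': 'arthur', 'street_id': 12, 'age': 42}
person = build_record(
    Person,
    row,
    {
        'name': lambda data: data['name'].capitalize(),        # similar to MutValue
        'street': lambda data: f"Street {data['street_id']}",  # similar to Ref('street_id', ...)
    },
    post_processors=[lambda data: {**data, 'is_adult': data['age'] >= 18}],  # similar to PostProcess
)
print(person)  # Person(id=1, name='Arthur', street='Street 12', is_adult=True)
```

Because the record is only constructed after every value has been resolved, a frozen dataclass works without any mutation after construction.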
 ## Testing
 ### Built-In Tests
 ...
 ### Integration Test: Examples Project
 The [celestial project](examples/celestials/README.md) in examples serves to demonstrate basic usage of records, as well as providing integration testing.
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -0,0 +1,19 @@
 [project]
 name = "django-records"
 version = "0.4.0"
 description = "Django structured querysets"
 readme = "README.md"
 authors = [{ name = "Gabor Körber", email = "gab@g4b.org" }]
 requires-python = ">=3.12"
 dependencies = ["django>=4"]
 [build-system]
 requires = ["hatchling"]
 build-backend = "hatchling.build"
 [project.optional-dependencies]
 pydantic = ["pydantic>=2"]
 [tool.uv.workspace]
 members = []
 exclude = ["examples"]
--- a/src/django_records/__init__.py
+++ b/src/django_records/__init__.py
@@ -0,0 +1,2 @@
 def hello() -> str:
    return "Hello from django-records!"
--- a/src/django_records/adjuncts.py
+++ b/src/django_records/adjuncts.py
@@ -0,0 +1,122 @@
 from abc import ABC
 from typing import Any, Callable
 class Adjunct(ABC):
    """
    Baseclass that defines the Adjunct interface.
    Adjuncts carry data which does not translate into SQL, but is added programmatically.
    Adjuncts are filtered by records() and act as mutators of the retrieved database value
    before creating the target record. 
    They can also provide a key to be included into the .values() call.
    """
    __slots__ = ['value']
    skip = False  # if skip is true, this adjunct will not be actually processed.
    resolves_field = True  # if resolves_field is true, this adjunct will be called for a single field with resolve()
    post_processing = False  # if post_processing is true, this adjunct will in the end be called with dbdata, and be able to manipulate the whole dictionary.
    def resolve(self, model, dbdata) -> Any | None:
        """
        resolve returns the field value for one entry.
        """
        raise NotImplementedError
    def post_process(self, model, dbdata) -> dict | None:
        """If post_processing is True, this needs to be implemented.
        It has to return either a new dictionary to use in initialization of an object, or None.
        """
        raise NotImplementedError
    def values_field(self) -> str | tuple[str, str] | None:
        """
        return a field for the values operator.
        if you return None, it will not be added. (default)
        if you return a string, it will be added to values as args.
        if you return a tuple, it will be added to values as kwargs (key, value).
        """
        return
 class FixedValue(Adjunct):
    """always resolves to a fixed value."""
    def __init__(self, value=None):
        self.value = value
    def resolve(self, model, dbdata):
        return self.value
 class MutValue(Adjunct):
    """adjunct value that returns a field value with a callback.
        currently supports only 1 parameter (dbdata).
    """
    def __init__(self, callback):
        self.callback = callback if callable(callback) else None
    def resolve(self, model, dbdata):
        # at this point I could check if callback needs 0-2 arguments and decide the call.
        if self.callback:
            return self.callback(dbdata)
 class MutValueNotNone(MutValue):
    """MutValue that only calls the callback if the dbdata is not None
    (convenience function)
    """
    def resolve(self, model, dbdata):
        if dbdata is not None:
            return super().resolve(model, dbdata)
 class Ref(Adjunct):
    """
    Adds this key to the .values() call, and processes it with an adjunct or callback.
    If no adjunct is defined, it simply redirects the value as-is.
    -> key can be a tuple of (keyname, expression) for annotations. # TODO: test this bold claim.
    """
    __slots__ = ['adjunct', 'key']
    def __init__(self, key, adjunct: Adjunct | Callable | None = None):
        match adjunct:
            case Adjunct(): self.adjunct = adjunct  # Adjunct has no __match_args__, so no positional sub-pattern
            case callback if callable(callback): self.adjunct = MutValueNotNone(callback)
            case _: self.adjunct = None
        self.key = key
    def resolve(self, model, dbdata):
        value = dbdata.get(self.key)
        if self.adjunct:
            return self.adjunct.resolve(model, value)
        else:
            return value
    def values_field(self):
        return self.key
 class Skip(Adjunct):
    """Skips this key from being retrieved from the database or used in the dataclass instantiation."""
    skip = True
    resolves_field = False
 class PostProcess(Adjunct):
    """calls a callback which can modify the whole initialization dictionary."""
    __slots__ = ['callback']
    resolves_field = False
    post_processing = True
    def __init__(self, callback):
        self.callback = callback
    def post_process(self, model, dbdata):
        if self.callback:
            return self.callback(dbdata)
--- a/src/django_records/errors.py
+++ b/src/django_records/errors.py
@@ -0,0 +1,8 @@
 class DjangoRecordsException(Exception):
    ...
 class RecordClassDefinitionError(DjangoRecordsException):
    ...
 class RecordInstanceError(DjangoRecordsException):
    ...
--- a/src/django_records/handlers.py
+++ b/src/django_records/handlers.py
@@ -0,0 +1,78 @@
 """
 RecordHandlers describe how a Record class should be built.
 Each Record Type should define one.
 Usually you just want to use one of the pre-built Handlers for dataclass, dict, or pydantic.
 """
 from .errors import RecordClassDefinitionError
 class RecordHandler:
    """
    Handler for a record type
    defines how a record can be created, and how to retrieve all field names, and the required ones.
    """
    __slots__ = ['klass']
    @classmethod
    def wrap(cls, klass):
        """internal factory function used to wrap a non-record handler into a record handler"""
        return cls(klass)
    def __init__(self, klass):
        self.klass = klass
    def create(self, **kwargs):
        """the actual creation of the underlying record type instance"""
        return self.klass(**kwargs)
    def get_field_names(self):
        """should return all field names of the wrapped record type."""
        return self.klass.__dict__.keys()
    @property
    def record(self):
        """property used to retrieve the wrapped class."""
        return self.klass
    @property
    def required_arguments(self):
        """property that can filter for required field names"""
        return self.get_field_names()
 class RecordDict(RecordHandler):
    """RecordHandler that outputs a dictionary"""
    def __init__(self, klass=None):
        # it is not required to define dict, but you could do OrderedDict e.g.
        self.klass = klass or dict
    def get_field_names(self) -> list[str]:
        # dictionary has no required fields. any field is possible.
        return []
 class RecordDataclass(RecordHandler):
    """handles dataclasses.dataclass derivatives and pydantic BaseModel subclasses"""
    def create(self, **kwargs):
        # clean field names to be only valid if they are on the dataclass.
        record_fields = self.get_field_names()
        kwargs = {k: v for k, v in kwargs.items() if k in record_fields}
        return self.klass(**kwargs)
    def get_field_names(self) -> list[str]:
        # returns all field names, even those which are not required.
        # for dataclasses:
        if hasattr(self.klass, '__dataclass_fields__'):
            return list(self.klass.__dataclass_fields__.keys())
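        # for pydantic BaseModel (v2 exposes model_fields; __fields__ is the deprecated v1 name):
        if hasattr(self.klass, 'model_fields'):
            return list(self.klass.model_fields.keys())
        if hasattr(self.klass, '__fields__'):
            return list(self.klass.__fields__.keys())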
        raise RecordClassDefinitionError("Field Names not found.")
--- a/src/django_records/py.typed
+++ b/src/django_records/py.typed
--- a/src/django_records/records.py
+++ b/src/django_records/records.py
@@ -0,0 +1,151 @@
 import logging
 from django.db.models import QuerySet, Model
 from django.db.models.expressions import BaseExpression, Combinable
 from django.db.models.manager import Manager
 from django.db.models.query import ValuesIterable
 from .adjuncts import Adjunct
 from .handlers import RecordDataclass, RecordHandler
 from .errors import RecordClassDefinitionError, RecordInstanceError
 logger = logging.getLogger(__name__)  # __name__ is already "django_records.records"
 class RecordIterable(ValuesIterable):
    """
    Iterable returned by records() that yields a record class for each row.
    Replaces the standard iterable of the queryset.
    """
    def __iter__(self):
        queryset: QuerySet = self.queryset
        model: Model = self.queryset.model
        query = queryset.query
        compiler = query.get_compiler(queryset.db)
        record_data = getattr(queryset, '_record_kwargs', {})
        record_handler = queryset._record
        # extra(select=...) cols are always at the start of the row.
        names = [
            *query.extra_select,
            *query.values_select,
            *query.annotation_select,
        ]
        indexes = range(len(names))
        for row in compiler.results_iter(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size):
            dbdata = {names[i]: row[i] for i in indexes}
            # post-processors will be able to rewrite the whole dictionary.
            post_processors = []
            # we overwrite db data bluntly for now. actually we would provide callbacks the current dict.
            for k, v in record_data.items():
                if v.resolves_field:
                    dbdata[k] = v.resolve(model, dbdata)
                if v.post_processing:
                    post_processors.append(v)
            if post_processors:
                for processor in post_processors:
                    processed = processor.post_process(model, dbdata)
                    if processed is not None:
                        dbdata = processed
            try:
                record = record_handler.create(**dbdata)
            except Exception as e:
                raise RecordInstanceError("Error creating Record instance") from e
            yield record
 class RecordQuerySetMixin:
    _record_handler = RecordDataclass
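    def record_into(self, handler):
        """Return a copy of this queryset (or a fresh queryset for managers) that produces records of the given type."""
        # work on a copy, so neither a manager nor the original queryset keeps the handler.
        if hasattr(self, '_chain'):
            clone = self._chain()
        elif hasattr(self, 'get_queryset'):
            clone = self.get_queryset()
        else:
            clone = self
        clone._record = handler
        return clone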
    def records(self, *args, **kwargs):
        """
        generates record objects
        Acts like values(), however:
            - if the record type is not defined with record_into(), you have to define it on the queryset, or the model, with _default_record,
              otherwise RecordClassDefinitionError is raised.
            - keyword arguments of type "Adjunct" are used as deferred values, and resolved independently.
            - values() is called with every required_argument on the dataclass not handled by an Adjunct
        """
        if len(args) and not isinstance(args[0], str):
            # we assume this is our dataclass
            handler = args[0]
            args = args[1:]
            # @deprecate: we might remove this
            logger.warning("Defining the target class in args might be soon deprecated: %s", handler)
        else:
            handler = getattr(self, '_record', getattr(self, '_default_record', getattr(self.model, '_default_record', None)))
        if not handler:
            raise RecordClassDefinitionError("Trying records() on a Queryset without destination class.")
        if not isinstance(handler, RecordHandler):
            handler = self._record_handler.wrap(handler)
        all_keys = [*args, *kwargs.keys()]
        unhandled_keys = list(set(handler.required_arguments) - set(all_keys))
        args = [*args, *unhandled_keys]
        # rebuild keyword arguments for values, by filtering out our adjuncts
        new_kw = {}
        adjuncts = {}
        for k, v in kwargs.items():
            if isinstance(v, Adjunct):
                # skip allows an adjunct to completely ignore a key.
                if not v.skip:
                    adjuncts[k] = v
                # check if we have to add to values. adjuncts can define a field to add here.
                add_to_values = v.values_field()
                if isinstance(add_to_values, str) and add_to_values not in args:
                    args.append(add_to_values)
                elif isinstance(add_to_values, tuple):
                    new_kw[add_to_values[0]] = add_to_values[1]
            elif isinstance(v, BaseExpression) or isinstance(v, Combinable) or hasattr(v, 'resolve_expression'):
                new_kw[k] = v
            elif v is None:
                # ignore None
                pass
            else:
                # this will fail in values() for now, but I do not want to hijack future Django functionality here.
                # however it would be just funky if we actually replace this with new_kw[k] = Val(v).
                new_kw[k] = v
        # copy ourself with values() and save the results on the cloned queryset values produces.
        try:
            values = self.values(*args, **new_kw)
        except Exception as e:
            raise RecordInstanceError("Error with calculated values") from e
        values._iterable_class = RecordIterable
        values._record_kwargs = adjuncts
        values._record = handler
        return values
 class RecordQuerySet(RecordQuerySetMixin, QuerySet):
    # overwrite cloning. I would love to have a way to inject this into django directly (or use model.Meta)
    def _clone(self):
        c = super()._clone()
        for key in ['_record', # saves the actual final handler until the iterator is consumed
                    '_record_kwargs', # saves the actual kwargs to records until the iterator is consumed
                    '_record_handler', # the default handler class used to wrap target classes, RecordDataclass by default
                    '_default_record', # the default target class for this particular model
                    ]:
            if hasattr(self, key):
                setattr(c, key, getattr(self, key))
        return c
 # Alternative:
 # class RecordManager(BaseManager.from_queryset(RecordQuerySet)):
 #    pass
 class RecordManager(RecordQuerySetMixin, Manager):
    def get_queryset(self):
        return RecordQuerySet(self.model, using=self._db)
--- a/src/django_records/tests.py
+++ b/src/django_records/tests.py
@@ -0,0 +1,115 @@
 from dataclasses import dataclass
 from unittest import mock, TestCase
 from django.db.models import F
 from . import handlers
 from .adjuncts import MutValue as Mut, FixedValue as Val, Skip, PostProcess, Ref
 from .records import RecordIterable, RecordQuerySetMixin
@dataclass
 class TestDataClass:
    id: int
    name: str
    age: int
    street: str
    parent: 'TestDataClass' = None
 class TestRecords(TestCase):
    def test_records_basic(self):
        lam = lambda entry: entry.get('name')
        ref = lambda pk: f'referenced: {pk}'
        cb = lambda entry: {**entry, **{'new': 'field'}}
        MockedValues = mock.MagicMock()
        values_return = mock.MagicMock(return_value=[{'id': 1, 'name': 'Name', 'age': 18, 'street_id': 2, 'two': 'Two', 'one': 'One'}])
        MockedValues.return_value = values_return
        qs = RecordQuerySetMixin()
        qs.values = MockedValues
        result = qs.records(
            TestDataClass,
            'one',
            two=F('field'),
            full_name=Mut(lam),
            street=Ref('street_id', ref),
            ignored=None,
            fixed=Val(1),
            parent=Skip(),
            post_process=PostProcess(cb),
        )
        # what we expect in the values call is:
        expected_in_values = [
            'one',
            'two',
            'id',
            'name',
            'age',
            'street_id',
        ]
        not_expected_in_values = ['full_name', 'street', 'ignored', 'fixed', 'parent', 'post_process']
        args_list = list(MockedValues.call_args[0]) + list(MockedValues.call_args[1].keys())
        for exp in expected_in_values:
            self.assertIn(exp, args_list)
        for nex in not_expected_in_values:
            self.assertNotIn(nex, args_list)
        # check result having correct variables.
        self.assertIs(result._iterable_class, RecordIterable)
        self.assertIsInstance(result._record, handlers.RecordDataclass)
        self.assertIn('full_name', result._record_kwargs)
        self.assertIn('street', result._record_kwargs)
        self.assertIn('fixed', result._record_kwargs)
        self.assertIn('post_process', result._record_kwargs)
        self.assertNotIn('ignored', result._record_kwargs)
        # not expected: values() keywords in _record_kwargs.
        for nex in expected_in_values:
            self.assertNotIn(nex, result._record_kwargs)
    def test_records_iterator(self):
        root = TestDataClass(id=0, name="Root", age=0, street='', parent=None)
        def full_callback(data):
            data['parent'] = root
            return data
        class FakeQuerySet:
            class FakeQuery:
                extra_select = []
                values_select = ['id', 'name', 'street_id', 'one']
                annotation_select = []
                def get_compiler(self, db):
                    compiler = mock.MagicMock()
                    compiler.results_iter.return_value = [
                        [1, 'arthus', 12, 'One'],
                    ]
                    return compiler
            db = mock.MagicMock()
            model = mock.MagicMock()
            query = FakeQuery()
            _record = handlers.RecordDataclass.wrap(TestDataClass)
            _record_kwargs = {
                'street': Ref('street_id', lambda pk: f'Street {pk}'),
                'age': Val(18),
                'name': Mut(lambda entry: entry.get('name').capitalize()),
                'parent': PostProcess(full_callback),
            }
        iterable = RecordIterable(FakeQuerySet())
        entry = next(iter(iterable))
        self.assertEqual(entry.id, 1)
        self.assertEqual(entry.name, 'Arthus')
        self.assertEqual(entry.street, 'Street 12')
        self.assertEqual(entry.parent, root)
 class AdjunctTests(TestCase):
    def test_ref_none(self):
        r = Ref('key', None)
        result = r.resolve(model=None, dbdata={'key': 'Value'})
        self.assertEqual(r.adjunct, None)
        self.assertEqual(result, "Value")