Functional vs Object-oriented approaches to validation
As demonstrated in Python
Functional programming is often described in terms of its contrast with object-oriented programming; that is, you write functions that act on data instead of objects that wrap data and use methods to act on themselves. Functional programming wonks (like me) will tell you that writing code this way is generally better than OO, but I don’t want to do that (right now).
In this post, I’m not here to argue either side. Instead, I’m going to demonstrate a few equivalent approaches to the same problem: validating data.
Say we want to write a form-validation-and-cleaning routine. We are given an incoming data structure, and must apply a set of rules to it, returning the (cleaned) data structure if everything went well, and an itemized list of errors if not. We don’t want to short-circuit our code; if 2 fields are incorrect, we want to know about both of them.
Since we’re working in a language that expects exceptions, we’ll allow our external interface to use them. So, for all examples, we’ll define a function validate_form that accepts our data and a list of validators, raises an exception containing all the errors if there are any, and returns the data otherwise.
OO Implementation
Let’s start with an example object-oriented approach, which I happen to think does the job ok:
```python
class ValidationError(Exception):
    def __init__(self, message, errors=None):
        super(ValidationError, self).__init__(message)
        self.message = message  # Python 3 exceptions have no .message of their own
        self.errors = errors or {}  # avoid sharing one mutable default dict


class Validator(object):
    def __init__(self, field_name):
        self.field_name = field_name

    def validate(self, data):
        raise NotImplementedError('Please implement validate')


class ValidatorSuite(Validator):
    def __init__(self):
        self.validators = []

    def add_validator(self, validator):
        self.validators.append(validator)

    def validate(self, data):
        errors = {}
        for validator in self.validators:
            try:
                data = validator.validate(data)
            except ValidationError as e:
                errors[validator.field_name] = e.message
        if len(errors) > 0:
            raise ValidationError("Validation failed", errors=errors)
        return data


# Specific Validators
class NonBlankValidator(Validator):
    def validate(self, data):
        s = data.get(self.field_name)
        if not isinstance(s, str) or len(s) == 0:
            raise ValidationError(
                "Field '{}' must not be blank".format(self.field_name))
        return data


class DefaultValidator(Validator):
    def __init__(self, field_name, default):
        self.field_name = field_name
        self.default = default

    def validate(self, data):
        data[self.field_name] = data.get(self.field_name, self.default)
        return data


# Our public API
def validate_form(data, validators):
    suite = ValidatorSuite()
    for v in validators:
        suite.add_validator(v)
    return suite.validate(data)
```
This is pretty good. Let’s see whether rewriting this code in a functional style saves us any trouble, and then we’ll compare the approaches.
Functional Implementation
The reason functional programming folks don’t like exceptions is that they really wreak havoc on the flow of execution. We’d rather assemble our program out of well-defined functions that accept values and return values, and never jump around. Dijkstra had some strong words for goto statements because they make it more difficult than necessary to follow the flow of data through your program; the exact same thing is true of exceptions.
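To make the difference concrete, here is a small sketch of the same "required field" check written both ways. It is not code from this post: require_raising, require_returning, run_raising and run_returning are hypothetical names. With exceptions, the caller has to remember to wrap each individual call in try/except (as ValidatorSuite does); wrap the loop instead and the first failure jumps past every remaining check. With returned values, control always comes back to the loop.

```python
class Failure(object):
    """Flag value a check returns instead of raising."""
    def __init__(self, errors):
        self.errors = errors


def require_raising(field):
    # Hypothetical check in exception style: failure leaves via raise.
    def check(data):
        if not data.get(field):
            raise ValueError("Field '{}' is required".format(field))
        return data
    return check


def require_returning(field):
    # Hypothetical check in value style: failure is an ordinary return value.
    def check(data):
        if not data.get(field):
            return Failure({field: "Field '{}' is required".format(field)})
        return data
    return check


def run_raising(data, checks):
    # One try around the whole loop: the first raise skips every later check.
    try:
        for check in checks:
            data = check(data)
    except ValueError as e:
        return [str(e)]
    return []


def run_returning(data, checks):
    # Every check runs, because each call returns here no matter what.
    errors = []
    for check in checks:
        result = check(data)
        if isinstance(result, Failure):
            errors.append(result.errors)
        else:
            data = result
    return errors


form = {'name': '', 'email': ''}
print(run_raising(form, [require_raising('name'), require_raising('email')]))
# ["Field 'name' is required"]
print(run_returning(form, [require_returning('name'), require_returning('email')]))
# [{'name': "Field 'name' is required"}, {'email': "Field 'email' is required"}]
```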
```python
class Failure(object):
    def __init__(self, errors):
        self.errors = errors

    def __repr__(self):
        return 'Failure(errors={!r})'.format(self.errors)


# Specific Validators
def non_blank_validator(field_name):
    def validate(data):
        s = data.get(field_name)  # check the field, not the whole dict
        if not isinstance(s, str) or len(s) == 0:
            return Failure({
                field_name: "Field '{}' must not be blank".format(field_name)
            })
        return data
    return validate


def default_validator(field_name, default_val):
    def validate(data):
        data[field_name] = data.get(field_name, default_val)
        return data
    return validate


def validation_suite(validators):
    def validate(data):
        errors = {}
        for v in validators:
            val = v(data)
            if isinstance(val, Failure):
                # Fold this validator's errors into the running total.
                errors = dict(val.errors, **errors)
            else:
                data = val
        if len(errors) > 0:
            return Failure(errors)
        return data
    return validate


# Our public API (ValidationError is the exception class from the OO example)
def validate_form(data, validators, return_error=False):
    val = validation_suite(validators)(data)
    if isinstance(val, Failure):
        if return_error:
            return val
        else:
            raise ValidationError("Validation Failed", errors=val.errors)
    return val
```
In the OO example, validators were classes with a constructor that accepted the required values, and a validate method that actually performed the validation. In the functional version, validate is a closure around the required values, with a consistent signature. If you’re not used to writing code with closures, you might not like the style I’ve chosen for the validators, but I don’t find it difficult to read or understand.
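To see what "a closure around the required values" buys, here is one hypothetical validator, a maximum-length check that is not part of this post, written both ways. To keep the focus on where the parameters live, both versions simply return True or False rather than raising or returning Failure.

```python
class MaxLengthValidator(object):
    # Hypothetical class-style validator: the constructor must copy
    # each parameter onto self so that validate() can find it later.
    def __init__(self, field_name, limit):
        self.field_name = field_name
        self.limit = limit

    def validate(self, data):
        return len(data.get(self.field_name, '')) <= self.limit


def max_length_validator(field_name, limit):
    # Hypothetical closure-style validator: field_name and limit are
    # captured because the inner function refers to them.
    def validate(data):
        return len(data.get(field_name, '')) <= limit
    return validate


print(max_length_validator('name', 5)({'name': 'Ada'}))                  # True
print(MaxLengthValidator('name', 5).validate({'name': 'Ada Lovelace'}))  # False
```

Both give the same answers; the only difference is whether the parameters are stored on self by hand or kept alive by the closure after max_length_validator returns.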
The main thing we’ve introduced here is Failure, a sort of flag value. Our only constraint on validator functions is that they must return an instance of Failure if they fail. This removes the need for us to raise exceptions. However, we can take this further.
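One line of validation_suite is worth unpacking: errors = dict(val.errors, **errors). dict(a, **b) copies a and then applies b’s keys on top, so when two validators report an error for the same field, the one collected first wins. A standalone illustration with made-up error messages:

```python
# Errors already collected by earlier validators (made-up example).
earlier = {'name': "Field 'name' must not be blank"}
# Errors from the validator that just ran (made-up example).
latest = {'name': "Name is too long", 'email': "Invalid email address."}

merged = dict(latest, **earlier)     # same expression shape as validation_suite
merged_alt = {**latest, **earlier}   # same precedence, Python 3.5+ syntax

print(merged)
# {'name': "Field 'name' must not be blank", 'email': 'Invalid email address.'}
```

In Python 3 the keyword-argument form only accepts string keys, which is fine for form fields; the {**a, **b} form has the same precedence without that restriction.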
Another Functional Style
This one has a twist, but I’ll save the reveal until after. Here’s the code:
```python
class ValidatedData(dict):
    def __init__(self, data=None, errors=None):
        self['data'] = data or {}
        self['errors'] = errors or {}

    def run(self, *validator_fns):
        result = self
        for fn in validator_fns:
            result = result.merge(fn(result['data']))
        return result

    def merge(self, other):
        self['data'] = dict(self['data'], **other['data'])
        self['errors'] = dict(self['errors'], **other['errors'])
        return self


def success(data):
    return ValidatedData(data=data)


def fail(field_name, error):
    return ValidatedData(errors={field_name: error})


def non_blank_validator(field_name):
    def validate(data):
        s = data.get(field_name)
        if not isinstance(s, str) or len(s) == 0:
            return fail(field_name, "Field '{}' must not be blank".format(field_name))
        return success(data)
    return validate


def default_validator(field_name, default_val):
    def validate(data):
        data[field_name] = data.get(field_name, default_val)
        return success(data)
    return validate


# Public API
class ValidationError(Exception):
    def __init__(self, message, errors=None):
        super(ValidationError, self).__init__(message)
        self.message = message
        self.errors = errors or {}


def validate_form(data, validators):
    result = ValidatedData(data).run(*validators)
    if len(result['errors']) > 0:
        raise ValidationError("Validation Failed", errors=result['errors'])
    return result['data']
```
In this example, our validators accept a raw dict as before, but return a wrapped object we’ve called ValidatedData. ValidatedData is (in effect) a monad: the validators are functions that return monadic values, and run fills in for bind (I didn’t feel the need to be strict about the semantics in Python). But don’t worry, the code still works if you don’t know that.
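If it helps to see the bind connection spelled out, here is a sketch that separates the two ideas run blends together. It is a reworking under my own assumptions, not the code above: the result is a plain dict rather than a dict subclass, bind is a hypothetical helper that feeds the accumulated data to one validator and merges its output, and run becomes a left fold (functools.reduce) of bind over the validators. Unlike ValidatedData.merge, bind builds new dicts instead of mutating the previous result.

```python
from functools import reduce


def success(data):
    return {'data': data, 'errors': {}}


def fail(field_name, error):
    return {'data': {}, 'errors': {field_name: error}}


def bind(result, validator):
    # Hypothetical helper: pass the data so far to the next validator,
    # then merge its data and errors into a new result dict.
    step = validator(result['data'])
    return {
        'data': dict(result['data'], **step['data']),
        'errors': dict(result['errors'], **step['errors']),
    }


def run(data, *validators):
    # run is a left fold of bind, starting from a successful result.
    return reduce(bind, validators, success(data))


def non_blank_validator(field_name):
    def validate(data):
        s = data.get(field_name)
        if not isinstance(s, str) or len(s) == 0:
            return fail(field_name, "Field '{}' must not be blank".format(field_name))
        return success(data)
    return validate


result = run({'name': '', 'email': 'test@test.com'},
             non_blank_validator('name'), non_blank_validator('email'))
print(result['errors'])  # {'name': "Field 'name' must not be blank"}
```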
I prefer the way the monad works over both of the validation suite functions above. We’ve abstracted away the loop that threads the data through and collects the errors, in favor of something more generic. I also felt clever for inheriting from dict, but that’s not really necessary.
Overall this came out a bit longer than the other functional version, but only just. Most of that is the explicit success and fail functions, which I think became necessary as our expected return value became more complex.
Comparison
Talking is all well and good, but let’s compare some situations where we want to work with our code.
Writing a new validator
Let’s look at what it takes to add a validator. We’ll skip the actual implementation of the fiddly bits so we can just look at the patterns and differences side-by-side.
```python
# Common functions. TODO: implement
def email_valid(email):
    return True


def email_domain_equals(email, domain):
    return True


# OO-Style
class EmailValidator(Validator):
    def __init__(self, field_name, domain):
        self.field_name = field_name
        self.domain = domain

    def validate(self, data):
        email = data.get(self.field_name)
        if not email_valid(email):
            raise ValidationError("Invalid email address.")
        elif not email_domain_equals(email, self.domain):
            raise ValidationError("Email must have domain {}".format(self.domain))
        return data


# Functional Style
def email_validator(field_name, domain):
    def validate(data):
        email = data.get(field_name)
        if not email_valid(email):
            return Failure({field_name: "Invalid email address."})
        elif not email_domain_equals(email, domain):
            return Failure({field_name: "Email must have domain {}".format(domain)})
        return data
    return validate


# Monadic Style
def email_validator(field_name, domain):
    def validate(data):
        email = data.get(field_name)
        if not email_valid(email):
            return fail(field_name, "Invalid email address.")
        elif not email_domain_equals(email, domain):
            return fail(field_name, "Email must have domain {}".format(domain))
        return success(data)
    return validate
```
Not much changed here. The Monadic version benefits from the addition of the fail function, but it’s basically equivalent to the OO version. The class-based validator must remember to store the incoming values in the constructor, which is something the other two don’t need to worry about; in that way, I think the functional versions are a bit simpler (provided you’re comfortable with first-class functions, of course).
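The comparison stubs out email_valid and email_domain_equals. If you want the three validators to actually reject something when you run them, a deliberately simplistic stand-in might look like this. Both bodies are my assumptions, not a proper address validator, and email_domain_equals is only meant to be called after email_valid has passed, as all three validators above do.

```python
def email_valid(email):
    # Hypothetical stand-in: accepts any string with exactly one "@" and
    # something on both sides of it. Real address validation is much messier.
    if not isinstance(email, str) or email.count('@') != 1:
        return False
    local, domain = email.split('@')
    return bool(local) and bool(domain)


def email_domain_equals(email, domain):
    # Hypothetical stand-in: compares the part after the last "@",
    # ignoring case because domain names are case-insensitive.
    return email.rsplit('@', 1)[-1].lower() == domain.lower()


print(email_valid('test@test.com'), email_domain_equals('test@test.com', 'test.com'))
# True True
```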
Running a Suite (without the external function)
Let’s take a look at that code side-by-side:
```python
# OO Version
def validate_form(data, validators):
    suite = ValidatorSuite()
    for v in validators:
        suite.add_validator(v)
    return suite.validate(data)


# Functional Version
def validate_form(data, validators, return_error=False):
    val = validation_suite(validators)(data)
    if isinstance(val, Failure):
        if return_error:
            return val
        else:
            raise ValidationError("Validation Failed", errors=val.errors)
    return val


# Monadic Version
def validate_form(data, validators):
    result = ValidatedData(data).run(*validators)
    if len(result['errors']) > 0:
        raise ValidationError("Validation Failed", errors=result['errors'])
    return result['data']
```
I could complain about the way the suite uses the add_validator pattern, but that would be pretty disingenuous given that I wrote it. Honestly, since the OO version matches the spec we set out with from the get-go, I’d have to give it the edge. But wait!
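If the add_validator pattern does bother you, the OO suite can take its validators up front, which turns validate_form into a one-liner like the other two. This is a sketch of that variant, not the post’s code: the constructor signature is my assumption, and the post’s classes are re-declared so the block runs on its own.

```python
class ValidationError(Exception):
    def __init__(self, message, errors=None):
        super(ValidationError, self).__init__(message)
        self.message = message
        self.errors = errors or {}


class NonBlankValidator(object):
    def __init__(self, field_name):
        self.field_name = field_name

    def validate(self, data):
        s = data.get(self.field_name)
        if not isinstance(s, str) or len(s) == 0:
            raise ValidationError("Field '{}' must not be blank".format(self.field_name))
        return data


class ValidatorSuite(object):
    # Assumed variant: validators are passed to the constructor,
    # so there is no separate add_validator step.
    def __init__(self, validators):
        self.validators = list(validators)

    def validate(self, data):
        errors = {}
        for validator in self.validators:
            try:
                data = validator.validate(data)
            except ValidationError as e:
                errors[validator.field_name] = e.message
        if errors:
            raise ValidationError("Validation failed", errors=errors)
        return data


def validate_form(data, validators):
    return ValidatorSuite(validators).validate(data)
```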
Nested Validation
This should be fun. Let’s say that we want to validate that data['person']['name'] is not blank.
```python
# Object-Oriented
class SuiteValidator(Validator):
    def __init__(self, field_name, suite):
        self.field_name = field_name
        self.suite = suite

    def validate(self, data):
        try:
            data[self.field_name] = self.suite.validate(data.get(self.field_name, {}))
        except ValidationError as e:
            raise ValidationError(e.message, errors=e.errors)
        return data


suite = ValidatorSuite()
suite.add_validator(NonBlankValidator('name'))
suite.add_validator(NonBlankValidator('email'))

outer_suite = ValidatorSuite()
outer_suite.add_validator(SuiteValidator('person', suite))

try:
    print(outer_suite.validate({'person': {'email': 'test@test.com'}}))
except ValidationError as e:
    print(e.errors)


# Functional
def nested_validator(field_name):
    def validate(data):
        suite = validation_suite([
            non_blank_validator('email'),
            non_blank_validator('name'),
        ])
        result = suite(data.get(field_name, {}))
        if isinstance(result, Failure):
            return Failure({field_name: result.errors})
        data[field_name] = result  # put the validated inner dict back
        return data
    return validate


validation_suite([nested_validator('person')])({'person': {'email': 'test@test.com'}})
# => Failure(errors={'person': {'name': "Field 'name' must not be blank"}})


# Monadic
def nested_validator(field_name, validators):
    def validate(data):
        result = ValidatedData(data.get(field_name, {})).run(*validators)
        if len(result['errors']) > 0:
            return fail(field_name, result['errors'])
        return success({field_name: result['data']})
    return validate


ValidatedData({'person': {'email': 'test@test.com'}}).run(
    nested_validator('person', [
        non_blank_validator('name'),
        non_blank_validator('email'),
    ]))
# => {'data': {'person': {'email': 'test@test.com'}}, 'errors': {'person': {'name': "Field 'name' must not be blank"}}}
```
I like the monad best again – all the nested validator has to do is unpack the returned monad and construct a returned one.
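For a fairer comparison, the functional nested_validator can take its inner validators as an argument, the way the monadic one does. It then has the same shape: run the inner suite, unpack the result, and wrap any failure under field_name. A self-contained sketch, re-declaring Failure, non_blank_validator and validation_suite from the functional section; the two-argument signature is my assumption, borrowed from the monadic version.

```python
class Failure(object):
    def __init__(self, errors):
        self.errors = errors


def non_blank_validator(field_name):
    def validate(data):
        s = data.get(field_name)
        if not isinstance(s, str) or len(s) == 0:
            return Failure({field_name: "Field '{}' must not be blank".format(field_name)})
        return data
    return validate


def validation_suite(validators):
    def validate(data):
        errors = {}
        for v in validators:
            val = v(data)
            if isinstance(val, Failure):
                errors = dict(val.errors, **errors)
            else:
                data = val
        if errors:
            return Failure(errors)
        return data
    return validate


def nested_validator(field_name, validators):
    # Inner validators are a parameter instead of being hard-coded.
    inner_suite = validation_suite(validators)

    def validate(data):
        result = inner_suite(data.get(field_name, {}))
        if isinstance(result, Failure):
            # Unpack the inner failure and re-wrap it under field_name.
            return Failure({field_name: result.errors})
        data[field_name] = result
        return data
    return validate


check = validation_suite([
    nested_validator('person', [non_blank_validator('name'), non_blank_validator('email')]),
])
print(check({'person': {'email': 'test@test.com'}}).errors)
# {'person': {'name': "Field 'name' must not be blank"}}
```

Because nested_validator returns an ordinary validator, it can be nested inside itself for deeper structures.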
Note that the OO code couldn’t be made to do this without changing the implementation of ValidatorSuite to collect the errors properly (it wasn’t on purpose, honest!), so right now the OO version only remembers a single message for each nested field. This is a bit of a self-serving point, so take it as you will, but I think it shows that the functional options are a bit more generic/flexible (even if the OO version could be refactored pretty easily).
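For the OO version to report nested errors the way the other two do, ValidatorSuite has to stop flattening each caught exception down to its message. One minimal way to do it, assuming that a non-empty errors dict on the caught exception should take precedence over its message, is sketched below. With that change, SuiteValidator no longer needs its own try/except, because the inner suite’s ValidationError already carries the structured errors. The post’s classes are re-declared so the block runs on its own.

```python
class ValidationError(Exception):
    def __init__(self, message, errors=None):
        super(ValidationError, self).__init__(message)
        self.message = message
        self.errors = errors or {}


class NonBlankValidator(object):
    def __init__(self, field_name):
        self.field_name = field_name

    def validate(self, data):
        s = data.get(self.field_name)
        if not isinstance(s, str) or len(s) == 0:
            raise ValidationError("Field '{}' must not be blank".format(self.field_name))
        return data


class ValidatorSuite(object):
    def __init__(self):
        self.validators = []

    def add_validator(self, validator):
        self.validators.append(validator)

    def validate(self, data):
        errors = {}
        for validator in self.validators:
            try:
                data = validator.validate(data)
            except ValidationError as e:
                # The change: keep structured errors when present,
                # fall back to the message for leaf validators.
                errors[validator.field_name] = e.errors or e.message
        if errors:
            raise ValidationError("Validation failed", errors=errors)
        return data


class SuiteValidator(object):
    def __init__(self, field_name, suite):
        self.field_name = field_name
        self.suite = suite

    def validate(self, data):
        # The inner suite's ValidationError propagates unchanged.
        data[self.field_name] = self.suite.validate(data.get(self.field_name, {}))
        return data
```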
Words of warning
All of these techniques will work in any language with the following features:
- Classes (or typeclasses or objects)
- Exceptions
- First-class functions
So: Python, Ruby, JavaScript & friends, Java 8, Scala, Clojure (naturally), C#, F#, Caml, and many, many more.
However, as is always the case when using functional techniques in not-necessarily-functional languages, you should exercise caution. Whether you’re writing an open-source project or working with a team, you need to be sure that your code fits the contextually appropriate definition of “idiomatic”. And if you’re writing a library, at the very least you should ensure that it can be used in the common way; this is why every implementation above exposes an interface that raises an exception.
I don’t believe that any of the above implementations are too strange to qualify as idiomatic Python, but your mileage may vary. Do you have a tale of stylistic culture clash?