
Functional Programming in Python

08 Aug, 2018

Functional programming (FP) is a paradigm in which a program is composed of functions. A function is a building block that encapsulates a computation: applied to the same value, it always returns the same result. FP avoids mutating state, and it allows developers to create powerful processing pipelines. Most programming languages support functional programming, including Python. Let's take a look at how FP in Python works.

Functions and Python

Python can define anonymous functions with the lambda keyword. For example, when we type f = lambda x: x + 1 we define
a function called f that, when applied to a value x, returns x + 1:

In [1]: f = lambda x: x + 1

In [2]: f(1)
Out[2]: 2

Function Composition

Function composition is about combining functions: a composed function has the computational properties of both of its parts. For example, when we define two functions f and g, we can compose them into a function h that applies both.

In [1]: f = lambda x: x + 1

In [2]: g = lambda x: x + 2

In [3]: h = lambda x: f(g(x))

In [4]: h(1)
Out[4]: 4
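Composition generalises to any number of functions. As a small sketch (the compose helper below is my own, not part of Python's standard library), functools.reduce can chain them right to left:

```python
from functools import reduce

def compose(*fns):
    # Apply the functions right to left: compose(f, g)(x) == f(g(x))
    return lambda x: reduce(lambda acc, fn: fn(acc), reversed(fns), x)

f = lambda x: x + 1
g = lambda x: x + 2
h = compose(f, g)
print(h(1))  # prints 4, the same as f(g(1))
```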

Pure and Impure Values

FP starts to shine when it operates on values that express a certain kind of computational effect. Most programs operate on pure values. Examples of pure values are the number 1, the text hello, or a value like True or False. FP allows us to express the result of a computation as a value. Such a value is impure because it expresses the effect of a computation. Examples of impure values are Success(123), Failure('First name is empty') or None.

Reasoning about Impure Values

Python has support for impure values. An example of an impure value is an optional value: a method returns one to express that a value is not always produced. The method can return a value like 1, or it can return None. The Pythonic way to process such a value is to use an if statement to check for the effect and then process the value:

In [1]: def return_value_optionally(x: int):
   ...:     if x <= 3:
   ...:         return x
   ...:     else:
   ...:         return None
   ...:

In [2]: return_value_optionally(1)
Out[2]: 1

In [3]: x = return_value_optionally(5)

In [4]: if x:
   ...:     print(f'pure value: {x}')
   ...: else:
   ...:     print(f'impure value {x}')
   ...:
impure value None

The Pythonic way of processing optional values is not ideal: it relies on statements and mutates values. FP can help because functions always return a value, avoiding state mutation. Another benefit is function composition, which allows functions to be chained. Chaining functions creates processing pipelines and lowers developer cognitive overhead. Processing pipelines are easy to reason about.

Higher Order Functions

Before we can create pipelines, we need to look at higher order functions (HoF). A HoF is a function that receives a function as an argument or returns a function as its result. It is a simple concept that we already know from function composition:

In [1]: f = lambda x: x + 1

In [2]: g = lambda x: x + 2

In [3]: h = lambda f, g: lambda x: f(g(x))

In [4]: i = h(f, g)

In [5]: i(1)
Out[5]: 4
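Built-ins such as map and filter are also HoF: they receive a function as an argument. A function can likewise return a new function, as in this small sketch (make_adder is an illustrative name of my own):

```python
def make_adder(n):
    # Returns a new function that adds n to its argument
    def add(x):
        return x + n
    return add

add_one = make_adder(1)
add_two = make_adder(2)
print(add_two(add_one(1)))            # prints 4
print(list(map(add_one, [1, 2, 3])))  # prints [2, 3, 4]
```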

Functional Data Structures

Before we create pipelines, we also need Functional Data Structures (FDS). An FDS supports HoF and allows us to create pipelines, and pipelines allow us to process impure values. The Python standard library does not contain FDS, so I have created a small library with the most common ones. The library is called python-fp, is available on GitHub, and is published on PyPI, the Python Package Index.
Let's look at how we can process an optional value in a more functional way. We use fp.option.Option, an FDS that supports processing impure values as a pipeline:

In [1]: from fp.option import Option

In [2]: Option(1)\
    .map(lambda x: x + 1)
Out[2]: Option(2)

In [3]: Option(1)\
    .filter(lambda x: x > 1)
Out[3]: Option(None)

In [4]: Option(1)\
    .bind(lambda x: Option(x + 2))
Out[4]: Option(3)

In [5]: Option.empty()\
    .map(lambda x: x + 1)
Out[5]: Option(None)

In [6]: Option(1)\
    .map(lambda x: x + 1)\
    .fold(0, lambda x: x)
Out[6]: 2

In [7]: Option(1)\
    .map(lambda x: x + 1)\
    .filter(lambda x: x > 4)\
    .fold(0, lambda x: x)
Out[7]: 0

The examples always return a value. The FDS exposes HoF that we use to create pipelines: chains of operations that operate on the value the FDS contains. Depending on the state of the FDS, the chain of computations is evaluated or not.
To convert an impure value into a pure value, we use the fold function. Depending on the FDS type, we decide what to do when the FDS is empty. For example, what pure value do we wish to return when the optional value is None? Will it be a zero value or something else? Here I return zero when the value is None; otherwise the data structure returns the pure value it contains.
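To make the mechanics concrete, here is a minimal, hypothetical sketch of such an Option — not python-fp's actual implementation — showing how map, filter and fold fit together:

```python
class Option:
    """Minimal sketch of an Option: wraps a value that may be None."""
    def __init__(self, value=None):
        self._value = value

    @classmethod
    def empty(cls):
        return cls(None)

    def map(self, fn):
        # Apply fn only when a value is present
        return Option(fn(self._value)) if self._value is not None else self

    def filter(self, pred):
        # Keep the value only when the predicate holds
        return self if self._value is not None and pred(self._value) else Option.empty()

    def fold(self, default, fn):
        # Collapse to a pure value: default when empty, fn(value) otherwise
        return fn(self._value) if self._value is not None else default

    def __repr__(self):
        return f'Option({self._value})'

print(Option(1).map(lambda x: x + 1).fold(0, lambda x: x))                          # prints 2
print(Option(1).map(lambda x: x + 1).filter(lambda x: x > 4).fold(0, lambda x: x))  # prints 0
```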

List Operations

A list is also an impure value. It expresses the effect of having zero or more elements. We can use fp.list.List FDS to create pipelines based on lists:

In [1]: from fp.list import List

In [2]: List(1, 2, 3)\
    .map(lambda x: x + 1)
Out[2]: List(2, 3, 4)

In [3]: List(1, 2, 3)\
    .filter(lambda x: x > 1)
Out[3]: List(2, 3)

In [4]: List(1, 2, 3)\
    .bind(lambda x: List(x + 1, x + 2, x + 3))
Out[4]: List(2, 3, 4, 3, 4, 5, 4, 5, 6)

In [5]: List.empty()\
    .map(lambda x: x + 1)
Out[5]: List()

In [6]: List(1, 2, 3)\
    .fold(0, lambda c, e: c + e)
Out[6]: 6

In [7]: List(1, 2, 3).sum()
Out[7]: 6

In [8]: List(1, 2, 3)\
    .intersperse(0)
Out[8]: List(1, 0, 2, 0, 3)

In [9]: List('the', 'book', 'is', 'green')\
    .mk_string(',')
Out[9]: 'the,book,is,green'

The List FDS always returns a value. Depending on the pipeline we create, the List returns an impure value or a pure value. To produce a pure value, use fold or mk_string. To produce impure values, use map, bind or head_option.
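For comparison, the standard library can approximate several of these operations on plain lists — functools.reduce plays the role of fold and str.join the role of mk_string (a plain-Python sketch, not the List FDS itself):

```python
from functools import reduce

xs = [1, 2, 3]
print(list(map(lambda x: x + 1, xs)))            # prints [2, 3, 4], like List.map
print(list(filter(lambda x: x > 1, xs)))         # prints [2, 3], like List.filter
print(reduce(lambda c, e: c + e, xs, 0))         # prints 6, like List.fold
print(','.join(['the', 'book', 'is', 'green']))  # prints the,book,is,green, like mk_string
```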

A List of Effects

When we have a list of impure values, we can create a pipeline that combines them. The result is also an impure value.

In [1]: from fp.list import List

In [2]: from fp.option import Option

In [3]: from fp.sequence import OptionSequence

In [4]: xs = List(Option(1), Option.empty(), \
    Option(2), Option.empty(), Option(3))

In [5]: OptionSequence.sequence(xs)
Out[5]: Option(List(1, 2, 3))

Let's say we have a list of values that represent the results of a validation. The list contains only impure values: some represent empty values, and others represent pure values. The OptionSequence operation inverts the nested FDS: the List of Options is converted into an Option of a List that contains only pure values.
We can reason further and compute the pure value using a fold operation:

In [6]: result = OptionSequence.sequence(xs)
Out[6]: Option(List(1, 2, 3))

In [7]: ys = result\
    .fold(List.empty(), lambda x: x)
Out[7]: List(1, 2, 3)

In [8]: ys.fold(0, lambda c, e: c + e)
Out[8]: 6
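In plain Python the same idea can be sketched with None standing in for the empty Option — mirroring the behaviour above, where empty values are dropped and the remaining pure values are combined (the sequence function here is my own stand-in, not the library's):

```python
def sequence(optionals):
    # Collect the values that are present, dropping the None (empty) ones
    return [x for x in optionals if x is not None]

xs = [1, None, 2, None, 3]
ys = sequence(xs)
print(ys)       # prints [1, 2, 3]
print(sum(ys))  # prints 6
```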

Validating Values

The fp.validation.Validation FDS is great for validating user input. Combined with Option, it expresses missing values; combined with a regular expression, it validates user input:

In [1]: from fp.validation import Validation

In [2]: from fp.option import Option

In [3]: import re

In [4]: Validation.from_option(Option.empty(), 'No value')
Out[4]: Failure(err_value='No value', failure=True, success=False)

In [5]: Validation.from_option(Option(1), 'No value')
Out[5]: Success(value=1, failure=False, success=True)

In [6]: Validation.lift(-20, lambda x: x <= 0, 'Number should be positive')
Out[6]: Failure(err_value='Number should be positive', failure=True, success=False)

In [7]: Validation.lift("dennis", lambda x: not re.fullmatch('[A-Z]+', x), 'name should be all uppercase')
Out[7]: Failure(err_value='name should be all uppercase', failure=True, success=False)
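The semantics can be sketched in plain Python as follows. Note that this is a hypothetical sketch inferred from the outputs above, not python-fp's implementation: a Failure is produced when the value is missing, or when the predicate flags it as invalid.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Success:
    value: Any

@dataclass
class Failure:
    err_value: Any

def from_option(value: Any, error: str):
    # A missing value (None) becomes a Failure carrying the error message
    return Failure(error) if value is None else Success(value)

def lift(value: Any, is_invalid: Callable[[Any], bool], error: str):
    # The predicate names the invalid condition; when it holds we fail
    return Failure(error) if is_invalid(value) else Success(value)

print(from_option(None, 'No value'))                             # prints a Failure
print(lift(-20, lambda x: x <= 0, 'Number should be positive'))  # prints a Failure
print(lift(20, lambda x: x <= 0, 'Number should be positive'))   # prints a Success
```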

A List of Validation Values

Create a processing pipeline to process a list of Validation results:

In [1]: from fp.list import List

In [2]: from fp.validation import Validation

In [3]: from fp.sequence import ValidationSequence

In [4]: fn = Validation.failure('First name is empty')

In [5]: ln = Validation.failure('Last name is empty')

In [6]: age = Validation.success(27)

In [7]: zipcode = Validation.failure('Invalid zip code')

In [8]: results = ValidationSequence.sequence(List(fn, ln, age, zipcode))

In [9]: results.fold(lambda x: x, lambda x: x)
Out[9]: List(First name is empty, Last name is empty, Invalid zip code)

In [10]: results.fold(lambda x: x, lambda x: x).mk_string(',')
Out[10]: 'First name is empty,Last name is empty,Invalid zip code'

When there are validation errors, ValidationSequence collects them in a list of errors; when every validation succeeds, it returns a Success that contains all the values:

In [1]: from fp.list import List

In [2]: from fp.validation import Validation

In [3]: from fp.sequence import ValidationSequence

In [4]: ValidationSequence.sequence(\
    List(Validation.success('Dennis'), \
    Validation.success('Vriend'), \
    Validation.success('43')))
Out[4]: Success(value=List(Dennis, Vriend, 43), failure=False, success=True)

In [5]: result = ValidationSequence.sequence(List(Validation.success('Dennis'), Validation.success('Vriend'), Validation.success('43')))

In [6]: result\
    .fold(lambda x: x, lambda x: x)\
    .mk_string(', ')
Out[6]: 'Dennis, Vriend, 43'

Processing Boto3 Errors, Functional Style

The Validation FDS can also catch exceptions that are raised by, e.g., the Boto3 library, using the from_try_catch HoF. Boto3 raises exceptions when operations fail; deleting a bucket, for example, can fail. Validation catches the error as a Validation value, and the List FDS aggregates all the errors:

In [1]: import boto3

In [2]: from fp.validation import Validation

In [3]: from fp.list import List

In [4]: from fp.sequence import ValidationSequence

In [5]: client = boto3.client('s3')

In [6]: result = List('foo', 'bar', 'baz')\
    .map(lambda name: Validation\
        .from_try_catch(lambda: client.delete_bucket(Bucket=name)))
Out[6]: List(\
    Failure(err_value=ClientError('An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied'), failure=True, success=False), 
    Failure(err_value=ClientError('An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied'), failure=True, success=False), 
    Failure(err_value=ClientError('An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied'), failure=True, success=False))

In [7]: errors = ValidationSequence.sequence(result)
Out[7]: Failure(err_value=\
    List(\
        An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied, \
        An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied, \
        An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied), failure=True, success=False)

In [8]: errors.fold(lambda x: x, lambda x: x).mk_string(',')
Out[8]: 'An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied,\
    An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied,\
    An error occurred (AccessDenied) when calling the DeleteBucket operation: Access Denied'
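The idea behind from_try_catch can be sketched without Boto3: wrap the call in try/except and return the exception as a value instead of letting it propagate (the tuple encoding below is my own stand-in for the Validation type):

```python
def from_try_catch(fn):
    # Run fn; a raised exception becomes a ('failure', exc) value
    try:
        return ('success', fn())
    except Exception as exc:
        return ('failure', exc)

# Dividing by zero stands in for a failing AWS call
results = [from_try_catch(lambda d=d: 10 // d) for d in [1, 0, 2]]
errors = [v for tag, v in results if tag == 'failure']
print(len(errors))  # prints 1, the single ZeroDivisionError
```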

Refactoring Pythonic Code

The following example uses a Pythonic style to select a security group. The predicate looks for permissions that give access to TCP port 1337 from all addresses:

from typing import Optional

def inclusive_port_1337(permission: dict) -> bool:
    contains_tcp_or_all = permission.get('IpProtocol', '') in ['-1', 'tcp']
    can_access_port_1337 = permission.get('FromPort', 65536) <= 1337 <= permission.get('ToPort', -1)
    contains_cidr_all = list(filter(lambda r: r.get('CidrIp') == '0.0.0.0/0', permission.get('IpRanges', [])))
    return bool(contains_tcp_or_all and can_access_port_1337 and contains_cidr_all)

def all_ip4_access_to_port_1337(sg: dict) -> Optional[dict]:
    return sg if list(filter(inclusive_port_1337, sg['IpPermissions'])) else None

An FP refactor would look like:

from fp.option import Option
from fp.list import List

def contains_tcp_or_all(permission: dict) -> bool:
    return Option(permission.get('IpProtocol')) \
        .filter(lambda x: x in ['-1', 'tcp']) \
        .is_defined()

def can_access_port_1337(permission: dict) -> bool:
    from_port = Option(permission.get('FromPort')).get_or_else(65536)
    to_port = Option(permission.get('ToPort')).get_or_else(-1)
    return from_port <= 1337 <= to_port

def contains_cidr_all(permission: dict) -> bool:
    return Option(List.from_list(permission.get('IpRanges'))) \
        .get_or_else(List.empty()) \
        .filter(lambda x: x['CidrIp'] == '0.0.0.0/0') \
        .non_empty()

def filter_tcp_all_access_to_1337(permission: dict) -> bool:
    return contains_tcp_or_all(permission) \
           and can_access_port_1337(permission) \
           and contains_cidr_all(permission)

def all_ip4_access_to_port_1337(sg: dict) -> List[str]:
    return List.from_list(sg.get('IpPermissions')) \
        .filter(filter_tcp_all_access_to_1337) \
        .bind(lambda x: List.from_list(x['IpRanges']).map(lambda x: x['CidrIp']))

After refactoring, every function returns a value. Each function is explicit in the values it accepts and returns. It is clear what the intention is of the pipelines. The code can be easily extended. Every function can be tested in isolation.
The following code shows a Pythonic way to aggregate data using a nested list:

from typing import List

def aggregate_all_tasks_for_cluster_arn() -> List[dict]:
    all_tasks = []
    for cluster_arn in list_clusters():
        for task_arn in list_tasks(cluster_arn):
            task = describe_task(task_arn)
            if task and task['lastStatus'] == 'RUNNING' and task.get('taskDefinitionArn'):
                all_tasks.append(task)
    return all_tasks

The code uses the list all_tasks to aggregate the results via mutation. Let's refactor the code to FP:

from fp.list import List
from fp.option import Option


def list_clusters() -> List[str]:
    return List('arn:aws:ecs:us-east-1:0000000000:cluster/dev', 'arn:aws:ecs:us-east-1:0000000000:cluster/test')


def list_tasks(cluster_arn: str) -> List[str]:
    tasks_per_cluster = {
        'arn:aws:ecs:us-east-1:0000000000:cluster/dev': List('arn:aws:ecs:<region>:0000000000:task/c5cba4eb-5dad-405e-96db-71ef8eefe6a8'),
        'arn:aws:ecs:us-east-1:0000000000:cluster/test': List('arn:aws:ecs:<region>:0000000000:task/067ef33c-b084-44ad-8217-26512c6c7845')
    }
    return Option(tasks_per_cluster.get(cluster_arn)).get_or_else(List.empty())


def describe_task(task_arn: str) -> Option[dict]:
    task_definition_per_task_arn = {
        'arn:aws:ecs:<region>:0000000000:task/c5cba4eb-5dad-405e-96db-71ef8eefe6a8': { 'lastStatus': 'STOPPED' },
        'arn:aws:ecs:<region>:0000000000:task/067ef33c-b084-44ad-8217-26512c6c7845': { 'lastStatus': 'RUNNING', 'taskDefinitionArn': 'arn' }
    }
    return Option(task_definition_per_task_arn.get(task_arn))


def filter_for_running_tasks_with_task_def_arn(task: dict) -> bool:
    return Option(task.get('lastStatus')).exists(lambda x: x == 'RUNNING') \
        and Option(task.get('taskDefinitionArn')).is_defined()


def aggregate_all_tasks_for_cluster_arn() -> List[dict]:
    return list_clusters()\
        .bind(list_tasks)\
        .bind(lambda x: describe_task(x).fold(List.empty(), lambda x: List(x)))\
        .filter(filter_for_running_tasks_with_task_def_arn)

After refactoring, the code is more modular and expresses its intent clearly. Every function always returns a value, and the functions can be chained together by means of pipelines.
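As an aside, the mutation in the original loop can also be avoided with the standard library alone, using a nested comprehension (the stub functions below are simplified stand-ins for the AWS calls):

```python
# Stand-in stubs so the example is self-contained
def list_clusters():
    return ['cluster-a']

def list_tasks(cluster_arn):
    return ['task-1', 'task-2']

def describe_task(task_arn):
    return {'task-1': {'lastStatus': 'RUNNING', 'taskDefinitionArn': 'arn'},
            'task-2': {'lastStatus': 'STOPPED'}}.get(task_arn)

all_tasks = [
    task
    for cluster_arn in list_clusters()
    for task_arn in list_tasks(cluster_arn)
    for task in [describe_task(task_arn)]  # bind the lookup result to a name
    if task and task['lastStatus'] == 'RUNNING' and task.get('taskDefinitionArn')
]
print(all_tasks)  # prints the single RUNNING task
```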

Conclusion

Functional programming is a paradigm that can be used with Python. Solving problems in a functional way results in simple but powerful processing pipelines, created from Functional Data Structures, Higher Order Functions and plain functions. Pipelines are chains of functions that always return a value and can process both pure and impure values. Use fold to let a pipeline return a pure value.
With a few lines of code we can express powerful processing pipelines that are easy to reason about. I use this style of problem solving a lot: FP saves me time, reduces complexity, and results in modular code. I can always test my code and reuse functions in other settings. Now you can as well!
