{ "cells": [ { "cell_type": "markdown", "metadata": { "nbsphinx": "hidden" }, "source": [ "
\n", " `21 Microservices Architecture Patterns Study
\n", "\n", "

Chapter 3. Coupling and Abstractions

\n", "\n", "
\n", " Joseph Kim <cloudeyes@gmail.com>
\n", " Jab 6. 2021
\n", "
\n", "\n", "
\n", " \n", "
\n", "\n", "
\n", " Download Jupyter Notebook\n", "
\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# 03. Coupling and Abstractions\n", "\n", "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TDD 문제 풀이 사이트 \n", "\n", "- https://kata-log.rocks/tdd\n", "- https://www.programmingwithwolfgang.com/tdd-kata/\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Motivation\n", "\n", "- what makes a good abstraction?\n", "- What do we want from abstractions?\n", "- And how do they relate to testing?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Global coupling is harmful\n", "\n", "- Locally, coupling is a good thing: it’s a sign that our code is working together\n", " > \"high cohesion\" between the coupled elements.\n", "- But, \"globally\" it harms maintainability\n", " - increasing the risk and the cost of changing our code\n", " - until we are no longer able to effectively change our systems.\n", "\n", "#### Abstracting details fixes the coupling" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Lots of Coupling**" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "jupyter": { "source_hidden": true }, "tags": [ "hidden-input" ] }, "outputs": [ { "data": { "image/svg+xml": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " System\n", " A \n", " System\n", " B \n", " \n", "" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%ditaa \n", "+--------+ +--------+\n", "| | ---> | |\n", "| System | ---> | System |\n", "| A | ---> | B |\n", "| | ---> | |\n", "| | ---> | |\n", "+--------+ +--------+" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Less Coupling**" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "jupyter": { "source_hidden": true }, "tags": [ "hidden-input" ] }, "outputs": [ { "data": { "image/svg+xml": [ "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " System\n", " A \n", " Abstraction\n", " System\n", " B \n", " \n", "" ], "text/plain": [ "" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%%ditaa\n", "+--------+ +--------+\n", "| | /-------------\\ | |\n", "| System | ---> | | ---> | System |\n", "| A | ---> | Abstraction | ---> | B |\n", "| | | | ---> | |\n", "| | \\-------------/ | |\n", "+--------+ +--------+" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Abstracting state Aids Testability" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Example: synchronizing two directories\n", "\n", "- If a file \"exists\" in the source\n", " - **Rule 1:** then if it's not in the destination, copy the file over.\n", " - else it's in the destination but with a different name, \n", " - **Rule 2:** rename the destination file to match.\n", "- **Rule 3:** If a file exists in the destination but not in the source, remove it.\n", "\n", "\"Existence\" of files should be checked by \"hash value\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 1. \"Hashing\" a file to check identity\n", "\n", "Generate a SHA-1 hash from a file to compare files." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "BLOCKSIZE = 65536\n", "from pathlib import Path\n", "import hashlib\n", "\n", "def hash_file(path: Path) -> str:\n", " hasher = hashlib.sha1()\n", " with path.open(\"rb\") as file:\n", " buf = file.read(BLOCKSIZE)\n", " while buf:\n", " hasher.update(buf)\n", " buf = file.read(BLOCKSIZE)\n", " return hasher.hexdigest()" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "c80e5f5cc2afaa7fa9e149f1f40b6165237c28ba\n", "✅ \u001b[95mtest_hash_file\u001b[0m\n" ] } ], "source": [ "@test\n", "def test_hash_file():\n", " import os\n", " os.makedirs('./build', exist_ok=True)\n", " inpath = Path('03-coupling-and-abstractions.ipynb')\n", " outpath = Path('./build') / inpath\n", " \n", " with open(inpath) as fin:\n", " with open(outpath, 'w+') as fout:\n", " fout.write(fin.read())\n", " \n", " print(h1 := hash_file(inpath))\n", " assert h1 == hash_file(outpath) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### 2. Basic sync algorithm\n", "\n", "Implement the business logic in this way:\n", "- Start with a solution to the smallest part of the problem.\n", "- And then iteratively make the solution richer and better designed." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Make a sample test files" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34mtests/ex1\u001b[00m\n", "├── \u001b[01;34msource\u001b[00m\n", "│   ├── my-file\n", "│   └── special-file\n", "└── \u001b[01;34mtarget\u001b[00m\n", " ├── renamed-my-file\n", " └── your-file\n", "\n", "2 directories, 4 files\n" ] } ], "source": [ "!rm -rf tests/ex1 \\\n", "&& mkdir -p tests/ex1/{source,target} \\\n", "&& echo \"I am a very useful file\" > tests/ex1/source/my-file \\\n", "&& echo \"I am a special file\" > tests/ex1/source/special-file \\\n", "&& echo \"You are a very useful file\" > tests/ex1/target/your-file \\\n", "&& echo \"I am a very useful file\" > tests/ex1/target/renamed-my-file \\\n", "&& tree tests/ex1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create a test first that fails" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"\", line 2, in sync\n", " raise NotImplementedError\n", "NotImplementedError\n" ] } ], "source": [ "def sync(source: str, dest: str) -> None:\n", " raise NotImplementedError\n", "\n", "@test\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " sync('tests/ex1/source', 'tests/ex1/target')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Play with Jupyter notebook to build some scaffolds" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tests/ex1 []\n", "tests/ex1/source ['special-file', 'my-file']\n", "tests/ex1/target ['renamed-my-file', 'your-file']\n" ] } ], "source": [ "for folder, dirs, files in os.walk('tests/ex1'): print(folder, _, files)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "({'82133f2e33bcea9bbadf74c5e2438e5e0db07a33': 'special-file',\n", " '4a17979d33f38b7c36e38a6b02013b2f2ef01fbb': 'my-file'},\n", " {'4a17979d33f38b7c36e38a6b02013b2f2ef01fbb': 'renamed-my-file',\n", " '197f216e99637a19a46c44ea849b4bb493e92275': 'your-file'})" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src_path = Path('./tests/ex1/source')\n", "dst_path = Path('./tests/ex1/target')\n", "\n", "def hash_files(path: Path(folder)) -> dict[str, str]:\n", " hashes = dict[str, str]()\n", "\n", " for folder, _, files in os.walk(path):\n", " for fname in files:\n", " hashes[hash_file(Path(folder) / fname)] = fname\n", "\n", " return hashes\n", "\n", "src_hashes, dst_hashes = hash_files(src_path), hash_files(dst_path)\n", "src_hashes, dst_hashes" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(dict_keys(['82133f2e33bcea9bbadf74c5e2438e5e0db07a33', '4a17979d33f38b7c36e38a6b02013b2f2ef01fbb']),\n", " dict_keys(['4a17979d33f38b7c36e38a6b02013b2f2ef01fbb', '197f216e99637a19a46c44ea849b4bb493e92275']))" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src_keys, dst_keys = src_hashes.keys(), dst_hashes.keys()\n", "src_keys, dst_keys" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'82133f2e33bcea9bbadf74c5e2438e5e0db07a33'}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "src_only = src_keys - dst_keys; src_only" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'197f216e99637a19a46c44ea849b4bb493e92275'}" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dst_only = dst_keys - src_keys; dst_only" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'4a17979d33f38b7c36e38a6b02013b2f2ef01fbb'}" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "in_both = src_keys & dst_keys; in_both" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Implement business rules\n", "\n", "**Rule 1**: Copy a file from to destination if the file exists **only** in the source." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34mtests/ex1\u001b[00m\n", "├── \u001b[01;34msource\u001b[00m\n", "│   ├── my-file\n", "│   └── special-file\n", "└── \u001b[01;34mtarget\u001b[00m\n", " ├── renamed-my-file\n", " └── your-file\n", "\n", "2 directories, 4 files\n" ] } ], "source": [ "!tree tests/ex1" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "import shutil\n", "\n", "for key in src_only:\n", " file_name = src_hashes[key]\n", " src_file_path = src_path / file_name\n", " dst_file_path = dst_path / file_name\n", " shutil.copy(src_file_path, dst_file_path)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`special-file` should be copied to `target` folder." ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34mtests/ex1\u001b[00m\n", "├── \u001b[01;34msource\u001b[00m\n", "│   ├── my-file\n", "│   └── special-file\n", "└── \u001b[01;34mtarget\u001b[00m\n", " ├── renamed-my-file\n", " ├── special-file\n", " └── your-file\n", "\n", "2 directories, 5 files\n" ] } ], "source": [ "!tree tests/ex1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Rule 3**: Remove a file from destination if it's not in the source, " ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [], "source": [ "for key in dst_only:\n", " file_name = dst_hashes[key]\n", " dst_file_path = dst_path / file_name\n", " try:\n", " os.remove(dst_file_path)\n", " except FileNotFoundError:\n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`your-file` should be removed from `target` folder." ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34mtests/ex1\u001b[00m\n", "├── \u001b[01;34msource\u001b[00m\n", "│   ├── my-file\n", "│   └── special-file\n", "└── \u001b[01;34mtarget\u001b[00m\n", " ├── renamed-my-file\n", " └── special-file\n", "\n", "2 directories, 4 files\n" ] } ], "source": [ "!tree tests/ex1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Rule 2**: *Rename the destination's file name to match* if the file \"exists\" both in the source and destination but with different names. " ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [], "source": [ "for key in in_both:\n", " src_name, dst_name = src_hashes[key], dst_hashes[key]\n", " dst_path_from = dst_path / dst_name\n", " dst_path_to = dst_path / src_name\n", " try:\n", " os.rename(dst_path_from, dst_path_to)\n", " except FileNotFoundError:\n", " pass" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34mtests/ex1\u001b[00m\n", "├── \u001b[01;34msource\u001b[00m\n", "│   ├── my-file\n", "│   └── special-file\n", "└── \u001b[01;34mtarget\u001b[00m\n", " ├── my-file\n", " └── special-file\n", "\n", "2 directories, 4 files\n" ] } ], "source": [ "!tree tests/ex1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sync Completed! :D" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Build the real service function from the tryouts" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [], "source": [ "import hashlib\n", "import os\n", "import shutil\n", "from pathlib import Path\n", "\n", "def sync(source: str, dest: str) -> None:\n", " src_path = Path(source)\n", " dst_path = Path(dest)\n", "\n", " src_hashes, dst_hashes = hash_files(src_path), hash_files(dst_path)\n", " src_keys, dst_keys = src_hashes.keys(), dst_hashes.keys()\n", " \n", " src_only = src_keys - dst_keys\n", " dst_only = dst_keys - src_keys\n", " in_both = src_keys & dst_keys\n", " \n", " for key in src_only:\n", " file_name = src_hashes[key]\n", " src_file_path = src_path / file_name\n", " dst_file_path = dst_path / file_name\n", " shutil.copy(src_file_path, dst_file_path)\n", " \n", " for key in dst_only:\n", " file_name = dst_hashes[key]\n", " dst_file_path = dst_path / file_name\n", " try:\n", " os.remove(dst_file_path)\n", " except FileNotFoundError:\n", " pass\n", "\n", " for key in in_both:\n", " src_name, dst_name = src_hashes[key], dst_hashes[key]\n", " dst_path_from = dst_path / dst_name\n", " dst_path_to = dst_path / src_name\n", " try:\n", " os.rename(dst_path_from, dst_path_to)\n", " except FileNotFoundError:\n", " pass" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Create a general fixture function." ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[01;34mtests\u001b[00m\n", "└── \u001b[01;34mex1\u001b[00m\n", " ├── \u001b[01;34msource\u001b[00m\n", " └── \u001b[01;34mtarget\u001b[00m\n", " └── your-file\n", "\n", "3 directories, 1 file\n" ] } ], "source": [ "def make_test_files(base_dir, src_only=False, dst_only=False, renamed=False):\n", " base_path = Path(base_dir)\n", " src_path = base_path / 'source'\n", " dst_path = base_path / 'target'\n", " shutil.rmtree(base_dir, ignore_errors=True)\n", " os.makedirs(src_path, exist_ok=True)\n", " os.makedirs(dst_path, exist_ok=True)\n", " src_only and (src_path / 'special-file').write_text('I am a special file')\n", " dst_only and (dst_path / 'your-file').write_text('You are a very useful file')\n", " renamed and (src_path / 'my-file').write_text('I am a very useful file')\n", " renamed and (dst_path / 'renamed-my-file').write_text('I am a very useful file')\n", " \n", "make_test_files('tests/ex1', dst_only=True)\n", "!tree tests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Run tests with the real service implementation." ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ \u001b[95mtest_when_a_file_exists_in_the_source_but_not_the_destination\u001b[0m\n" ] } ], "source": [ "@test\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " try:\n", " make_test_files('tests/ex1', src_only=True)\n", " sync('tests/ex1/source', 'tests/ex1/target')\n", " finally: # clenaup test files\n", " shutil.rmtree('tests/ex1', ignore_errors=True)\n", "\n", "!ls tests" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ \u001b[95mtest_when_a_file_has_been_renamed_in_the_source\u001b[0m\n" ] } ], "source": [ "@test\n", "def test_when_a_file_has_been_renamed_in_the_source(renamed=True):\n", " try:\n", " make_test_files('tests/ex1', src_only=True)\n", " sync('tests/ex1/source', 'tests/ex1/target')\n", " finally: # clenaup test files\n", " shutil.rmtree('tests/ex1', ignore_errors=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Lessons learned:\n", "\n", "Our domain logic:\n", "- “figure out the difference between two directories,” \n", "\n", "has problems such as:\n", "- it's *tightly coupled to the I/O code*. \n", " - We can’t run our difference algorithm without calling the `pathlib`, `shutil`, and `hashlib`.\n", "- we *haven’t written enough tests** even with our current requirements\n", "- which makes the code isn’t very extensible. Let's imagine:\n", " - add `--dry-run` flag that just simulate the sync\n", " - or to sync to a remote server, or to cloud storage?" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Choosing the Right Abstractions(s)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Rewrite our code to make it more testable.\n", " \n", "Find **simplifying abstractions** for *three distinct responsibilities* that the code has:\n", "\n", "1. **Determine hashes** for a series of paths from both the source and the destination.\n", "1. **Decide differences** whether each file is new, renamed, or redundant.\n", "1. **Do corresponding actions**: copy, move, or delete files to match the source.\n", "\n", "#### Simplified inputs and outputs in our tests (`test_sync.py`)" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Traceback (most recent call last):\n", " File \"\", line 10, in test_when_a_file_exists_in_the_source_but_not_the_destination\n", " assert expected_actions == actions\n", "AssertionError\n", "Traceback (most recent call last):\n", " File \"\", line 18, in test_when_a_file_exists_in_the_source_but_not_the_destination\n", " assert expected_actions == actions\n", "AssertionError\n" ] } ], "source": [ "from unittest.mock import MagicMock\n", "determine_actions = MagicMock()\n", "\n", "@test\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " src_hashes = {'hash1': 'fn1'}\n", " dst_hashes = {}\n", " expected_actions = [('copy', Path('/src/fn1'), Path('/dst/fn1'))]\n", " actions = list(determine_actions(src_hashes, dst_hashes, '/src', '/dst'))\n", " assert expected_actions == actions\n", " \n", "@test\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " src_hashes = {'hash1': 'fn1'}\n", " dst_hashes = {'hash1': 'fn2'}\n", " expected_actions = [('move', Path('/dst/fn2'), Path('/dst/fn1'))]\n", " actions = list(determine_actions(src_hashes, dst_hashes, '/src', '/dst'))\n", " assert expected_actions == actions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementing Our Chosen Abstractions\n", "\n", "Our goal is to isolate our \"core\" logic from the filesystem.\n", "- create “core” of code that has no dependencies on external state\n", "- see how it responds when we give it input from the outside world.\n", "\n", "See also:\n", "- [Functional Core, Imperative Shell by Gary Bernhardt](https://www.destroyallsoftware.com/screencasts/catalog/functional-core-imperative-shell)\n", "\n", "#### Split our code into three (`sync.py`)" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "def sync(source, dest):\n", " # step 1: imperative shell - gather inputs\n", " source_hashes = read_paths_and_hashes(source)\n", " dest_hashes = read_paths_and_hashes(dest)\n", "\n", " # step 2: call functional core\n", " actions = determine_actions(source_hashes, dest_hashes, source, dest)\n", "\n", " # step 3: imperative shell - apply outputs\n", " for action, *paths in actions:\n", " if action == 'copy':\n", " shutil.copyfile(*paths)\n", " if action == 'move':\n", " shutil.move(*paths)\n", " if action == 'delete':\n", " os.remove(paths[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A function that just does I/O (`sync.py`)" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "def read_paths_and_hashes(root):\n", " hashes = {}\n", " for folder, _, files in os.walk(root):\n", " for fn in files:\n", " hashes[hash_file(Path(folder) / fn)] = fn\n", " return hashes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### A function that just does business logic (`sync.py`)" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [], "source": [ "HashDict = dict[str, str]\n", "def determine_actions(src_hashes: HashDict, dst_hashes: HashDict, \n", " src_folder: str, dst_folder: str):\n", " src_keys, dst_keys = src_hashes.keys(), dst_hashes.keys()\n", " src_path, dst_path = Path(src_folder), Path(dst_folder)\n", " \n", " src_only = src_keys - dst_keys\n", " dst_only = dst_keys - src_keys\n", " in_both = src_keys & dst_keys\n", " \n", " for key in src_only:\n", " file_name = src_hashes[key]\n", " src_file_path = src_path / file_name\n", " dst_file_path = dst_path / file_name \n", " yield 'copy', src_file_path, dst_file_path\n", " \n", " for key in dst_only:\n", " file_name = dst_hashes[key]\n", " dst_file_path = dst_path / file_name\n", " yield 'delete', dst_file_path\n", "\n", " for key in in_both:\n", " src_name, dst_name = src_hashes[key], dst_hashes[key]\n", " dst_path_from = dst_path / dst_name\n", " dst_path_to = dst_path / src_name\n", " yield 'move', dst_path_from, dst_path_to" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ \u001b[95mtest_when_a_file_exists_in_the_source_but_not_the_destination\u001b[0m\n", "✅ \u001b[95mtest_when_a_file_exists_in_the_source_but_not_the_destination\u001b[0m\n" ] } ], "source": [ "@test\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " src_hashes = {'hash1': 'fn1'}\n", " dst_hashes = {}\n", " expected_actions = [('copy', Path('/src/fn1'), Path('/dst/fn1'))]\n", " actions = list(determine_actions(src_hashes, dst_hashes, '/src', '/dst'))\n", " assert expected_actions == actions\n", " \n", "@test\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " src_hashes = {'hash1': 'fn1'}\n", " dst_hashes = {'hash1': 'fn2'}\n", " expected_actions = [('move', Path('/dst/fn2'), Path('/dst/fn1'))]\n", " actions = list(determine_actions(src_hashes, dst_hashes, '/src', '/dst'))\n", " assert expected_actions == actions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Testing Edge to Edge with Fakes and Dependency Injection\n", "\n", "Instead, we often write tests that invoke a whole system together but fake the I/O, sort of *edge to edge*:\n", "\n", "#### Explicit dependencies (`sync.py`)\n", "\n", "It now exposes two new dependencies, a `reader` and a `filesystem`.\n", "- `reader`: produce our files hashes.\n", "- `filesystem`: apply the changes we detect." ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "def sync_dirs(reader, filesystem, source: str, dest: str):\n", "\n", " # step 1: imperative shell - gather inputs\n", " source_hashes = reader(source)\n", " dest_hashes = reader(dest)\n", "\n", " # step 2: call functional core\n", " actions = determine_actions(source_hashes, dest_hashes, source, dest)\n", "\n", " # step 3: imperative shell - apply outputs\n", " for action, *paths in actions:\n", " if action == 'copy':\n", " filesystem.copy(paths[0], paths[1])\n", " if action == 'move':\n", " filesystem.move(paths[0], paths[1])\n", " if action == 'delete':\n", " filesystem.delete(paths[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Tests using DI" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [], "source": [ "class FakeFileSystem(list): \n", " \"\"\"An example of spy object.\n", " \n", " Extend list to collection operations on fake filesystem.\n", " \"\"\"\n", " \n", " def copy(self, src, dest):\n", " self.append(('copy', src, dest))\n", "\n", " def move(self, src, dest):\n", " self.append(('move', src, dest))\n", "\n", " def delete(self, dest):\n", " self.append(('delete', src, dest))" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ \u001b[95mtest_when_a_file_exists_in_the_source_but_not_the_destination\u001b[0m\n", "✅ \u001b[95mtest_when_a_file_has_been_renamed_in_the_source\u001b[0m\n" ] } ], "source": [ "@test\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " source = {\"sha1\": \"my-file\" }\n", " dest = {}\n", " filesystem = FakeFileSystem()\n", "\n", " reader = {\"/source\": source, \"/dest\": dest}\n", " sync_dirs(reader.pop, filesystem, \"/source\", \"/dest\")\n", "\n", " assert filesystem == [(\"copy\", Path(\"/source/my-file\"), Path(\"/dest/my-file\"))]\n", "\n", "@test\n", "def test_when_a_file_has_been_renamed_in_the_source():\n", " source = {\"sha1\": \"renamed-file\" }\n", " dest = {\"sha1\": \"original-file\" }\n", " filesystem = FakeFileSystem()\n", "\n", " reader = {\"/source\": source, \"/dest\": dest}\n", " sync_dirs(reader.pop, filesystem, \"/source\", \"/dest\")\n", "\n", " assert filesystem == [(\"move\", Path(\"/dest/original-file\"), Path(\"/dest/renamed-file\"))]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Pros and Cons: \n", "\n", "- Pros: tests act on the exact same function that’s used by our production code\n", "- Cons: we have to make our stateful components explicit and pass them around\n", "\n", "[“Test-induced design damage” by David Heinemeier Hansson](https://dhh.dk/2014/test-induced-design-damage.html)\n", "\n", "> But that's a far ways off from declaring that hard-to-unit-test code is always poorly designed, and always in need of repair. That you cannot have well-designed code that is hard to unit test.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Why Not Just Patch It Out?\n", "\n", "“Why don’t you just use mock.patch and save yourself the effort?\"” \n", "The author thinks:\n", "- Mocking frameworks, particularly monkeypatching, are a code smell.\n", "- A Better way is:\n", " - to clearly identify the responsibilities in codebases, \n", " - to separate those responsibilities into small, focused objects \n", " - that are easy to replace with a test double.\n", "\n", "Reasons why:\n", "\n", "- Patching does nothing to improve the design. \n", " > e.g) using `mock.patch` won’t let your code work with a `--dry-run` flag, nor will it help you run against an FTP server. For that, you’ll need to introduce abstractions.\n", "- Mocks tend to be more coupled to the implementation details of the codebase. \n", "- Overuse of mocks leads to complicated test suites that fail to explain the code\n", "- Designing for testability really means designing for extensibility.\n", " - Trade off *a little more complexity* for a cleaner design that admits novel use cases.\n", "\n", "*See also*:\n", "\n", "- [Test Double (explained by Martin Fowler)](https://martinfowler.com/bliki/TestDouble.html#:~:text=Test%20Double%20is%20a%20generic,used%20to%20fill%20parameter%20lists.)\n", "\n", "> a generic term for any case where you replace a production object for testing purposes. There are various kinds of double that Gerard lists:\n", "> \n", "> - **Dummy** objects are passed around but never actually used. Usually they are just used to fill parameter lists.\n", "> - **Fake** objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an InMemoryTestDatabase is a good example).\n", "> - **Stubs** provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test.\n", "> - **Spies** are stubs that also record some information based on how they were called. One form of this might be an email service that records how many messages it was sent.\n", "> - **Mocks** are pre-programmed with expectations which form a specification of the calls they are expected to receive. They can throw an exception if they receive a call they don't expect and are checked during verification to ensure they got all the calls they were expecting." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Mocks versus Fakes\n", "\n", "- **Mocks** are used to *verify how* something gets used.\n", " - have methods like `assert_called_once_with()`. (associated with London-school TDD)\n", "- **Fakes** are *working implementations* of the thing they’re replacing.\n", " - designed for use only in tests: they wouldn’t work “in real life”.\n", " > our in-memory repository is a good example. But you can use them to make assertions about the end state of a system rather than the behaviors along the way, so they’re associated with classic-style TDD.\n", " \n", "#### Mock examples:\n", "\n", "- https://docs.python.org/3/library/unittest.mock.html\n", "- [Python Mocking 101: Fake It Before You Make It](https://www.fugue.co/blog/2016-02-11-python-mocking-101)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "✅ \u001b[95mtest_when_a_file_exists_in_the_source_but_not_the_destination\u001b[0m\n" ] } ], "source": [ "from unittest import mock\n", "from unittest.mock import call, MagicMock\n", "\n", "@test\n", "@mock.patch('__main__.read_paths_and_hashes', MagicMock(side_effect={\n", " \"/source\": {\"sha1\": \"my-file\" },\n", " \"/dest\": {},\n", "}.get))\n", "@mock.patch('shutil.copyfile', MagicMock())\n", "def test_when_a_file_exists_in_the_source_but_not_the_destination():\n", " sync(\"/source\", \"/dest\")\n", " read_paths_and_hashes.assert_has_calls([mock.call('/source'), mock.call('/dest')])\n", " assert [call(Path('/source/my-file'), Path('/dest/my-file'))] \\\n", " == shutil.copyfile.mock_calls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*See also*:\n", "\n", "- [“Mocks Aren’t Stubs” by Martin Folwer](https://martinfowler.com/articles/mocksArentStubs.html)\n", "- [What are the London and Chicago schools of TDD? - Software Engineering Stack Exchange](https://softwareengineering.stackexchange.com/questions/123627/what-are-the-london-and-chicago-schools-of-tdd)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Wrap-Up" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Finding the right abstraction is tricky\n", "- A few heuristics and questions to ask yourself:\n", " - Can I choose a familiar Python data structure to represent the state of the messy system and then try to imagine a single function that can return that state?\n", " - Where can I draw a line between my systems, where can I carve out a seam to stick that abstraction in?\n", " - What is a sensible way of dividing things into components with different responsibilities? What implicit concepts can I make explicit?\n", " - What are the dependencies, and what is the core business logic?" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 4 }