Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 2 - Basic Requirements | Conrad Akunga

This is Part 2 of a series on Designing, Building & Packaging A Scalable, Testable .NET Open Source Component.

Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 1 - Introduction
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 2 - Basic Requirements (This Post)
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 3 - Project Setup
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 4 - Types & Contracts
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 5 - Component Implementation
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 6 - Mocking & Behaviour Tests
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 7 - Sequence Verification With Moq
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 8 - Compressor Implementation
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 9 - Encryptor Implementation
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 10 - In Memory Storage
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 11 - SQL Server Storage
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 12 - PostgreSQL Storage
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 13 - Database Configuration
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 14 - Virtualizing Infrastructure
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 15 - Test Organization
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 16 - Large File Consideration
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 17 - Large File Consideration On PostgreSQL
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 18 - Azure Blob Storage
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 19 - Testing Azure Blob Storage Locally
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 20 - Amazon S3 Storage
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 21 - Testing Amazon S3 Storage Locally
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 22 - Refactoring Azure Storage Engine For Initialization
Designing, Building & Packaging A Scalable, Testable .NET Open Source Component - Part 23 - Refactoring Amazon Storage Engine For Initialization

Our last post was an introduction to this series.

This post will look at defining the basic requirements.

As explained earlier, I have a pet project that will require the uploading of files (PDFs to be precise), storage and then processing. These files may require re-processing, so I cannot discard them after completion.

They therefore need to be stored.

Introduction

This is the rationale of what we are building.

We will build a component that facilitates the following:

Uploading (storage) of a file
Retrieval (download) of a file
Deletion of a file

Upon upload of a file, we need to generate some sort of identifier that can be used in the application.
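The three operations and the generated identifier could be sketched as a contract like the one below. This is only an illustration under stated assumptions: `IFileStore` and `InMemoryFileStore` are hypothetical names, and the identifier is a `Guid` in line with the File IDs decision later in this post.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical contract: names are illustrative, not the component's final API.
public interface IFileStore
{
    // Stores the content and returns a newly generated identifier.
    Guid Upload(string fileName, Stream content);

    // Returns the stored content for the given identifier.
    Stream Download(Guid fileId);

    // Removes the stored content for the given identifier.
    void Delete(Guid fileId);
}

// Minimal in-memory implementation, for illustration only.
public sealed class InMemoryFileStore : IFileStore
{
    private readonly Dictionary<Guid, byte[]> _files = new();

    public Guid Upload(string fileName, Stream content)
    {
        // The file name is ignored here; only the bytes are kept.
        using var buffer = new MemoryStream();
        content.CopyTo(buffer);
        var fileId = Guid.NewGuid();
        _files[fileId] = buffer.ToArray();
        return fileId;
    }

    public Stream Download(Guid fileId) =>
        _files.TryGetValue(fileId, out var bytes)
            ? new MemoryStream(bytes, writable: false)
            : throw new FileNotFoundException($"No file with ID {fileId}");

    public void Delete(Guid fileId) => _files.Remove(fileId);
}
```

The application would hold on to the returned `Guid` and use it for every later download or delete.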

Also, we probably will need to store some metadata to make it easier to implement some functionality in the application - for example a page to view file details, icon, etc.

This metadata will include:

File name
File size (in bytes)
Extension (needed to know how to render the file if it is viewed in the browser)
Date Uploaded
File Hash (to detect changes to the file, for whatever reason, and to tell whether this file has been uploaded before)
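As a rough sketch, the metadata above could be modelled as a record. The type and property names are hypothetical, and SHA-256 is an assumption on my part, since this post only commits to "SHA".

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Hypothetical shape for the metadata; property names are illustrative.
public sealed record FileMetadata(
    string Name,
    long Size,              // in bytes
    string Extension,       // includes the leading dot, e.g. ".pdf"
    DateTime DateUploaded,
    byte[] Hash);

public static class FileMetadataFactory
{
    public static FileMetadata Create(string fileName, byte[] content, DateTime uploadedAtUtc)
    {
        // Assumption: SHA-256. The post only says "SHA".
        byte[] hash = SHA256.HashData(content);

        return new FileMetadata(
            Path.GetFileName(fileName),
            content.LongLength,
            Path.GetExtension(fileName),
            uploadedAtUtc,
            hash);
    }
}
```

Two files with identical contents produce identical hashes, which is what makes the hash usable for spotting changes or repeat uploads.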

We can then improve this component by performing some operations before persistence. At present these will include:

Compression - whenever possible, cut down on storage
Encryption - in this age of hackers and mistakes, better encrypt the file contents in case the storage is ever breached.
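To make the two pre-persistence steps concrete, here is a minimal sketch. Several things are assumptions: the `PrePersistence` class is hypothetical, GZip stands in for the "Zip compression" named later in this post, and AES is used in its default CBC mode with a random IV stored in front of the ciphertext. Compression runs first because encrypted data generally does not compress well.

```csharp
using System.IO;
using System.IO.Compression;
using System.Security.Cryptography;

// Hypothetical helper; not the component's actual API.
public static class PrePersistence
{
    // Assumption: GZip stands in for the post's "Zip compression".
    public static byte[] Compress(byte[] data)
    {
        using var output = new MemoryStream();
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            gzip.Write(data, 0, data.Length);
        }
        return output.ToArray();
    }

    public static byte[] Decompress(byte[] data)
    {
        using var input = new MemoryStream(data);
        using var gzip = new GZipStream(input, CompressionMode.Decompress);
        using var output = new MemoryStream();
        gzip.CopyTo(output);
        return output.ToArray();
    }

    // Encrypts with AES and prefixes the random IV to the ciphertext.
    public static byte[] Encrypt(byte[] data, byte[] key)
    {
        using var aes = Aes.Create();
        aes.Key = key;
        aes.GenerateIV();
        byte[] iv = aes.IV;
        using var encryptor = aes.CreateEncryptor();
        byte[] cipher = encryptor.TransformFinalBlock(data, 0, data.Length);

        byte[] result = new byte[iv.Length + cipher.Length];
        iv.CopyTo(result, 0);
        cipher.CopyTo(result, iv.Length);
        return result;
    }

    // Reads the 16-byte IV back from the front, then decrypts the rest.
    public static byte[] Decrypt(byte[] data, byte[] key)
    {
        using var aes = Aes.Create();
        aes.Key = key;
        aes.IV = data[..16];
        using var decryptor = aes.CreateDecryptor();
        return decryptor.TransformFinalBlock(data, 16, data.Length - 16);
    }

    // Compress first, then encrypt; reverse the order when reading back.
    public static byte[] Prepare(byte[] data, byte[] key) => Encrypt(Compress(data), key);
    public static byte[] Restore(byte[] stored, byte[] key) => Decompress(Decrypt(stored, key));
}
```

The key must be 16, 24 or 32 bytes long. How it is generated and kept safe is outside this sketch.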

With regard to storage, this component should support the following:

File system - the files will be stored in a folder on the server
Database - the files will be stored as BLOBs in the database. Preliminary support will be for SQL Server first, and then PostgreSQL.
Cloud BLOB storage - the files will be stored as BLOB objects in the cloud. Preliminary support will be for Azure and Amazon.

The component itself should support dependency injection, and should be configurable at this point in terms of:

Storage & settings
Compression & settings
Encryption & settings
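One way to read the dependency injection requirement is constructor injection: the component receives its storage, compression and encryption as abstractions and never creates them itself. The sketch below assumes hypothetical names throughout (`IStorageEngine`, `ICompressor`, `IEncryptor`, `UploadFileManager`) and uses pass-through implementations purely to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical abstractions; names are illustrative only.
public interface IStorageEngine
{
    void Store(Guid fileId, byte[] data);
    byte[] Fetch(Guid fileId);
    void Delete(Guid fileId);
}

public interface ICompressor
{
    byte[] Compress(byte[] data);
    byte[] Decompress(byte[] data);
}

public interface IEncryptor
{
    byte[] Encrypt(byte[] data);
    byte[] Decrypt(byte[] data);
}

// Simple stand-ins so the sketch can run without any infrastructure.
public sealed class InMemoryStorageEngine : IStorageEngine
{
    private readonly Dictionary<Guid, byte[]> _items = new();
    public void Store(Guid fileId, byte[] data) => _items[fileId] = data;
    public byte[] Fetch(Guid fileId) => _items[fileId];
    public void Delete(Guid fileId) => _items.Remove(fileId);
}

public sealed class PassThroughCompressor : ICompressor
{
    public byte[] Compress(byte[] data) => data;
    public byte[] Decompress(byte[] data) => data;
}

public sealed class PassThroughEncryptor : IEncryptor
{
    public byte[] Encrypt(byte[] data) => data;
    public byte[] Decrypt(byte[] data) => data;
}

// The component depends only on the abstractions it is handed.
public sealed class UploadFileManager
{
    private readonly IStorageEngine _storage;
    private readonly ICompressor _compressor;
    private readonly IEncryptor _encryptor;

    public UploadFileManager(IStorageEngine storage, ICompressor compressor, IEncryptor encryptor)
    {
        _storage = storage ?? throw new ArgumentNullException(nameof(storage));
        _compressor = compressor ?? throw new ArgumentNullException(nameof(compressor));
        _encryptor = encryptor ?? throw new ArgumentNullException(nameof(encryptor));
    }

    public Guid Upload(byte[] content)
    {
        var fileId = Guid.NewGuid();
        _storage.Store(fileId, _encryptor.Encrypt(_compressor.Compress(content)));
        return fileId;
    }

    public byte[] Download(Guid fileId) =>
        _compressor.Decompress(_encryptor.Decrypt(_storage.Fetch(fileId)));

    public void Delete(Guid fileId) => _storage.Delete(fileId);
}
```

In an application that uses Microsoft.Extensions.DependencyInjection, these types would presumably be registered with calls such as `services.AddSingleton<IStorageEngine, InMemoryStorageEngine>()`, so swapping the storage engine becomes a one-line configuration change.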

The dependency injection requirement will make it easy to use for:

APIs
Web applications
Console applications
Service applications

We will build it in such a way as to make it extensible, so that it will be easy to support:

Other databases - MySQL, SQLite
Other BLOB storage providers - Google, Dreamhost, Hetzner, Heroku

Finally, some (preliminary) deliberate decisions:

Uniqueness

If you upload two files with the same name, the system will treat them as different, store both, and give you two different IDs. We will not make any effort to detect and prevent duplicates (either by file name or by contents).

Context

Files are usually uploaded with some context - e.g. an uploaded file will belong to the logged-in user. This component will make no effort to preserve this - that will be the responsibility of the application. The component will deal purely with the file alone.

Changes Of Settings

Given we are going to support encryption and compression, it will probably be a good idea to persist whatever encryption algorithm and compression algorithm were used at the point of storage as part of the metadata. This way, should we need to change them, updating existing files will be much easier. Storing the same values against every file will look repetitive, but this is an acceptable trade-off to accommodate future changes.
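A small sketch of why recording the algorithms per file helps: if the settings change later, the files written under the old settings can be found by comparing what was recorded against what is current. All the names here (`CompressionAlgorithm`, `EncryptionAlgorithm`, `StorageSettings`, `StoredFile`, `SettingsMigration`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical enums; the members reflect the choices named in this post.
public enum CompressionAlgorithm { None, Zip }
public enum EncryptionAlgorithm { None, Aes }

// The settings in force when a file was stored.
public sealed record StorageSettings(
    CompressionAlgorithm Compression,
    EncryptionAlgorithm Encryption);

// Per-file record that carries the settings used at upload time.
public sealed record StoredFile(Guid FileId, StorageSettings SettingsAtUpload);

public static class SettingsMigration
{
    // Returns the files whose recorded settings differ from the current ones.
    public static IEnumerable<StoredFile> NeedingUpdate(
        IEnumerable<StoredFile> files, StorageSettings current)
    {
        foreach (var file in files)
        {
            // Records compare by value, so this checks both algorithms.
            if (file.SettingsAtUpload != current)
            {
                yield return file;
            }
        }
    }
}
```

Reading a file back would use the recorded settings in the same way, rather than whatever happens to be configured at the time.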

Hashing

We will use SHA.

Encryption

We will use AES.

Compression

We will use Zip compression.

File IDs

We will use Guid as file IDs.

Update

There will be no support for update. To update, delete the existing file and upload the replacement.

In our next post we shall set up our project and start the preliminary work.

TLDR

This post outlines the requirements we want to address with the proposed software.

Happy hacking!