# locimend

locimend is a tool that corrects DNA sequencing errors using Deep Learning. Given a sequence that contains errors, it aims to return the correct DNA sequence. It provides both a command-line program and a REST API.

## Technologies

- TensorFlow
- Biopython
- FastAPI

## Installation

This project uses [Nix](https://nixos.org/) to ensure reproducible builds.

1. Install Nix (compatible with macOS, Linux and [WSL](https://docs.microsoft.com/en-us/windows/wsl/about)):

   ```bash
   curl -L https://nixos.org/nix/install | sh
   ```

2. Clone the repository:

   ```bash
   git clone https://git.coolneng.duckdns.org/coolneng/locimend
   ```

3. Change the working directory to the project:

   ```bash
   cd locimend
   ```

4. Enter the nix-shell:

   ```bash
   nix-shell
   ```

5. Install the dependencies via Poetry:

   ```bash
   poetry install
   ```

After running these commands, you will be in a shell that contains all the needed dependencies.

## Usage

### Training the model

The following command trains the Deep Learning model and shows the accuracy and AUC:

```bash
poetry run python locimend/main.py train <data file> <label file>
```

- `<data file>`: FASTQ file containing the sequences with errors
- `<label file>`: FASTQ file containing the sequences without errors

Both files must list the sequences in the same order, so that each simulated read and its canonical sequence are in the same position (same row).

A dataset is provided to train the model. To train on it, run the following command:

```bash
poetry run python locimend/main.py train data/curesim-HVR.fastq data/HVR.fastq
```

### Inference

A trained model is provided, which can be used to infer the correct sequences.
There are two ways to interact with it:

- Command-line execution
- REST API

#### Command-line

The following command infers the correct sequence and prints it:

```bash
poetry run python locimend/main.py infer "<DNA sequence>"
```

#### REST API

The model can also be served via a REST API. To start the web server, run the following command:

```bash
poetry run api
```

The API can be accessed at http://localhost:8000, with either a GET or POST request:

| Request | Endpoint | Payload                                   |
|:-------:|:--------:|:-----------------------------------------:|
| GET     | /        | Sequence as a path parameter (in the URL) |
| POST    | /        | JSON                                      |

For a POST request the JSON must have the following structure:

```json
{"sequence": "<DNA sequence>"}
```
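
As an illustration of the two request styles above, here is a minimal client sketch that uses only the Python standard library. The helper names (`build_get_request`, `build_post_request`, `send`) are hypothetical and not part of locimend. The sketch assumes the path parameter is appended directly after the root endpoint (`/<DNA sequence>`), and, since the response format is not described here, it returns the raw response body as text.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

# Address given in this README; adjust if the server runs elsewhere.
BASE_URL = "http://localhost:8000"


def build_get_request(sequence: str, base_url: str = BASE_URL) -> Request:
    """Build a GET request with the sequence as a path parameter.

    Assumption: the sequence is placed directly after the root endpoint.
    """
    return Request(f"{base_url}/{quote(sequence, safe='')}", method="GET")


def build_post_request(sequence: str, base_url: str = BASE_URL) -> Request:
    """Build a POST request carrying the documented JSON payload."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return Request(
        f"{base_url}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text.

    The response schema is not documented here, so no parsing is attempted.
    """
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With the server started via `poetry run api`, calling `send(build_post_request("ATCGATCG"))` or `send(build_get_request("ATCGATCG"))` should return whatever the API responds with for that sequence.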