# locimend

locimend is a tool that corrects DNA sequencing errors using Deep Learning.

Given a DNA sequence that contains errors, it aims to return the corrected sequence.

It provides both a command-line program and a REST API.

## Technologies

- TensorFlow
- Biopython
- FastAPI

## Installation

This project uses [Nix](https://nixos.org/) to ensure reproducible
builds.

1.  Install Nix (compatible with macOS, Linux and
    [WSL](https://docs.microsoft.com/en-us/windows/wsl/about)):

```bash
curl -L https://nixos.org/nix/install | sh
```

2.  Clone the repository:

```bash
git clone https://git.coolneng.duckdns.org/coolneng/locimend
```

3.  Change the working directory to the project:

```bash
cd locimend
```

4.  Enter the nix-shell:

```bash
nix-shell
```

5.  Install the dependencies via Poetry:

```bash
poetry install
```

After running these commands, you will find yourself in a shell that
contains all the needed dependencies.

## Usage

### Training the model

The following command trains the Deep Learning model and shows the accuracy and AUC:

```bash
poetry run python locimend/main.py train <data file> <label file>
```

- `<data file>`: FASTQ file containing the sequences with errors
- `<label file>`: FASTQ file containing the sequences without errors

Both files must list their sequences in the same order, so that each simulated read (with errors) occupies the same position (same row) as its canonical sequence.
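Since the two files are matched by position, it can help to confirm that they hold the same number of records before training. The following is a minimal sketch and not part of locimend: it assumes plain four-line FASTQ records (no wrapped sequence lines), and the helper names `read_fastq_sequences` and `count_aligned_records` are hypothetical.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple


def read_fastq_sequences(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTQ stream.

    Assumption: every record spans exactly four lines
    (header, sequence, separator, quality).
    """
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        # Stop at end of input or at a trailing blank line.
        if not record or not record[0]:
            return
        if len(record) < 4 or not record[0].startswith("@"):
            raise ValueError(f"Malformed FASTQ record: {record!r}")
        yield record[0][1:], record[1]


def count_aligned_records(data_handle: TextIO, label_handle: TextIO) -> int:
    """Return the number of record pairs, or raise if the counts differ."""
    data_count = sum(1 for _ in read_fastq_sequences(data_handle))
    label_count = sum(1 for _ in read_fastq_sequences(label_handle))
    if data_count != label_count:
        raise ValueError(
            f"Record count mismatch: {data_count} reads vs {label_count} labels"
        )
    return data_count
```

The function takes open file handles, for example those returned by `open("data/curesim-HVR.fastq")` and `open("data/HVR.fastq")`. It only compares record counts; it cannot tell whether the rows are in the right order.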

A dataset is provided to train the model. To use it, execute the following command:

```bash
poetry run python locimend/main.py train data/curesim-HVR.fastq data/HVR.fastq
```


### Inference

A trained model is provided, which can be used to infer the correct sequences. There are two ways to interact with it:

- Command-line execution
- REST API

#### Command-line

The following command infers the correct sequence and prints it:

```bash
poetry run python locimend/main.py infer "<DNA sequence>"
```

#### REST API

It is also possible to serve the model via a REST API. To start the web server, run the following command:

```bash
poetry run api
```

The API can be accessed at http://localhost:8000, with either a GET or POST request:

| Request | Endpoint | Payload                                   |
|:-------:|:--------:|:-----------------------------------------:|
| GET     | /        | Sequence as a path parameter (in the URL) |
| POST    | /        | JSON                                      |

For a POST request, the JSON must have the following structure:

```json
{"sequence": "<DNA sequence>"}
```