Compare commits

3 Commits

Author SHA1 Message Date
ed6433f063 Update trained model 2021-07-07 01:46:57 +02:00
fda7f7ed5f Show total training time 2021-07-07 01:19:26 +02:00
2ea8000657 Update README 2021-07-07 01:13:35 +02:00
6 changed files with 76 additions and 3 deletions

@@ -2,10 +2,15 @@
locimend is a tool that corrects DNA sequencing errors using Deep Learning.
The goal is to provide a correct DNA sequence, when a sequence containing errors is provided.
It provides both a command-line program and a REST API.
## Technologies
- Tensorflow
- Biopython
- FastAPI
## Installation
@@ -48,8 +53,59 @@ contains all the needed dependencies.
## Usage
The following command creates the dataset, trains the Deep Learning model and shows the accuracy:
### Training the model
The following command trains the Deep Learning model and shows the accuracy and AUC:
```bash
poetry run python src/model.py
poetry run python locimend/main.py train <data file> <label file>
```
- `<data file>`: FASTQ file containing the sequences with errors
- `<label file>`: FASTQ file containing the sequences without errors
Both files must list their sequences in the same order, so that each simulated read in the data file sits at the same position (same row) as its canonical sequence in the label file.
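The row-by-row pairing requirement can be checked before training. The following is a minimal sketch in plain Python, not locimend's own loader: `read_fastq` and `pair_records` are hypothetical helper names, and the parser assumes unwrapped four-line FASTQ records (header, sequence, `+`, quality).

```python
from typing import Iterable, Iterator, List, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from four-line FASTQ records."""
    # Assumption: each record spans exactly four lines (no wrapped sequences).
    stripped = [line.strip() for line in lines if line.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input is not a multiple of four lines")
    for i in range(0, len(stripped), 4):
        header, sequence = stripped[i], stripped[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"Record {i // 4} does not start with '@'")
        yield header[1:], sequence


def pair_records(
    data_lines: Iterable[str], label_lines: Iterable[str]
) -> List[Tuple[str, str]]:
    """Pair the n-th read with errors with the n-th error-free sequence."""
    data = list(read_fastq(data_lines))
    labels = list(read_fastq(label_lines))
    # Pairing is purely positional, so a length mismatch means misalignment.
    if len(data) != len(labels):
        raise ValueError(
            f"{len(data)} data records but {len(labels)} label records"
        )
    return [(d_seq, l_seq) for (_, d_seq), (_, l_seq) in zip(data, labels)]
```

With real files, the line iterables could come from `open(path)`; a `ValueError` then signals that the two files cannot be paired row by row.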
A dataset is provided to train the model. To use it, run the following command:
```bash
poetry run python locimend/main.py train data/curesim-HVR.fastq data/HVR.fastq
```
### Inference
A trained model is provided, which can be used to infer the correct sequences. There are two ways to interact with it:
- Command-line execution
- REST API
#### Command-line
The following command infers the correct sequence and prints it:
```bash
poetry run python locimend/main.py infer "<DNA sequence>"
```
#### REST API
It is also possible to serve the model via a REST API. To start the web server, run the following command:
```bash
poetry run api
```
The API can be accessed at http://localhost:8000, with either a GET or POST request:
| Request | Endpoint | Payload |
|:----:|:-----:|:-----:|
| GET | / | Sequence as a path parameter (in the URL) |
| POST | / | JSON |
For a POST request the JSON must have the following structure:
```json
{"sequence": "<DNA sequence>"}
```
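Both request styles can be sketched with the Python standard library. This is an illustrative client, not part of locimend: the helper names are hypothetical, the GET URL shape (sequence appended directly after `/`) is an assumption based on the table above, and the response is returned as raw text because its format is not documented here.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # address given in the README


def build_get_request(sequence: str) -> Request:
    """Build a GET request carrying the sequence as a path parameter."""
    # Assumption: the sequence goes directly after the root path.
    return Request(f"{BASE_URL}/{quote(sequence, safe='')}", method="GET")


def build_post_request(sequence: str) -> Request:
    """Build a POST request with the documented JSON payload."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return Request(
        f"{BASE_URL}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> str:
    """Send the request and return the response body as text."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")
```

With the server started through `poetry run api`, calling `send(build_post_request("<DNA sequence>"))` would return whatever body the API produces.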

@@ -1,7 +1,8 @@
from asyncio import run
from argparse import ArgumentParser, Namespace
from time import time
from model import infer_sequence, train_model
from locimend.model import infer_sequence, train_model
def parse_arguments() -> Namespace:
@@ -21,7 +22,10 @@ def parse_arguments() -> Namespace:
async def execute_task(args):
    if args.task == "train":
        start_time = time()
        train_model(data_file=args.data_file, label_file=args.label_file)
        end_time = time()
        print(f"Training time: {end_time - start_time}")
    else:
        prediction = await infer_sequence(sequence=args.sequence)
        print(f"Error-corrected sequence: {prediction}")
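The training time above is printed as a raw number of seconds taken from the wall clock. As a possible refinement, and not something this change does, the measurement could use a monotonic clock and a readable format. A minimal sketch follows, where `timed` and `format_duration` are hypothetical helpers:

```python
from time import perf_counter
from typing import Any, Callable, Tuple


def timed(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func and return its result together with the elapsed seconds."""
    # perf_counter is monotonic, so system clock adjustments do not affect it.
    start = perf_counter()
    result = func(*args, **kwargs)
    return result, perf_counter() - start


def format_duration(seconds: float) -> str:
    """Format a duration in seconds as hours, minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(int(minutes), 60)
    return f"{hours:d}h {minutes:02d}m {secs:05.2f}s"
```

In `execute_task`, this could look like `_, elapsed = timed(train_model, data_file=args.data_file, label_file=args.label_file)` followed by `print(f"Training time: {format_duration(elapsed)}")`.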

File diff suppressed because one or more lines are too long

Binary file not shown.