Update trained model

Show total training time
Update README
2021-07-07 01:46:57 +02:00 · 2021-07-07 01:19:26 +02:00 · 2021-07-07 01:13:35 +02:00
6 changed files with 76 additions and 3 deletions
--- a/README.md
+++ b/README.md
@@ -2,10 +2,15 @@

 locimend is a tool that corrects DNA sequencing errors using Deep Learning.

+Given a DNA sequence that contains errors, the goal is to output the corrected sequence.
+
+It provides both a command-line program and a REST API.
+
 ## Technologies

 - Tensorflow
 - Biopython
+- FastAPI

 ## Installation

@@ -48,8 +53,59 @@ contains all the needed dependencies.

 ## Usage

-The following command creates the dataset, trains the Deep Learning model and shows the accuracy:
+### Training the model
+
+The following command trains the Deep Learning model and shows the accuracy and AUC:

 ```bash
-poetry run python src/model.py
+poetry run python locimend/main.py train <data file> <label file>
 ```
+
+- <data file>: FASTQ file containing the sequences with errors
+- <label file>: FASTQ file containing the sequences without errors
+
+Both files must list their sequences in the same order, so that each simulated read and its canonical sequence occupy the same position (same row).
+
+A dataset is provided to train the model. To use it, execute the following command:
+
+```bash
+poetry run python locimend/main.py train data/curesim-HVR.fastq data/HVR.fastq
+```
+
+
+### Inference
+
+A trained model is provided, which can be used to infer the correct sequences. There are two ways to interact with it:
+
+- Command-line execution
+- REST API
+
+#### Command-line
+
+The following command infers the correct sequence and prints it:
+
+```bash
+poetry run python locimend/main.py infer "<DNA sequence>"
+```
+
+#### REST API
+
+It is also possible to serve the model via a REST API. To start the web server, run the following command:
+
+```bash
+poetry run api
+```
+
+The API can be accessed at http://localhost:8000, with either a GET or POST request:
+
+| Request | Endpoint | Payload |
+|:-------:|:--------:|:-------:|
+| GET     | /        | Sequence as a path parameter (in the URL) |
+| POST    | /        | JSON |
+
+For a POST request the JSON must have the following structure:
+
+```json
+{"sequence": "<DNA sequence>"}
+```
+
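The two request shapes described in the README's REST API section can be sketched with the Python standard library. This is a minimal client sketch, not part of the commit: the function names `build_get_request`, `build_post_request` and `send` are hypothetical, and the format of the response body is an assumption, since the README does not document it.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # address stated in the README


def build_get_request(sequence: str) -> Request:
    # GET: the sequence travels as a path parameter, i.e. inside the URL.
    return Request(f"{BASE_URL}/{quote(sequence, safe='')}", method="GET")


def build_post_request(sequence: str) -> Request:
    # POST: the body follows the JSON structure given in the README.
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return Request(
        f"{BASE_URL}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> str:
    # Assumption: the response is UTF-8 text; its exact shape is not
    # documented in the README, so it is returned undecoded as a string.
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server started via `poetry run api`, calling `send(build_post_request("ACGT"))` would submit one sequence. Building the request separately from sending it keeps the URL and payload easy to inspect without a running server.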
--- a/locimend/main.py
+++ b/locimend/main.py
@@ -1,7 +1,8 @@
 from asyncio import run
 from argparse import ArgumentParser, Namespace
+from time import time

-from model import infer_sequence, train_model
+from locimend.model import infer_sequence, train_model


 def parse_arguments() -> Namespace:
@@ -21,7 +22,10 @@ def parse_arguments() -> Namespace:

 async def execute_task(args):
    if args.task == "train":
+        start_time = time()
        train_model(data_file=args.data_file, label_file=args.label_file)
+        end_time = time()
+        print(f"Training time: {end_time - start_time}")
    else:
        prediction = await infer_sequence(sequence=args.sequence)
        print(f"Error-corrected sequence: {prediction}")
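The change above prints the elapsed time as a raw float of seconds. One possible variant, sketched here under assumptions and not taken from the commit, measures with `time.perf_counter` (a monotonic clock, so system clock adjustments during a long training run do not skew the result) and formats the duration for reading. The helpers `format_duration` and `timed` are hypothetical names.

```python
from time import perf_counter


def format_duration(seconds: float) -> str:
    # Hypothetical helper: renders elapsed seconds as hours, minutes, seconds.
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(int(minutes), 60)
    return f"{hours}h {minutes}m {secs:.2f}s"


def timed(task, *args, **kwargs):
    # Runs task(*args, **kwargs) and returns (result, elapsed seconds).
    start = perf_counter()
    result = task(*args, **kwargs)
    return result, perf_counter() - start
```

In `execute_task`, this could be used as `_, elapsed = timed(train_model, data_file=args.data_file, label_file=args.label_file)` followed by `print(f"Training time: {format_duration(elapsed)}")`.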
--- a/trained_model/keras_metadata.pb
+++ b/trained_model/keras_metadata.pb
--- a/trained_model/saved_model.pb
+++ b/trained_model/saved_model.pb
--- a/trained_model/variables/variables.data-00000-of-00001
+++ b/trained_model/variables/variables.data-00000-of-00001
--- a/trained_model/variables/variables.index
+++ b/trained_model/variables/variables.index
Author	SHA1	Message	Date
coolneng	ed6433f063	Update trained model	2021-07-07 01:46:57 +02:00
coolneng	fda7f7ed5f	Show total training time	2021-07-07 01:19:26 +02:00
coolneng	2ea8000657	Update README	2021-07-07 01:13:35 +02:00