Compare commits
3 Commits
2b280e2bdf...0.1.0
| Author | SHA1 | Date |
|---|---|---|
| | ed6433f063 | |
| | fda7f7ed5f | |
| | 2ea8000657 | |
60
README.md
@@ -2,10 +2,15 @@

locimend is a tool that corrects DNA sequencing errors using Deep Learning.

The goal is to produce a correct DNA sequence when given a sequence containing errors.

It provides both a command-line program and a REST API.

## Technologies

- TensorFlow
- Biopython
- FastAPI

## Installation
@@ -48,8 +53,59 @@ contains all the needed dependencies.

## Usage

The following command creates the dataset, trains the Deep Learning model and shows the accuracy:
### Training the model

The following command trains the Deep Learning model and shows the accuracy and AUC:

```bash
poetry run python src/model.py
poetry run python locimend/main.py train <data file> <label file>
```

- `<data file>`: FASTQ file containing the sequences with errors
- `<label file>`: FASTQ file containing the sequences without errors

Both files must list corresponding sequences in the same order: the simulated read at a given position (row) in the data file must match the canonical sequence at the same position in the label file.

A dataset is provided to train the model. To use it, execute the following command:

```bash
poetry run python locimend/main.py train data/curesim-HVR.fastq data/HVR.fastq
```
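
The same-position requirement on the data and label files can be checked before training. The following is a minimal sketch, not part of locimend: it assumes the plain four-line FASTQ layout (header, sequence, `+`, quality) with no wrapped lines, and the function names `count_fastq_records` and `check_same_record_count` are hypothetical. It only verifies that both files hold the same number of records; it cannot prove that a given row in one file really corresponds to the same row in the other.

```python
from pathlib import Path


def count_fastq_records(path: str) -> int:
    """Count records in a FASTQ file, assuming exactly four lines per record."""
    lines = Path(path).read_text().splitlines()
    # Ignore trailing blank lines, which some tools append.
    while lines and not lines[-1].strip():
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError(f"{path}: line count {len(lines)} is not a multiple of 4")
    for start in range(0, len(lines), 4):
        # Each record starts with an '@' header and has a '+' separator on its third line.
        if not lines[start].startswith("@") or not lines[start + 2].startswith("+"):
            raise ValueError(f"{path}: malformed record starting at line {start + 1}")
    return len(lines) // 4


def check_same_record_count(data_file: str, label_file: str) -> int:
    """Return the shared record count, or raise if the two files differ in length."""
    n_data = count_fastq_records(data_file)
    n_label = count_fastq_records(label_file)
    if n_data != n_label:
        raise ValueError(
            f"{data_file} has {n_data} records but {label_file} has {n_label}"
        )
    return n_data
```

For the bundled dataset, the call would be `check_same_record_count("data/curesim-HVR.fastq", "data/HVR.fastq")`.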

### Inference

A trained model is provided, which can be used to infer the correct sequences. There are two ways to interact with it:

- Command-line execution
- REST API

#### Command-line

The following command infers the correct sequence and prints it:

```bash
poetry run python locimend/main.py infer "<DNA sequence>"
```

#### REST API

The model can also be served via a REST API. To start the web server, run the following command:

```bash
poetry run api
```

The API can be accessed at http://localhost:8000 with either a GET or a POST request:

| Request | Endpoint | Payload |
|:----:|:-----:|:-----:|
| GET | / | Sequence as a path parameter (in the URL) |
| POST | / | JSON |

For a POST request, the JSON must have the following structure:

```json
{"sequence": "<DNA sequence>"}
```
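
The two request forms can be exercised from Python with the standard library alone. This is a hedged client sketch rather than code from the repository: the helper names are hypothetical, the GET route is assumed to be the sequence appended directly after `/`, and the response format is not documented here, so the body is returned as raw text.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # address given in the README


def build_get_request(sequence: str) -> Request:
    """GET with the sequence as a path parameter (assumed route: /<sequence>)."""
    return Request(f"{BASE_URL}/{quote(sequence, safe='')}", method="GET")


def build_post_request(sequence: str) -> Request:
    """POST to / with the JSON structure {"sequence": "<DNA sequence>"}."""
    body = json.dumps({"sequence": sequence}).encode("utf-8")
    return Request(
        f"{BASE_URL}/",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: Request) -> str:
    """Send the request and return the response body as text (format not assumed)."""
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the server started through `poetry run api`, `print(send(build_post_request("ACGT")))` would print whatever the API returns for that sequence.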
@@ -1,7 +1,8 @@
from asyncio import run
from argparse import ArgumentParser, Namespace
from time import time

from model import infer_sequence, train_model
from locimend.model import infer_sequence, train_model


def parse_arguments() -> Namespace:
@@ -21,7 +22,10 @@ def parse_arguments() -> Namespace:

async def execute_task(args):
    if args.task == "train":
        start_time = time()
        train_model(data_file=args.data_file, label_file=args.label_file)
        end_time = time()
        print(f"Training time: {end_time - start_time}")
    else:
        prediction = await infer_sequence(sequence=args.sequence)
        print(f"Error-corrected sequence: {prediction}")
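
The body of `parse_arguments` is not shown in this hunk. As an illustration only, the sketch below shows one way a parser could accept the documented `train <data file> <label file>` and `infer "<DNA sequence>"` forms and produce the `task`, `data_file`, `label_file` and `sequence` attributes that `execute_task` reads. The subparser layout, the help strings and the name `parse_arguments_sketch` are assumptions, not the repository's implementation.

```python
from argparse import ArgumentParser, Namespace
from typing import Optional, Sequence


def parse_arguments_sketch(argv: Optional[Sequence[str]] = None) -> Namespace:
    """Hypothetical parser exposing the attributes execute_task expects."""
    parser = ArgumentParser(prog="locimend")
    # dest="task" stores the chosen subcommand name in args.task.
    subparsers = parser.add_subparsers(dest="task", required=True)

    train = subparsers.add_parser("train", help="train the model")
    train.add_argument("data_file", help="FASTQ file with the sequences containing errors")
    train.add_argument("label_file", help="FASTQ file with the sequences without errors")

    infer = subparsers.add_parser("infer", help="correct a single sequence")
    infer.add_argument("sequence", help="DNA sequence to correct")

    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv)
```

Passing `["infer", "ACGT"]` yields a namespace whose `task` is `"infer"`, which sends `execute_task` down its `else` branch.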
13
trained_model/keras_metadata.pb
Normal file
File diff suppressed because one or more lines are too long
Binary file not shown.
Binary file not shown.
Binary file not shown.