
kaywuensche/file_processor


File Processor

API for file processing. It's a collection of methods for handling various file types:

  • PDF
  • Excel
  • Word
  • PowerPoint
  • Video
  • Image
  • Text

Goals:

  • Extract text from files for further use in natural language processing, such as named entity recognition or text classification
  • Extract images from files for further use in computer vision, such as optical character recognition, image classification or object detection
  • General methods for image preparation, like resizing or augmentation
  • General methods for text preparation, like language classification or chunking

Prerequisites:

  • docker
  • docker-compose

Check for prerequisites

To check if Docker is installed:

docker --version

To check if docker-compose is installed:

docker-compose --version
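
Both checks can also be scripted. The following is a minimal sketch, not part of the project: the helper names `tool_version` and `check_prerequisites` are hypothetical, and it only assumes that each tool accepts the `--version` flag shown above.

```python
import shutil
import subprocess
from typing import Dict, Optional, Sequence


def tool_version(command: str) -> Optional[str]:
    """Return the output of `<command> --version`, or None if the tool is not on PATH."""
    if shutil.which(command) is None:
        return None
    result = subprocess.run(
        [command, "--version"],
        capture_output=True,
        text=True,
        check=False,
        timeout=10,
    )
    # Some tools print their version to stderr, so fall back to it.
    return result.stdout.strip() or result.stderr.strip()


def check_prerequisites(
    commands: Sequence[str] = ("docker", "docker-compose"),
) -> Dict[str, Optional[str]]:
    """Map each required command to its version string (None means missing)."""
    return {command: tool_version(command) for command in commands}
```

Calling `check_prerequisites()` returns a dictionary in which a value of `None` marks a tool that still has to be installed.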

Install prerequisites

Ubuntu

To install Docker and Docker Compose on Ubuntu, please follow the link.

Windows 10

To install Docker on Windows, please follow the link.

Note for Windows users: open the Docker Desktop menu by clicking the Docker icon in the notification area. Select Settings, then the Advanced tab, to adjust the resources available to Docker Engine.

Build The Docker Image

To build and start the project, run the following command from the project's root directory:

sudo docker-compose up --build --remove-orphans

API Endpoints

To see all the available endpoints, open your favorite browser and navigate to:

http://<machine_IP>:5011/docs
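
The endpoint list can also be read programmatically. The sketch below rests on an assumption: FastAPI serves a machine-readable OpenAPI schema at `/openapi.json` by default, and this project may have changed or disabled that path. The function names `list_endpoints` and `fetch_schema` are hypothetical.

```python
import json
from typing import Dict, List, Tuple
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_endpoints(schema: Dict) -> List[Tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI schema, sorted by path."""
    endpoints = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                endpoints.append((method.upper(), path))
    return sorted(endpoints, key=lambda item: (item[1], item[0]))


def fetch_schema(base_url: str, timeout: float = 10.0) -> Dict:
    """Download the schema. Assumes FastAPI's default /openapi.json location."""
    with urlopen(f"{base_url}/openapi.json", timeout=timeout) as response:
        return json.load(response)
```

With the service running, `list_endpoints(fetch_schema("http://<machine_IP>:5011"))` returns one pair per available operation.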

[Screenshot: overview of the available endpoints]

Some of the endpoints respond with a PDF or a ZIP file. The FastAPI docs page can't display these files, but you can use a tool like Postman, which can save the response to disk for testing purposes.

[Screenshot: saving a response in Postman]
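
As an alternative to a GUI client, a short script can write a binary response to disk. This is a rough sketch under stated assumptions: the README does not say how each endpoint expects its input, so the caller has to build the request itself, and the names `filename_from_disposition` and `save_response` and the fallback name `response.bin` are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional
from urllib.request import Request, urlopen


def filename_from_disposition(header: Optional[str], default: str) -> str:
    """Pick a file name from a Content-Disposition header, else use `default`."""
    if header:
        match = re.search(r'filename="?([^";]+)"?', header)
        if match:
            # Keep only the last path component so the server cannot pick the directory.
            return Path(match.group(1).strip()).name or default
    return default


def save_response(request: Request, out_dir: str = ".", timeout: float = 60.0) -> Path:
    """Send a prepared request and write the raw response body to a file."""
    with urlopen(request, timeout=timeout) as response:
        name = filename_from_disposition(
            response.headers.get("Content-Disposition"), "response.bin"
        )
        target = Path(out_dir) / name
        target.write_bytes(response.read())
    return target
```

Taking a prepared `Request` object keeps the sketch independent of how a given endpoint wants its upload encoded. If the server sends no `Content-Disposition` header, the file is saved under the fallback name and can be renamed to `.pdf` or `.zip` as appropriate.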