Features | Dependencies | SystemRequirements | Install | Usage | Models | Wiki | Acknowledgment | Licenses
- Work in progress! (ALPHA)
- English | Русский
A simple and convenient interface for using various neural network models. You can communicate with LLM and Moondream2 using text, voice and image input; use StableDiffusion to generate images, ZeroScope 2 to generate videos, TripoSR and Shap-E to generate 3D objects, AudioCraft and AudioLDM 2 to generate music and audio, CoquiTTS and SunoBark for text-to-speech, OpenAI-Whisper for speech-to-text, Wav2Lip for lip-sync, Roop for face swapping, Rembg for background removal, CodeFormer for face restoration, LibreTranslate for text translation and Demucs for audio file separation. You can also download LLM and StableDiffusion models, change the application settings inside the interface, and check system sensors
The goal of the project is to create the easiest possible application for using neural network models

## Features
- Easy installation via install.bat (Windows) or install.sh (Linux)
- You can use the application via your mobile device on localhost (via IPv4) or anywhere online (via Share)
- Flexible and optimized interface (By Gradio)
- Authentication via admin:admin (You can enter your login details in the GradioAuth.txt file)
- Support for Transformers and llama.cpp models (LLM)
- Support for diffusers and safetensors models (StableDiffusion) - txt2img, img2img, depth2img, pix2pix, controlnet, upscale, inpaint, gligen, animatediff, video, cascade and extras tabs
- AudioCraft support (Models: musicgen, audiogen and magnet)
- AudioLDM 2 support (Models: audio and music)
- Supports TTS and Whisper models (For LLM and TTS-STT)
- Supports Lora, Textual inversion (embedding), Vae, Img2img, Depth, Pix2Pix, Controlnet, Upscale, Inpaint, GLIGEN, AnimateDiff, Videos, Cascade, Rembg, CodeFormer and Roop models (For StableDiffusion)
- Support Multiband Diffusion model (For AudioCraft)
- Support LibreTranslate (Local API)
- Support ZeroScope 2
- Support SunoBark
- Support Demucs
- Support Shap-E
- Support TripoSR
- Support Wav2Lip
- Support Multimodal (Moondream 2), LORA (transformers) and WebSearch (with GoogleSearch) for LLM
- Model settings inside the interface
- ModelDownloader (For LLM and StableDiffusion)
- Application settings
- Ability to see system sensors
## Required Dependencies

- C++ compiler
  - Windows: VisualStudio
  - Linux: GCC
## Minimum System Requirements

- System: Windows or Linux
- GPU: 6GB+ VRAM or CPU: 8-core 3.2GHz
- RAM: 16GB+
- Disk space: 20GB+
- Internet connection for downloading models and installation
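The requirements above can be sanity-checked before installing. This is only an illustrative stdlib sketch (core count and free disk space; VRAM and clock speed are not covered here):

```python
# Illustrative check of the minimum requirements listed above
# (8-core CPU, 20 GB+ free disk). Uses only the Python standard library.
import os
import shutil

GB = 1024 ** 3

def cpu_cores() -> int:
    """Number of logical CPU cores (falls back to 1 if undetectable)."""
    return os.cpu_count() or 1

def free_disk_gb(path: str = ".") -> float:
    """Free disk space at `path`, in gigabytes."""
    return shutil.disk_usage(path).free / GB

if __name__ == "__main__":
    print(f"CPU cores: {cpu_cores()} (8+ recommended)")
    print(f"Free disk: {free_disk_gb():.1f} GB (20+ recommended)")
```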
## Install

### Windows

- `git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git` to any location
- Run `Install.bat` and wait for the installation
- After installation, run `Start.bat`
- Select the file version and wait for the application to launch
- Now you can start generating!

To get an update, run `Update.bat`. To work with the virtual environment through the terminal, run `Venv.bat`
### Linux

- `git clone https://github.com/Dartvauder/NeuroSandboxWebUI.git` to any location
- In the terminal, run `./Install.sh` and wait for the installation of all dependencies
- After installation, run `./Start.sh`
- Wait for the application to launch
- Now you can start generating!

To get an update, run `./Update.sh`. To work with the virtual environment through the terminal, run `./Venv.sh`
## Usage

The interface has fifteen tabs: LLM, TTS-STT, SunoBark, LibreTranslate, Wav2Lip, StableDiffusion, ZeroScope 2, TripoSR, Shap-E, AudioCraft, AudioLDM 2, Demucs, ModelDownloader, Settings and System. Select the one you need and follow the instructions below
### LLM

- First upload your models to the folder: *inputs/text/llm_models*
- Select your model from the drop-down list
- Select model type (`transformers` or `llama`)
- Set up the model according to the parameters you need
- Type (or speak) your request
- Click the `Submit` button to receive the generated text and audio response

Optional: you can enable `TTS` mode and select the `voice` and `language` needed to receive an audio response. You can enable `multimodal` and upload an image to get its description. You can enable `websearch` for Internet access. You can enable `libretranslate` to get a translation. You can also choose a `LORA` model to improve generation
### TTS-STT

- Type text for text-to-speech
- Upload audio for speech-to-text
- Click the `Submit` button to receive the generated text and audio response
### SunoBark

- Type your request
- Set up the model according to the parameters you need
- Click the `Submit` button to receive the generated audio response
### LibreTranslate

- First you need to install and run LibreTranslate
- Select source and target languages
- Click the `Submit` button to get the translation
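Since LibreTranslate runs as a local API, it can also be called directly over HTTP. A minimal sketch, assuming the default local address `http://localhost:5000` (adjust to your setup); the helper names are illustrative:

```python
# Minimal sketch of calling a locally running LibreTranslate server.
# The address http://localhost:5000 is LibreTranslate's default, an
# assumption here; the field names follow its /translate JSON API.
import json
import urllib.request

def build_payload(text: str, source: str, target: str) -> dict:
    return {"q": text, "source": source, "target": target, "format": "text"}

def translate(text: str, source: str, target: str,
              url: str = "http://localhost:5000/translate") -> str:
    data = json.dumps(build_payload(text, source, target)).encode()
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["translatedText"]
```

For example, `translate("Hello", "en", "ru")` would return the Russian translation once the server is running.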
### Wav2Lip

- Upload the initial image of a face
- Upload the initial audio of a voice
- Set up the model according to the parameters you need
- Click the `Submit` button to receive the lip-sync result
### StableDiffusion

#### txt2img

- First upload your models to the folder: *inputs/image/sd_models*
- Select your model from the drop-down list
- Select model type (`SD`, `SD2` or `SDXL`)
- Set up the model according to the parameters you need
- Enter your request (+ and - for prompt weighting)
- Click the `Submit` button to get the generated image

Optional: you can select your `vae`, `embedding` and `lora` models to improve the generation method, and you can enable `upscale` to increase the size of the generated image
#### img2img

- First upload your models to the folder: *inputs/image/sd_models*
- Select your model from the drop-down list
- Select model type (`SD`, `SD2` or `SDXL`)
- Set up the model according to the parameters you need
- Upload the initial image with which the generation will take place
- Enter your request (+ and - for prompt weighting)
- Click the `Submit` button to get the generated image
#### depth2img

- Upload the initial image
- Set up the model according to the parameters you need
- Enter your request (+ and - for prompt weighting)
- Click the `Submit` button to get the generated image
#### pix2pix

- Upload the initial image
- Set up the model according to the parameters you need
- Enter your request (+ and - for prompt weighting)
- Click the `Submit` button to get the generated image
#### controlnet

- First upload your stable diffusion models to the folder: *inputs/image/sd_models*
- Upload the initial image
- Select your stable diffusion and controlnet models from the drop-down lists
- Set up the models according to the parameters you need
- Enter your request (+ and - for prompt weighting)
- Click the `Submit` button to get the generated image
#### upscale

- Upload the initial image
- Set up the model according to the parameters you need
- Click the `Submit` button to get the upscaled image
#### inpaint

- First upload your models to the folder: *inputs/image/sd_models/inpaint*
- Select your model from the drop-down list
- Select model type (`SD`, `SD2` or `SDXL`)
- Set up the model according to the parameters you need
- Upload the image with which the generation will take place to `initial image` and `mask image`
- In `mask image`, select the brush, then the palette, and change the color to `#FFFFFF`
- Draw the area for generation and enter your request (+ and - for prompt weighting)
- Click the `Submit` button to get the inpainted image
#### gligen

- First upload your models to the folder: *inputs/image/sd_models*
- Select your model from the drop-down list
- Select model type (`SD`, `SD2` or `SDXL`)
- Set up the model according to the parameters you need
- Enter your request for the prompt (+ and - for prompt weighting) and the GLIGEN phrases (in "" for the box)
- Enter GLIGEN boxes (like [0.1387, 0.2051, 0.4277, 0.7090] for a box)
- Click the `Submit` button to get the generated image
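A GLIGEN box such as [0.1387, 0.2051, 0.4277, 0.7090] is a `[xmin, ymin, xmax, ymax]` rectangle with coordinates normalized to the 0..1 range of the image. The validator below is only an illustrative sketch of that format, not part of the application:

```python
# Illustrative helper: a GLIGEN box is [xmin, ymin, xmax, ymax] with all
# coordinates normalized to 0..1, as in [0.1387, 0.2051, 0.4277, 0.7090].

def is_valid_gligen_box(box: list[float]) -> bool:
    if len(box) != 4:
        return False
    xmin, ymin, xmax, ymax = box
    in_range = all(0.0 <= v <= 1.0 for v in box)  # normalized coordinates
    return in_range and xmin < xmax and ymin < ymax  # non-empty rectangle
```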
#### animatediff

- First upload your models to the folder: *inputs/image/sd_models*
- Select your model from the drop-down list
- Set up the model according to the parameters you need
- Enter your request (+ and - for prompt weighting)
- Click the `Submit` button to get the generated image animation
#### video

- Upload the initial image
- Enter your request (for I2VGen-XL)
- Set up the model according to the parameters you need
- Click the `Submit` button to get the video from the image
#### cascade

- Enter your request
- Set up the model according to the parameters you need
- Click the `Submit` button to get the generated image
#### extras

- Upload the initial image
- Select the options you need
- Click the `Submit` button to get the modified image
### ZeroScope 2

- Enter your request
- Set up the model according to the parameters you need
- Click the `Submit` button to get the generated video
### TripoSR

- Upload the initial image
- Set up the model according to the parameters you need
- Click the `Submit` button to get the generated 3D object
### Shap-E

- Enter your request or upload the initial image
- Set up the model according to the parameters you need
- Click the `Submit` button to get the generated 3D object
### AudioCraft

- Select a model from the drop-down list
- Select model type (`musicgen` or `audiogen`)
- Set up the model according to the parameters you need
- Enter your request
- (Optional) upload the initial audio if you are using a `melody` model
- Click the `Submit` button to get the generated audio
### AudioLDM 2

- Select a model from the drop-down list
- Set up the model according to the parameters you need
- Enter your request
- Click the `Submit` button to get the generated audio
### Demucs

- Upload the initial audio to separate
- Click the `Submit` button to get the separated audio
### ModelDownloader

- Here you can download `LLM` and `StableDiffusion` models. Just choose the model from the drop-down list and click the `Submit` button
### Settings

- Here you can change the application settings. For now you can only change `Share` mode to `True` or `False`
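How the stored `True`/`False` value is actually persisted is not shown in this README; as a hypothetical sketch, a textual setting can be mapped to the boolean that Gradio's `launch(share=...)` expects:

```python
# Hypothetical sketch: mapping the textual Share setting ("True"/"False")
# to a boolean. The parsing helper is illustrative, not the app's code.

def parse_share_flag(value: str) -> bool:
    """Map the textual True/False setting to a boolean (case-insensitive)."""
    v = value.strip().lower()
    if v not in ("true", "false"):
        raise ValueError(f"Share must be True or False, got {value!r}")
    return v == "true"

# Gradio consumes it as: demo.launch(share=parse_share_flag("True"))
```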
### System

- Here you can see the indicators of your computer's sensors by clicking on the `Submit` button
### Additional Information

- All generations are saved in the *outputs* folder
- You can press the `Clear` button to reset your selection
- To stop the generation process, click the `Stop generation` button
- You can turn off the application using the `Close terminal` button
- You can open the *outputs* folder by clicking on the `Folder` button
## Models

- LLM models can be taken from HuggingFace or from the ModelDownloader inside the interface
- StableDiffusion, vae, inpaint, embedding and lora models can be taken from CivitAI or from the ModelDownloader inside the interface
- AudioCraft, AudioLDM 2, TTS, Whisper, Wav2Lip, SunoBark, MoonDream2, Upscale, GLIGEN, Depth, Pix2Pix, Controlnet, AnimateDiff, Videos, Cascade, Rembg, Roop, CodeFormer, TripoSR, Shap-E, Demucs, ZeroScope and Multiband Diffusion models are downloaded automatically into the *inputs* folder when they are used
- You can take voices anywhere: record your own, take a recording from the Internet, or just use those that are already in the project. The main thing is that it is pre-processed!
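For LLM models fetched from HuggingFace manually, the target is the *inputs/text/llm_models* folder named earlier. A small sketch of deriving the local folder from a repo id; the repo id and the `huggingface_hub` call in the comment are examples, not recommendations:

```python
# Sketch: deriving a local folder under inputs/text/llm_models (the folder
# the README names for LLMs) from a HuggingFace repo id.
from pathlib import Path

def llm_target_dir(repo_id: str, root: str = "inputs/text/llm_models") -> Path:
    # Use the repository name (the part after the slash) as the folder name.
    return Path(root) / repo_id.split("/")[-1]

# With the third-party huggingface_hub package one could then fetch a model:
#   from huggingface_hub import snapshot_download
#   snapshot_download(repo_id="TheBloke/Llama-2-7B-GGUF",  # example repo id
#                     local_dir=llm_target_dir("TheBloke/Llama-2-7B-GGUF"))
```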
## Acknowledgment

Many thanks to these projects; thanks to their applications and libraries, I was able to create my application:

First of all, I want to thank the developers of PyCharm and GitHub. With the help of their applications, I was able to create and share my code
- `gradio` - https://github.com/gradio-app/gradio
- `transformers` - https://github.com/huggingface/transformers
- `tts` - https://github.com/coqui-ai/TTS
- `openai-whisper` - https://github.com/openai/whisper
- `torch` - https://github.com/pytorch/pytorch
- `soundfile` - https://github.com/bastibe/python-soundfile
- `cuda-python` - https://github.com/NVIDIA/cuda-python
- `gitpython` - https://github.com/gitpython-developers/GitPython
- `diffusers` - https://github.com/huggingface/diffusers
- `llama.cpp-python` - https://github.com/abetlen/llama-cpp-python
- `audiocraft` - https://github.com/facebookresearch/audiocraft
- `AudioLDM2` - https://github.com/haoheliu/AudioLDM2
- `xformers` - https://github.com/facebookresearch/xformers
- `demucs` - https://github.com/facebookresearch/demucs
- `libretranslate` - https://github.com/LibreTranslate/LibreTranslate
- `libretranslatepy` - https://github.com/argosopentech/LibreTranslate-py
- `rembg` - https://github.com/danielgatis/rembg
- `trimesh` - https://github.com/mikedh/trimesh
- `googlesearch-python` - https://github.com/Nv7-GitHub/googlesearch
- `torchmcubes` - https://github.com/tatsy/torchmcubes
- `suno-bark` - https://github.com/suno-ai/bark
## Licenses

Many models have their own license for use. Before using them, I advise you to familiarize yourself with their licenses:
- Transformers
- llama.cpp
- CoquiTTS
- OpenAI-Whisper
- LibreTranslate
- Diffusers
- StableDiffusion1.5
- StableDiffusion2
- StableDiffusionXL
- StableCascade
- StableVideoDiffusion
- I2VGen-XL
- Rembg
- Shap-E
- AudioCraft
- AudioLDM2
- Demucs
- SunoBark
- Moondream2
- ZeroScope2
- TripoSR
- GLIGEN
- Wav2Lip
- Roop
- CodeFormer
- ControlNet
- AnimateDiff
- Pix2Pix