DeepFloyd IF

Core Features & Characteristics: DeepFloyd IF is a modular model composed of a frozen text encoder and three cascaded pixel diffusion modules: a base model that generates 64x64 px images from a text prompt, and two super-resolution models that upscale the result to 256x256 px and then 1024x1024 px. All stages use a frozen text encoder based on the T5 transformer to extract text embeddings, which are fed into a UNet architecture enhanced with cross-attention and attention pooling. The model is reported to outperform the state-of-the-art models available at its release, achieving a zero-shot FID score of 6.66 on the COCO dataset, which underscores the potential of larger UNet architectures in the first stage of cascaded diffusion models.

Main Modes & Use Cases:

  1. Dream (Text-to-Image): Generates images based on text prompts with customizable parameters like guidance_scale and sample_timestep_respacing.
  2. Zero-shot Image-to-Image Translation (Style Transfer): Renders the prompt's output in the style of support_pil_img, with styles such as professional origami, oil art, plastic building bricks, and classic anime.
  3. Super Resolution: IF-II and IF-III (or 'Stable x4') can upscale any low-resolution image to high resolution, including images not generated by IF.
  4. Zero-shot Inpainting: Performs local image repainting based on the provided original image, inpainting mask, and text prompt.
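
A minimal sketch of the Dream mode, assuming the deepfloyd_if Python library is installed and a CUDA GPU is available. The class names, model names, and sampling values follow the project README as best recalled and should be checked against the installed version; build_dream_kwargs and run_dream are hypothetical helper names introduced here for illustration.

```python
def build_dream_kwargs(guidance_scale=7.0, respacing="smart100"):
    """Per-stage sampling options named in the text (values are illustrative)."""
    return {
        "guidance_scale": guidance_scale,
        "sample_timestep_respacing": respacing,
    }


def run_dream(prompt, device="cuda:0", seed=42):
    """Run text-to-image through all three stages (hypothetical helper)."""
    # Imports are deferred so this file loads without a GPU or the library.
    from deepfloyd_if.modules import IFStageI, IFStageII, StableStageIII
    from deepfloyd_if.modules.t5 import T5Embedder
    from deepfloyd_if.pipelines import dream

    t5 = T5Embedder(device="cpu")  # frozen T5 text encoder
    if_I = IFStageI("IF-I-XL-v1.0", device=device)  # 64x64 base model
    if_II = IFStageII("IF-II-L-v1.0", device=device)  # 256x256 upscaler
    if_III = StableStageIII("stable-diffusion-x4-upscaler", device=device)

    return dream(
        t5=t5, if_I=if_I, if_II=if_II, if_III=if_III,
        prompt=[prompt],
        seed=seed,
        if_I_kwargs=build_dream_kwargs(7.0, "smart100"),
        if_II_kwargs=build_dream_kwargs(4.0, "smart50"),
        if_III_kwargs={
            "guidance_scale": 9.0,
            "noise_level": 20,
            "sample_timestep_respacing": "75",
        },
    )
```

Each stage takes its own guidance_scale and sample_timestep_respacing, so quality and speed can be tuned per stage rather than for the cascade as a whole.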

Usage Instructions & Integration:

  • Integrated with the Hugging Face Diffusers library, which uses model CPU offloading to run the whole IF pipeline with as little as 14 GB of VRAM. With torch>=2.0.0, remove all enable_xformers_memory_efficient_attention() calls.
  • Users must have a Hugging Face account, accept the license on the model card, and log in locally using huggingface_hub with an access token.
  • Local installation via the deepfloyd_if Python library is also available; it requires xformers and CLIP.
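
The Diffusers route above might look roughly like the following sketch. It assumes torch>=2.0.0 (so no xformers calls are made), that the license has already been accepted on the model card, and that the Hub repository ids shown are the ones intended; the ids and the helper names authenticate, load_stages, and generate are assumptions introduced for illustration.

```python
# Assumed Hugging Face Hub repository ids; verify them on the model cards.
STAGE_1_ID = "DeepFloyd/IF-I-XL-v1.0"
STAGE_2_ID = "DeepFloyd/IF-II-L-v1.0"


def authenticate(token):
    """Log in locally with a Hugging Face access token."""
    from huggingface_hub import login
    login(token=token)


def load_stages():
    """Load the first two stages with model CPU offloading enabled."""
    import torch
    from diffusers import DiffusionPipeline

    stage_1 = DiffusionPipeline.from_pretrained(
        STAGE_1_ID, variant="fp16", torch_dtype=torch.float16
    )
    stage_1.enable_model_cpu_offload()

    # Stage 2 reuses the embeddings from stage 1, so its text encoder is skipped.
    stage_2 = DiffusionPipeline.from_pretrained(
        STAGE_2_ID, text_encoder=None, variant="fp16", torch_dtype=torch.float16
    )
    stage_2.enable_model_cpu_offload()
    return stage_1, stage_2


def generate(prompt, seed=0):
    """Return a 256x256 PIL image for the prompt (stages I and II only)."""
    import torch
    from diffusers.utils import pt_to_pil

    stage_1, stage_2 = load_stages()
    prompt_embeds, negative_embeds = stage_1.encode_prompt(prompt)
    generator = torch.manual_seed(seed)

    image = stage_1(
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        generator=generator,
        output_type="pt",
    ).images
    image = stage_2(
        image=image,
        prompt_embeds=prompt_embeds,
        negative_prompt_embeds=negative_embeds,
        generator=generator,
        output_type="pt",
    ).images
    return pt_to_pil(image)[0]
```

Calling authenticate("hf_...") once and then generate("a photo of a red fox") would download the weights on first use. The third stage is left out to keep the sketch short.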

Hardware Requirements:

  • 16 GB VRAM minimum for IF-I-XL (4.3B) and IF-II-L (1.2B); 24 GB VRAM to also include the Stable x4 upscaler (output up to 1024x1024). Requires xformers and the environment variable FORCE_MEM_EFFICIENT_ATTN=1.
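
As a rough illustration of these requirements, the sketch below sets the environment variable from Python and maps a nominal VRAM size to the stages listed above. Setting the variable before importing the model library is a cautious assumption, since such variables are often read at import time. stages_that_fit is a hypothetical helper, not part of any library.

```python
import os

# Set before importing the model library (assumed to be read at import time).
os.environ["FORCE_MEM_EFFICIENT_ATTN"] = "1"


def stages_that_fit(vram_gb):
    """Map nominal VRAM (in GB) to the stages the listed requirements allow."""
    if vram_gb >= 24:
        return ["IF-I-XL", "IF-II-L", "Stable x4"]
    if vram_gb >= 16:
        return ["IF-I-XL", "IF-II-L"]
    return []  # below the stated minimum


print(stages_that_fit(16))
print(stages_that_fit(24))
```

The thresholds are the nominal figures quoted above; a card's usable memory is typically a little lower than its advertised size.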

Model Zoo & Scale: Includes models of various parameter sizes such as IF-I-M (400M), IF-I-L (900M), IF-I-XL (4.3B), IF-II-M (450M), IF-II-L (1.2B), and IF-III-L (700M).
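
To compare cascade sizes, the parameter counts listed above can be summed per configuration. This is simple arithmetic on the quoted figures; it leaves out the T5 text encoder, whose size is not given here, and cascade_params_m is a hypothetical helper name.

```python
# Parameter counts in millions, as listed in the model zoo.
MODEL_ZOO_M = {
    "IF-I-M": 400,
    "IF-I-L": 900,
    "IF-I-XL": 4300,
    "IF-II-M": 450,
    "IF-II-L": 1200,
    "IF-III-L": 700,
}


def cascade_params_m(*names):
    """Total diffusion-module parameters (millions) for the chosen stages."""
    return sum(MODEL_ZOO_M[name] for name in names)


largest = cascade_params_m("IF-I-XL", "IF-II-L", "IF-III-L")
smallest = cascade_params_m("IF-I-M", "IF-II-M", "IF-III-L")
print(largest, smallest)  # prints: 6200 1550
```

The largest listed cascade therefore holds about 6.2B diffusion parameters, roughly four times the smallest at about 1.55B.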

Pricing & License: The code is released under a bespoke license with specific restrictions. The initial release of the IF model is temporarily under a restricted, research-only license, with the intention to release a fully open-source model later. Model weights are available for free via Hugging Face.

Access: 416.2M
Country: United States
Pricing model: Free
Tags: Image Generator, Resources, Free, Open Source