Limitations - Image Studio

What Arli AI Can Do

Text-to-Image Generation

Generate images from text descriptions using 74+ Stable Diffusion models including FLUX, SDXL, SD 1.5, anime, and artistic variants.

Image-to-Image Transformation

Transform existing images based on text prompts. Control the intensity with denoising strength.

Image Upscaling

Upscale images by 2x-4x using ESRGAN models with enhanced detail.

Text Generation (117 models)

OpenAI-compatible text/chat completions with 117 models. Includes guided generation (JSON schema, regex, choice constraints).

Vision Analysis (base64)

Analyze images using vision-capable text models (Qwen3.5-VL, Qwen3.5-27B). Base64 input only — URLs not supported.

What Arli AI Cannot Do

Video Generation

Arli AI cannot generate or edit video content. Consider Runway, Kling AI, or AnimateDiff for video generation.

Audio/Speech/TTS

No text-to-speech, speech recognition, or audio generation capabilities.

Vision via URL

Arli AI text endpoints with vision only accept base64-encoded images, not URLs. Use Featherless for URL-based vision input.

Streaming Responses

Arli AI image endpoints return full base64 responses. Text endpoints may support streaming depending on the model.

Fine-tuning / Training

Cannot fine-tune, train, or customize models. All 74 image models and 117 text models are used as-is.

Inpainting with Masks

While the img2img endpoint accepts a mask parameter, inpainting results may vary. Dedicated inpainting models may give better results.

Size Limits

Parameter	Limit	Notes
Image width	64 – 2048 px	Must be a multiple of 64
Image height	64 – 2048 px	Must be a multiple of 64
Batch size	1 – 4	Images per request
Steps	1 – 150	Denoising steps
CFG scale	1 – 20	Classifier-free guidance
Prompt length	No hard limit	Very long prompts may be truncated by the model
Input image (img2img)	Base64 encoded	Large images may timeout
Input image (upscale)	Base64 encoded	Large images may timeout
Max tokens (text)	Model-dependent	Varies by model context window

Rate Limits

Type	Limit	Details
Parallel requests (image)	6 concurrent	Exceeding returns HTTP 429. Check `GET /v1/parallel-requests`
Request timeout (image)	300 seconds	5-minute timeout for generation requests
Request timeout (list)	30 seconds	For model listing and metadata endpoints
Text rate limits	Per-key	Varies by API key plan tier

Checking Rate Limits

curl https://api.arliai.com/v1/parallel-requests \
  -H "Authorization: Bearer YOUR_KEY"

# Response:
# {"parallel_requests": 6, "remaining": 4, "message": "..."}

Timeout Handling

Image generation can take 30-120 seconds depending on model, resolution, and steps.

Set your HTTP client timeout to at least 300 seconds
For faster results, reduce steps (15-20) or use smaller resolutions
FLUX.1 Schnell and LUMINA models generate faster
If you get timeouts consistently, try a smaller image size or fewer steps

Quality Limitations

Small text in images: AI struggles with rendering readable text in generated images
Exact face reproduction: Cannot reproduce specific real people accurately
Prompt adherence: Complex prompts with many elements may not all be represented
Composition: Large aspect ratios may have composition issues
Hands/fingers: Common AI artifact — may generate incorrect numbers of fingers

Best Practices

Use the API gateway at api.kim8.s4s.host for no-auth access
For vision tasks requiring URL input, use Vision AI or Featherless
For advanced text generation with 5700+ models, use Featherless AI
Always handle HTTP 429 with retry logic
Use GET /sdapi/v1/sd-models to discover available image models
Use GET /v1/img-samplers and GET /v1/upscalers for available options

Limitations & Capabilities