By turning complex YAML configurations into an intuitive visual flow, Studio speeds up synthetic data creation and reduces errors, accelerating AI model development cycles.
Synthetic data has become a cornerstone for training large language models, yet crafting reliable pipelines often requires juggling YAML files, terminal commands, and disparate tooling. SyGra Studio addresses this friction by offering a single‑pane canvas where data engineers can assemble end‑to‑end workflows visually. The platform abstracts the underlying graph configuration, letting users focus on model selection, prompt design, and data source integration without manual scripting, thereby lowering the barrier to entry for teams new to synthetic data generation.
Beyond the visual editor, Studio packs enterprise‑grade features that appeal to seasoned AI practitioners. It natively supports a range of LLM back‑ends—including OpenAI, Azure OpenAI, Ollama, Vertex, Bedrock, and custom endpoints—while allowing connections to Hugging Face, file systems, or ServiceNow repositories. Prompt fields surface available state variables on the fly, and the built‑in Monaco editor provides syntax‑highlighted code with breakpoints and live logs. During execution, the interface streams token usage, latency, and cost metrics, giving immediate insight into budget impact and performance bottlenecks.
For organizations, the shift to a visual workflow translates into faster iteration cycles and reduced operational risk. Existing YAML‑based SyGra tasks can be imported unchanged, preserving legacy investments while gaining observability and debugging capabilities. The seamless export of generated datasets supports downstream training pipelines, annotation tools, and evaluation suites. As synthetic data demand grows, tools like SyGra Studio are poised to become standard components in AI development stacks, driving productivity and ensuring more transparent, cost‑controlled model training.
SyGra 2.0.0 introduces Studio, an interactive environment that turns synthetic data generation into a transparent, visual craft. Instead of juggling YAML files and terminals, you compose flows directly on the canvas, preview datasets before committing, tune prompts with inline variable hints, and watch executions stream live—all from a single pane. Under the hood it’s the same platform, so everything you do visually generates the corresponding SyGra‑compatible graph config and task executor scripts.
Configure and validate models with guided forms (OpenAI, Azure OpenAI, Ollama, Vertex, Bedrock, vLLM, custom endpoints).
Connect Hugging Face, file‑system, or ServiceNow data sources and preview rows before execution.
Configure nodes by selecting models, writing prompts (with auto‑suggested variables), and defining outputs or structured schemas.
Design downstream outputs using shared state variables and Pydantic‑powered mappings.
Execute flows end‑to‑end and review generated results instantly with node‑level progress.
Debug with inline logs, breakpoints, Monaco‑backed code editors, and auto‑saved drafts.
Monitor per‑run token cost, latency, and guardrail outcomes with execution history stored in .executions/.
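The "Pydantic‑powered mappings" mentioned above can be sketched with stdlib dataclasses (Pydantic adds validation on top of the same idea). The schema and helper below are purely illustrative, not SyGra's actual API:

```python
from dataclasses import dataclass, fields

# Hypothetical output schema -- field names are illustrative only.
@dataclass
class StoryOutput:
    story_body: str
    story_summary: str

def map_state_to_output(state: dict, schema=StoryOutput):
    """Pick only the schema's fields out of the shared state dict."""
    return schema(**{f.name: state[f.name] for f in fields(schema)})

state = {
    "prompt": "a heist in Venice",          # input column, not part of the output schema
    "story_body": "The gondola glided...",
    "story_summary": "A thief's plan unravels.",
}
out = map_state_to_output(state)
print(out.story_summary)  # only schema fields are carried downstream
```

The point of a typed mapping is that downstream consumers see a fixed, validated shape instead of an ever-growing state dict.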
Let’s walk through this experience step by step.
Open Studio, click Create Flow, and Start/End nodes appear automatically. Before adding anything else:
Choose a connector (Hugging Face, disk, or ServiceNow).
Enter parameters like repo_id, split, or file path, then click Preview to fetch sample rows.
Column names immediately become state variables (e.g., {prompt}, {genre}), so you know exactly what can be referenced inside prompts and processors.
Once validated, Studio keeps the configuration in sync and pipes those variables throughout the flow—no manual wiring or guesswork.
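Conceptually, the column-to-variable wiring works like template substitution. Studio handles this internally; the sketch below only illustrates the idea of dataset columns becoming referenceable prompt variables:

```python
import string

# Sample preview rows as a data source might return them (illustrative data).
preview_rows = [
    {"prompt": "Write a short story", "genre": "mystery"},
    {"prompt": "Write a poem", "genre": "noir"},
]

# Column names become the available state variables.
state_variables = sorted(preview_rows[0].keys())

template = "Genre: {genre}\nTask: {prompt}"

# Check that every placeholder in the template is a known state variable.
placeholders = {name for _, name, _, _ in string.Formatter().parse(template) if name}
assert placeholders <= set(state_variables)

rendered = template.format(**preview_rows[0])
```

Validating placeholders against known columns up front is what lets the editor flag a typo like `{genr}` before a single token is spent.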
Drag the blocks you need from the palette. For a story‑generation pipeline:
Drop an LLM node named “Story Generator,” select a configured model (e.g., gpt-4o-mini), write the prompt, and store the result in story_body.
Add a second LLM node named “Story Summarizer,” reference {story_body} inside the prompt, and output to story_summary.
Toggle structured outputs, attach tools, or add Lambda/Subgraph nodes if you need reusable logic or branching behavior.
Studio’s detail panel keeps everything in context—model parameters, prompt editor, tool configuration, pre/post‑process code, and even multi‑LLM settings if you want parallel generations. Typing { inside a prompt surfaces every available state variable instantly.
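The two-node pipeline above boils down to functions passing a shared state dict, which can be sketched with a stub in place of the real model call (`call_llm` is hypothetical; in Studio the call is routed to your configured model):

```python
def call_llm(prompt: str) -> str:
    """Stub standing in for a real LLM call."""
    return f"<generated for: {prompt[:40]}>"

def story_generator(state: dict) -> dict:
    # First node: writes its result into the shared state under story_body.
    state["story_body"] = call_llm(
        f"Write a {state['genre']} story about {state['prompt']}"
    )
    return state

def story_summarizer(state: dict) -> dict:
    # Second node: references {story_body} produced by the first node.
    state["story_summary"] = call_llm(f"Summarize: {state['story_body']}")
    return state

state = {"prompt": "a lighthouse keeper", "genre": "mystery"}
for node in (story_generator, story_summarizer):
    state = node(state)
```

Every node reads from and writes to the same state, which is why `{story_body}` is available to the summarizer's prompt the moment the generator defines it.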
Open the Code Panel to inspect the exact YAML/JSON Studio is generating. This is the same artifact written to tasks/examples/, so what you see is what gets committed.
When you’re ready to execute:
Click Run Workflow.
Choose record counts, batch sizes, retry behavior, etc.
Hit Run and watch the Execution panel stream node status, token usage, latency, and cost in real time. Detailed logs provide observability and make debugging effortless. All executions are written to .executions/runs/*.json.
After the run, download outputs, compare against prior executions, and review run metadata such as latency and token usage.
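Because each run lands as a JSON file under `.executions/runs/`, the history is easy to post-process. The per-run schema below (`token_usage`, `cost_usd`) is an assumption for illustration; inspect a real run file for the actual field names:

```python
import glob
import json
import os
import tempfile

def summarize_runs(runs_dir: str) -> dict:
    """Aggregate token and cost totals across all run files in a directory."""
    totals = {"tokens": 0, "cost_usd": 0.0, "runs": 0}
    for path in glob.glob(os.path.join(runs_dir, "*.json")):
        with open(path) as f:
            run = json.load(f)
        # Field names are assumed, not confirmed against SyGra's schema.
        totals["tokens"] += run.get("token_usage", 0)
        totals["cost_usd"] += run.get("cost_usd", 0.0)
        totals["runs"] += 1
    return totals

# Demo against a synthetic run record rather than a real .executions/ directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "run1.json"), "w") as f:
    json.dump({"token_usage": 1200, "cost_usd": 0.03}, f)

summary = summarize_runs(demo_dir)
```

A few lines like this are enough to track cost drift across runs without opening the UI.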
SyGra Studio can also execute existing workflows in the tasks directory. For example, the tasks/examples/glaive_code_assistant/ workflow ingests the glaiveai/glaive-code-assistant-v2 dataset, drafts answers, critiques them, and loops until the critique returns “NO MORE FEEDBACK.”
Inside Studio you’ll notice:
Canvas layout – two LLM nodes (generate_answer and critique_answer) linked by a conditional edge that either routes back for more revisions or exits to END when the critique is satisfied.
Tunable inputs – the Run modal lets you switch dataset splits, adjust batch sizes, cap records, or tweak temperatures without touching YAML.
Observable execution – watch both nodes light up in sequence, inspect intermediate critiques, and monitor status in real time.
Generated outputs – the resulting synthetic dataset is ready for model training, evaluation pipelines, or annotation tools.
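The generate-critique loop with its conditional edge can be sketched in a few lines. The two functions below are stubs standing in for the workflow's LLM nodes, and the round cap is an assumed safety guard, not a confirmed setting:

```python
from typing import Optional

def generate_answer(question: str, feedback: Optional[str]) -> str:
    """Stub: a real LLM would revise the answer using the critique."""
    return "answer v2" if feedback else "answer v1"

def critique_answer(answer: str) -> str:
    """Stub: a real LLM would return feedback or the stop phrase."""
    return "NO MORE FEEDBACK" if answer == "answer v2" else "add error handling"

def run_flow(question: str, max_rounds: int = 5) -> str:
    feedback = None
    for _ in range(max_rounds):
        answer = generate_answer(question, feedback)
        feedback = critique_answer(answer)
        if feedback == "NO MORE FEEDBACK":  # conditional edge exits to END
            return answer
    return answer  # cap mirrors a max-iteration guard against endless revision

final = run_flow("How do I parse JSON in Python?")
```

The stop phrase acting as the loop's exit condition is exactly what the conditional edge between `critique_answer` and END encodes in the canvas.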
To try Studio locally, clone the repository and launch it:
git clone https://github.com/ServiceNow/SyGra.git
cd SyGra && make studio
Docs: https://servicenow.github.io/SyGra/
Studio Docs: https://servicenow.github.io/SyGra/getting_started/create_task_ui/
Example config: tasks/examples/glaive_code_assistant/graph_config.yaml
SyGra Studio turns synthetic data workflows into a visual, user‑friendly experience. Configure once, build with confidence, run with full observability, and generate the data without ever leaving the canvas.