
The blog post demonstrates how Haystack can power a multi‑agent system that automatically detects incidents, investigates metrics and logs, and generates production‑grade postmortems. It walks through a reproducible notebook that creates synthetic observability data, applies a rolling z‑score detector, and orchestrates specialist agents for profiling, mitigation planning, and documentation. The coordinator agent strings together tools for data loading, SQL analysis, log pattern scanning, and hypothesis generation, delivering a complete incident review without relying on external retrieval‑augmented generation. Full code and prompts are provided for end‑to‑end execution.
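The rolling z-score idea at the heart of that detector can be sketched in a few lines of standard-library Python. This is a minimal stand-in, not the notebook's actual code; the window size, threshold, and synthetic series below are illustrative assumptions.

```python
from collections import deque
from statistics import mean, pstdev

def rolling_zscore_alerts(values, window=30, threshold=3.0):
    """Flag indices whose value deviates more than `threshold` standard
    deviations from the trailing window's mean."""
    history = deque(maxlen=window)
    alerts = []
    for i, v in enumerate(values):
        if len(history) == window:
            mu, sigma = mean(history), pstdev(history)
            if sigma > 0 and abs(v - mu) / sigma > threshold:
                alerts.append(i)
        history.append(v)
    return alerts

# A mildly noisy baseline with one spike should trigger exactly one alert.
series = [10.0, 11.0] * 25 + [100.0] + [10.0, 11.0] * 10
print(rolling_zscore_alerts(series))  # → [50]
```

Once the spike enters the trailing window it inflates the standard deviation, which is why subsequent normal points are not flagged; production detectors typically add cooldowns or exclude alerted points from the window.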

StepFun AI unveiled Step-DeepResearch, a 32‑billion‑parameter agent built on Qwen2.5‑32B‑Base that transforms web search into end‑to‑end research workflows. The model internalizes four atomic capabilities—planning, deep information seeking, reflection/verification, and report generation—using specialized data pipelines and long‑context training up to 128k...

MarkTechPost introduces a cost‑aware planning AI agent that explicitly balances token usage, latency, and tool‑call budgets when generating action plans. The agent creates multiple candidate steps, estimates their resource spend, and employs a beam‑style search with redundancy penalties to select...
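The beam-style selection with a redundancy penalty can be illustrated with a toy planner. Everything here — the step tuples, weights, and scoring rule — is an assumed simplification, not the article's implementation.

```python
import heapq

def plan_search(candidate_steps, beam_width=2, depth=3,
                cost_weight=0.5, redundancy_penalty=1.0):
    """Beam search over plans. Each step is (name, utility, cost);
    score = total utility - cost_weight * total cost, with a penalty
    each time a step name repeats within a plan."""
    beams = [([], 0.0)]  # (plan, score)
    for _ in range(depth):
        expanded = []
        for plan, score in beams:
            for name, utility, cost in candidate_steps:
                penalty = redundancy_penalty if any(s == name for s, _, _ in plan) else 0.0
                new_score = score + utility - cost_weight * cost - penalty
                expanded.append((plan + [(name, utility, cost)], new_score))
        beams = heapq.nlargest(beam_width, expanded, key=lambda b: b[1])
    return beams[0]

steps = [("search", 3.0, 2.0), ("summarize", 2.0, 1.0), ("verify", 1.5, 0.5)]
best_plan, best_score = plan_search(steps)
print([s for s, _, _ in best_plan], best_score)
```

The redundancy penalty steers the beam toward diverse plans: without it, the search would greedily repeat the single highest-value step at every depth.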

Microsoft unveiled VibeVoice‑ASR, an open‑source speech‑to‑text model that processes up to 60 minutes of continuous audio in a single pass using a 64K‑token context. The model jointly performs automatic speech recognition, speaker diarization, and timestamping, delivering structured transcripts that capture who...

Inworld AI unveiled TTS‑1.5, a production‑grade text‑to‑speech engine built for real‑time voice agents. The Max variant delivers sub‑250 ms P90 time‑to‑first‑audio latency, while the Mini version hits sub‑130 ms, roughly four times faster than the previous generation. The models claim 30% more...

The tutorial demonstrates building a production‑grade tabular machine‑learning pipeline with AutoGluon, covering data ingestion, automated model search, stacked and bagged ensembles, and deployment‑ready artifacts. Using the Titanic dataset, the workflow applies dynamic presets, trains ensembles within a 7‑minute budget, evaluates...

Liquid AI unveiled LFM2.5-1.2B‑Thinking, a 1.17 billion‑parameter reasoning model that occupies roughly 900 MB and runs fully on‑device. Designed for structured reasoning, the model emits internal thinking traces, enabling tool use, math, and multi‑step planning without cloud reliance. Benchmarks show it outperforms...

The post walks readers through building a semi‑centralized Anemoi‑style multi‑agent system using LangGraph, where a Drafter and a Critic negotiate drafts without a supervising manager. It provides a complete Colab notebook, installs LangGraph and LangChain, defines a typed shared state,...
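The manager-free negotiation pattern can be shown without LangGraph at all: two functions share a typed-dict-style state and loop until the critic approves. The stub logic below stands in for LLM calls and is purely illustrative.

```python
def drafter(state):
    """Produce or revise a draft, incorporating the latest feedback.
    (A real system would call an LLM here.)"""
    draft = "Initial proposal"
    if state["feedback"]:
        draft += " revised to address: " + state["feedback"][-1]
    state["draft"] = draft
    return state

def critic(state):
    """Approve the draft or append feedback (toy rule in place of an
    LLM judge)."""
    if "revised" in state["draft"]:
        state["approved"] = True
    else:
        state["feedback"].append("cite supporting evidence")
    return state

def negotiate(max_rounds=3):
    state = {"draft": "", "feedback": [], "approved": False}
    for _ in range(max_rounds):
        state = drafter(state)
        state = critic(state)
        if state["approved"]:
            break
    return state

result = negotiate()
print(result["approved"], len(result["feedback"]))  # True 1
```

LangGraph's contribution is making this loop an explicit, inspectable graph with conditional edges instead of an ad-hoc `for` loop.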

Nous Research unveiled NousCoder-14B, a competitive programming model built on Qwen3-14B and fine‑tuned with execution‑based reinforcement learning. On the LiveCodeBench v6 benchmark, the model achieved a Pass@1 score of 67.87%, outpacing the Qwen3-14B baseline by 7.08 points. Training leveraged 24,000...

Vercel has launched the open‑source agent‑skills package, a plug‑in style manager that turns curated React, Next.js, and web‑design best‑practice playbooks into reusable capabilities for AI coding agents. The initial release bundles three core skills—react‑best‑practices, web‑design‑guidelines, and vercel‑deploy‑claimable—each containing dozens of rule‑based...

NVIDIA unveiled PersonaPlex-7B-v1, a 7‑billion‑parameter full‑duplex speech‑to‑speech model that merges automatic speech recognition, language understanding, and text‑to‑speech into a single transformer. The dual‑stream architecture processes user audio and agent output concurrently, enabling barge‑in, overlapping speech, and rapid turn‑taking. Hybrid voice...

The tutorial demonstrates how to construct a self‑evaluating, agentic AI system using LlamaIndex and OpenAI’s gpt‑4o‑mini model. It combines retrieval‑augmented generation, tool integration, and automated faithfulness and relevancy scoring to create a reliable RAG workflow. The ReActAgent orchestrates evidence retrieval,...
Black Forest Labs unveiled FLUX.2 [klein], a compact family of rectified flow transformers with 4 billion and 9 billion parameters designed for interactive visual intelligence on consumer GPUs. The distilled variants run in sub‑second latency using only four inference steps, while base models...

The tutorial demonstrates how to construct a Minimal Communication Protocol (MCP) that is stateless, cryptographically signed, and capable of handling asynchronous, long‑running tasks. Using Python, Pydantic models enforce strict schema validation for every request and response, while HMAC signatures guarantee...
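The signing scheme can be sketched with the standard library alone: serialize deterministically, attach an HMAC-SHA256 tag, and verify statelessly on receipt. The shared secret and payload fields are hypothetical; the tutorial's Pydantic validation layer is omitted here.

```python
import hashlib
import hmac
import json

SECRET = b"shared-secret"  # hypothetical key; real deployments use a secrets manager

def sign(payload: dict) -> dict:
    """Serialize with sorted keys (deterministic bytes) and attach an
    HMAC-SHA256 signature, so any receiver holding the secret can verify
    integrity without per-session state."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    sig = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    return {"body": body, "sig": sig}

def verify(message: dict) -> bool:
    expected = hmac.new(SECRET, message["body"].encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(expected, message["sig"])

msg = sign({"task": "summarize", "id": 42})
print(verify(msg))      # True: untampered message
tampered = {"body": msg["body"].replace("42", "43"), "sig": msg["sig"]}
print(verify(tampered)) # False: any byte change breaks the signature
```

Because the signature covers the full serialized body, the receiver needs no session history — exactly the property that makes the protocol stateless.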

AI observability extends classic logging, metrics, and tracing into the probabilistic world of large language models. By breaking an LLM‑driven workflow into traces and nested spans, teams can monitor each step—from input handling to final decision—just like traditional production software....
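The trace-and-nested-span decomposition can be demonstrated with a minimal context manager — a hand-rolled sketch of what tracing SDKs do, with hypothetical step names standing in for real pipeline stages.

```python
import time
import uuid
from contextlib import contextmanager

TRACE = []  # spans collected for one workflow run

@contextmanager
def span(name, parent=None):
    """Record a named step with timing and parent linkage, mirroring
    how observability SDKs nest spans inside a trace."""
    record = {"id": uuid.uuid4().hex, "name": name,
              "parent": parent, "start": time.perf_counter()}
    try:
        yield record
    finally:
        record["duration"] = time.perf_counter() - record["start"]
        TRACE.append(record)

with span("llm_workflow") as root:
    with span("retrieve_context", parent=root["id"]):
        pass  # e.g. vector search would run here
    with span("generate_answer", parent=root["id"]):
        pass  # e.g. the LLM call would run here

print([s["name"] for s in TRACE])
```

Child spans close (and are appended) before their parent, so the collected list reads inner-to-outer; exporters normally reassemble the tree from the `parent` ids.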

The tutorial demonstrates building a multi‑turn crescendo‑style red‑team pipeline with Garak to stress‑test large language model safety. It adds a lightweight custom detector for system‑prompt leakage and an iterative probe that escalates benign prompts toward sensitive extraction across several turns....

Researchers from Alibaba and Wuhan University present AgeMem, a unified agentic memory framework that lets LLM agents learn to manage both long‑term and short‑term memory through the same policy. Memory operations—add, update, delete, retrieve, summarize, filter—are exposed as tools within...

The MarkTechPost tutorial walks readers through a targeted data‑poisoning experiment that flips a portion of CIFAR‑10 labels from a chosen class to a malicious class using PyTorch. By constructing parallel clean and poisoned training pipelines with a lightweight ResNet‑18, the...
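The label-flipping step itself is simple enough to sketch without PyTorch; the class ids, flip rate, and toy label list below are illustrative assumptions, not the tutorial's CIFAR-10 setup.

```python
import random

def poison_labels(labels, source=3, target=5, rate=0.4, seed=0):
    """Flip a fraction `rate` of `source`-class labels to `target`,
    returning the poisoned labels and the flipped indices."""
    rng = random.Random(seed)
    idx = [i for i, y in enumerate(labels) if y == source]
    flipped = rng.sample(idx, int(len(idx) * rate))
    poisoned = list(labels)
    for i in flipped:
        poisoned[i] = target
    return poisoned, sorted(flipped)

clean = [3, 1, 3, 3, 2, 3, 3]            # five samples of class 3
poisoned, flipped = poison_labels(clean)
print(len(flipped), poisoned.count(3))   # 2 flipped, 3 class-3 labels remain
```

Keeping the flipped-index list is what enables the parallel clean/poisoned pipelines: the same indices can later be inspected to measure the targeted accuracy drop.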

Researchers from CAMEL AI, Eigent AI, and partners released SETA, an open‑source stack that couples a terminal‑focused toolkit with 400 synthetic reinforcement‑learning tasks. The framework delivers state‑of‑the‑art results on the Terminal Bench benchmark, hitting 46.5% accuracy on version 2.0 with a...

The tutorial shows how Ibis can create a portable, in‑database feature‑engineering pipeline that feels like Pandas but runs entirely in DuckDB. By registering data in the backend and keeping all transformations lazy, the code is translated into efficient SQL without...

Stanford Medicine researchers unveiled SleepFM Clinical, a multimodal foundation model trained on 585,000 hours of polysomnography from about 65,000 individuals. The model learns a unified representation of brain, heart, and respiratory signals and can predict long‑term risk for more than...

The tutorial shows how to build a unified Apache Beam pipeline that can run in both batch and stream‑like modes using the DirectRunner. It creates synthetic event‑time data, applies fixed windows with triggers and allowed lateness, and demonstrates how Beam...

Liquid AI unveiled LFM2.5, a compact 1.2 billion‑parameter model family designed for on‑device and edge inference. The suite includes Base, Instruct, Japanese, vision‑language, and audio variants, all released with open weights on Hugging Face and via the LEAP platform. Pre‑training was...

Zlab Princeton has open‑sourced the LLM‑Pruning Collection, a JAX‑based repository that aggregates leading pruning techniques for large language models. The repo bundles block‑level, layer‑level, and weight‑level methods—including Minitron, ShortGPT, Wanda, SparseGPT, Magnitude, Sheared LLaMA and LLM‑Pruner—under a unified training and evaluation...

Tencent Hunyuan researchers unveiled HY-MT1.5, a bilingual translation family comprising 1.8B and 7B models. Both models cover 33 languages plus five dialect variants and are released with open weights on GitHub and Hugging Face. The compact 1.8B variant runs

Prompt caching reduces LLM API costs by reusing static prompt components. By storing key‑value attention states in GPU memory, identical prefixes avoid recomputation, cutting latency and token usage. Engineers can boost efficiency by analyzing request patterns, restructuring prompts so shared...
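The accounting benefit of a shared prefix can be modeled in a few lines. Note this is a toy simulation of cache-hit billing: real providers cache key-value attention states server-side, not prompt text, and use whole-token counts rather than `split()`.

```python
class PrefixCache:
    """Toy model of prompt caching: tokens covered by a previously seen
    system-prompt prefix count as cache hits on later requests."""
    def __init__(self):
        self.prefixes = set()

    def process(self, system_prompt, user_message):
        cached = system_prompt in self.prefixes
        self.prefixes.add(system_prompt)
        prompt_tokens = len((system_prompt + " " + user_message).split())
        # Only the non-prefix portion is "fresh" work once the prefix is cached.
        fresh = len(user_message.split()) if cached else prompt_tokens
        return {"prompt_tokens": prompt_tokens, "uncached_tokens": fresh}

cache = PrefixCache()
system = "You are a support assistant . Follow policy strictly ."
first = cache.process(system, "Where is my order ?")
second = cache.process(system, "Cancel it please .")
print(first["uncached_tokens"], second["uncached_tokens"])  # 15 4
```

The takeaway matches the article's advice: put the static instructions first and byte-identical across requests, so every call after the first pays only for the variable suffix.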

The tutorial builds a red‑team evaluation harness with Strands Agents to stress‑test a tool‑using AI assistant against prompt‑injection and tool‑misuse attacks. It defines a guarded target agent, a red‑team agent that auto‑generates adversarial prompts, and a judge agent that scores...

Tencent Hunyuan’s 3D Digital Human team launched HY‑Motion 1.0, an open‑weight text‑to‑3D human motion model built on a Diffusion Transformer (DiT) architecture and trained with Flow Matching. The flagship model contains 1 billion parameters, with a Lite 0.46 billion variant, and generates SMPL‑H...

The tutorial walks through building a privacy‑preserving federated fraud‑detection system from scratch using lightweight, CPU‑only PyTorch. It simulates ten independent banks, partitions highly imbalanced transaction data with a Dirichlet distribution, and coordinates local model updates via a FedAvg loop. After...
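The FedAvg aggregation step reduces to a sample-count-weighted average of client parameters. The flat weight lists and client sizes below are toy stand-ins for the tutorial's PyTorch state dicts.

```python
def fedavg(client_weights, client_sizes):
    """Average client model parameters (flat lists) weighted by each
    client's sample count — the core FedAvg server update."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    global_w = [0.0] * dim
    for w, n in zip(client_weights, client_sizes):
        for j in range(dim):
            global_w[j] += (n / total) * w[j]
    return global_w

# Two toy "banks": the larger one pulls the average toward its weights.
w = fedavg([[1.0, 0.0], [0.0, 1.0]], client_sizes=[3, 1])
print(w)  # [0.75, 0.25]
```

Weighting by sample count matters precisely in the imbalanced, non-IID setting the tutorial simulates: a uniform average would let tiny clients drag the global model as hard as large ones.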

LLMRouter, an open‑source library from UIUC, sits between applications and heterogeneous LLM pools to automatically select the most appropriate model per query. It offers over 16 routing algorithms organized into single‑round, multi‑round, personalized, and agentic families, each configurable via a...
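A single-round router of the kind described can be caricatured as "cheapest model that clears the difficulty bar." The marker-keyword heuristic, tiers, and model entries below are invented for illustration; LLMRouter's algorithms learn this mapping rather than hard-coding it.

```python
def route(query, models):
    """Pick the cheapest model whose capability tier meets the query's
    estimated difficulty (toy single-round routing)."""
    hard_markers = ("prove", "derive", "multi-step", "debug")
    difficulty = 2 if any(m in query.lower() for m in hard_markers) else 1
    eligible = [m for m in models if m["tier"] >= difficulty]
    return min(eligible, key=lambda m: m["cost"])["name"]

models = [
    {"name": "small-fast", "tier": 1, "cost": 1},
    {"name": "large-reasoning", "tier": 2, "cost": 10},
]
print(route("What is the capital of France?", models))            # small-fast
print(route("Prove this bound via a multi-step argument", models))  # large-reasoning
```

Multi-round and agentic router families extend this shape by re-routing after observing a model's first answer instead of committing up front.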

NVIDIA’s AI team unveiled NitroGen, an open‑source vision‑action foundation model that learns to play commercial games directly from pixel inputs and gamepad actions. The model is trained on 40,000 hours of filtered gameplay video spanning over 1,000 titles, using automatic...

Liquid AI released LFM2-2.6B-Exp, an experimental checkpoint that adds a pure reinforcement‑learning (RL) stage to its 2.6 billion‑parameter LFM2 model. The RL fine‑tuning targets instruction following, knowledge retrieval, and math without altering the hybrid convolution‑attention architecture. Benchmark results show the model...

The tutorial demonstrates how to build a production‑grade, agentic workflow for customer‑support ticket triage using GraphBit. It starts by configuring the GraphBit runtime, defining typed ticket data, and registering deterministic tools for classification, routing, and response drafting. These tools are...

The tutorial by Asif Razzaq demonstrates how to build a self‑organizing Zettelkasten memory system for agentic AI using Google Gemini. It defines a MemoryNode data class, ingests text by atomizing it into discrete facts, embeds each fact, and links semantically...
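The atomize-embed-link loop can be sketched with hand-made 2-D embeddings and cosine similarity; in the tutorial, Gemini produces the embeddings and the threshold is tuned, so everything below is an illustrative assumption.

```python
import math
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    text: str
    embedding: list
    links: list = field(default_factory=list)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def ingest(nodes, text, embedding, threshold=0.8):
    """Add an atomized fact and bidirectionally link it to any
    semantically similar existing note."""
    node = MemoryNode(text, embedding)
    for other in nodes:
        if cosine(embedding, other.embedding) >= threshold:
            node.links.append(other.text)
            other.links.append(text)
    nodes.append(node)
    return node

notes = []
ingest(notes, "Paris is in France", [1.0, 0.1])
ingest(notes, "France's capital is Paris", [0.9, 0.2])
ingest(notes, "Photosynthesis uses light", [0.0, 1.0])
print([n.links for n in notes])
```

The two Paris facts link to each other while the unrelated fact stays isolated — the self-organizing property that lets later retrieval walk link neighborhoods instead of scanning every note.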