


























We are proud to announce the latest generation of enterprise readiness features for Snorkel Flow introduced in our 2024.R3 release. These capabilities enable enterprise IT admins, compliance officers, and security analysts to configure and tailor Snorkel Flow data safeguards to your company’s control policies.
As a result, many top US financial institutions and Fortune 500 enterprises continue to trust Snorkel for programmatic data development, model fine-tuning, and SME annotation on their corporate and customer data.
Learn more about:
Snorkel Flow’s role-based access controls now enable you to limit who has access to specific features and data on the platform. Enterprise IT admins can configure access to features and data at an instance, workspace, or role level by leveraging access control rules.
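As a rough illustration of how rule-based access checks of this kind can work, here is a minimal Python sketch. It is not Snorkel Flow's API: `AccessRule`, `is_allowed`, the feature names, and the deny-by-default semantics are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class AccessRule:
    """Hypothetical access control rule (not a Snorkel Flow type)."""
    feature: str
    role: Optional[str] = None       # None means the rule applies to any role
    workspace: Optional[str] = None  # None means the rule applies instance-wide


def is_allowed(rules: Iterable[AccessRule], feature: str, role: str, workspace: str) -> bool:
    """Deny by default; allow only when at least one rule matches the request."""
    return any(
        rule.feature == feature
        and rule.role in (None, role)
        and rule.workspace in (None, workspace)
        for rule in rules
    )


# Illustrative rules: admins may upload data anywhere on the instance,
# and any role may annotate inside the "claims" workspace.
rules = [
    AccessRule(feature="data_upload", role="admin"),
    AccessRule(feature="annotation", workspace="claims"),
]
print(is_allowed(rules, "data_upload", "reviewer", "claims"))
```

Leaving `role` or `workspace` unset widens a rule's scope, which is one simple way to model instance-level versus workspace-level or role-level configuration.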
In addition to empowering admins to manually provision users and configure access on the platform, Snorkel Flow can sync with external identity providers like Azure Active Directory to directly consume entitlement information within SAML or OIDC SSO integrations. Snorkel automatically provisions those users with locked-down feature & data access to a set of permissioned workspaces.
In compliance with NIST AAL3 re-authentication standards, Snorkel Flow requires users to re-authenticate if they are idle for more than 15 minutes. Users must also re-authenticate at least every 12 hours within any active session. IT admins can adjust these re-authentication intervals to match your security posture.
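The two timers described above can be expressed as a small check. The sketch below is illustrative only: `ReauthPolicy` and `needs_reauth` are hypothetical names, not Snorkel Flow settings, and the defaults simply mirror the 15-minute and 12-hour values quoted in the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ReauthPolicy:
    """Hypothetical policy object; field names are illustrative."""
    idle_timeout: timedelta = timedelta(minutes=15)
    max_session_age: timedelta = timedelta(hours=12)


def needs_reauth(policy: ReauthPolicy, session_start: datetime,
                 last_activity: datetime, now: datetime) -> bool:
    """Return True if the user must re-authenticate before continuing."""
    idle_expired = now - last_activity > policy.idle_timeout
    session_expired = now - session_start >= policy.max_session_age
    return idle_expired or session_expired


policy = ReauthPolicy()
start = datetime(2024, 1, 1, 9, 0)
# Active 10 minutes ago, one hour into the session.
print(needs_reauth(policy, start, start + timedelta(minutes=50), start + timedelta(minutes=60)))
```

An admin with a stricter security posture could, under these assumptions, construct the policy with shorter intervals, for example `ReauthPolicy(idle_timeout=timedelta(minutes=5))`.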

Snorkel enables multiple paths to bring data into and out of Snorkel Flow, including but not limited to:
Snorkel Flow admins can configure who has access to which data connectors and customize them at the role and workspace level.

For authenticated access, Snorkel provides the ability to securely persist, manage, and control access to data connector credentials on the platform. Within a given workspace, you can enable trusted users to create, modify, and delete credentials—or limit them to only use existing credentials without the ability to view the underlying secrets.

We understand that customers may wish to manage their credentials and keys off-platform. In future releases, Snorkel will provide integrations with cloud-native services like AWS Secrets Manager or equivalents on GCP and Azure.
Prior to adding data to an application in Snorkel Flow, users can perform data surgery, error correction, and dataset triage. This data is stored in workspace-scoped buckets persisted on the Snorkel NFS.
To that end, we have changed our data access policies to enforce data isolation at the workspace level by default. In this release, we are announcing the deprecation of MinIO Client and boto3 upload and download functions from Snorkel Flow’s SDK to make our users’ data more secure than ever.
Snorkel’s SDK now provides replacement functions in our new first-party storage-api. This new service uploads files in an RBAC-safe manner directly into workspace-scoped buckets. To give customers adequate time to migrate onto storage-api and to maintain backward compatibility with legacy notebooks, we will continue to permit MinIO Client and boto3 usage until the end of Q1 2025, after which access to these functions will be blocked.

Snorkel encrypts on-platform data both at rest and in transit. The platform secures all services, credentials, and keys by first salting and then using AES-256 encryption with a random initialization vector.
If you choose to use our Snorkel Hosted deployment option, we ensure that all customer data is stored exclusively on our SOC-2 Type II certified infrastructure and is isolated, stored, and processed separately from all other customer data.
Additionally, we enforce audit trail event coverage and capture relevant network telemetry over all critical on-platform UI and notebook SDK actions. Upon request, audit events can be exported off-platform for long-term, secured cold-storage retention on all major clouds to enable Basel II and SOX compliance for financial institutions, or HIPAA compliance for health industry covered entities.
In the spirit of transparency, Snorkel is excited to publish some high-level statistics, trends, and observations about our customers’ preferred production installation methods. We hope to continue publishing similar metrics for future releases to track changes in enterprise preferences over time.


Snorkel is excited to announce our new Snorkel-managed, in-customer VPC deployment method for AWS and Azure. This installation path is best suited for customers who still want data to be hosted within their private cluster, but do not have the bandwidth to install, manage, and upgrade the Snorkel Flow platform.
Most of our customers prefer managed VPC when they transition from on-prem to private cloud environments. Cloud-native frameworks like Kubernetes and Terraform, initial network configuration, and IAM management can present a significant learning curve for onboarding enterprises. Standard installations may require more than 20 configuration inputs, with many more optional configurations available for your particular infrastructure stack.
With Managed VPC, Snorkel infrastructure engineers and customer support provide considerably faster time to resolution with direct instance access to the infrastructure environment, but with limited permissions that can be configured at your discretion.
As a result, we have seen average install and upgrade times drop from days for pure on-prem installations to under 2 hours for a managed VPC setup.
Although unavailable in 2024.R3, managed VPC support for GCP will be provided in subsequent releases. Please reach out to your Snorkel representative for more details.
We are announcing deprecation and end-of-life support for Snorkel Flow on-prem and private cloud instances that are installed on single node VMs. Customers currently installed on a single node VM setup should reach out to their Snorkel representative for next steps on how to migrate onto a different provisioning & installation method.
This was a difficult decision. We know that some customers may want to provision a small instance locally to test a Snorkel Flow deployment during POVs. Unfortunately, doing so creates a myriad of unforced errors and support issues, including but not limited to:
As a company, we fundamentally believe Snorkel Flow’s mission is to enable customers to perform programmatic data development at production scale. Therefore, we now require that all new customers host Snorkel Flow in a multi-node, horizontally scalable environment.
Customers want to develop with larger, more modern models and LLMs while prototyping, which means they will require more GPU cores and vRAM. We are excited to provide A10G GPUs to all Snorkel Hosted customers by default. This marks a significant upgrade over NVIDIA T4 GPUs, which are better suited to small-scale model training and basic image processing tasks rather than LLM inference on 7B parameter models.

Comparison courtesy of Baseten. Note that the table shows NVIDIA A10s rather than A10Gs; the two are identical in vRAM and core count but differ in other specifications.
We are excited to announce that all Snorkel Flow features that rely on third-party LLM inference services such as OpenAI and Azure OpenAI endpoints (including but not limited to Prompt LFs, Warm Start LFs, and FM Suite) have become significantly faster.
We have observed a 10-20x reduction in aggregate latency for network-bound jobs between our R2 v0.93 LTS and R3 v0.95 LTS versions, and we anticipate that network-bound jobs will become 50x faster by R4 v0.96 STS.
Snorkel achieved this by implementing a new robust concurrency framework which fully saturates resource quotas while preventing any given job from triggering LLM rate limits. It also implements fair scheduling for concurrent jobs, as well as caching and checkpointing for long-running jobs.
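To make the idea of fair scheduling under a rate limit concrete, here is a minimal round-robin sketch. It is not Snorkel's framework: `fair_schedule` and the notion of a fixed per-window request budget are assumptions chosen to keep the example small and deterministic.

```python
from collections import deque


def fair_schedule(jobs, requests_per_window):
    """Interleave pending requests from several jobs, one request per job per turn.

    jobs: dict mapping a job name to its list of pending request ids.
    Returns a list of windows; each window holds at most
    requests_per_window (job, request) pairs, so the budget is never exceeded.
    """
    if requests_per_window < 1:
        raise ValueError("requests_per_window must be at least 1")
    queues = deque((name, deque(reqs)) for name, reqs in jobs.items() if reqs)
    windows, current = [], []
    while queues:
        name, pending = queues.popleft()
        current.append((name, pending.popleft()))
        if pending:
            queues.append((name, pending))  # job rejoins at the back of the line
        if len(current) == requests_per_window:
            windows.append(current)  # budget for this window is fully used
            current = []
    if current:
        windows.append(current)
    return windows


schedule = fair_schedule({"a": [1, 2, 3], "b": [10]}, requests_per_window=2)
print(schedule)
```

Each window is filled to the budget whenever work remains, and a short job such as `b` is not starved by a longer job that was submitted first.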
We designed this new framework to generalize and support multiple types of remote inference services (with varying rate limit and exponential back-off policies), and have made it easy to flexibly customize LLM provider constraints in the SDK.
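As a sketch of what per-provider constraints with exponential back-off might look like, consider the following. Every name here (`ProviderConstraints`, `RateLimited`, `call_with_backoff`) is hypothetical and is not part of the Snorkel Flow SDK; the example only illustrates the general pattern.

```python
import time
from dataclasses import dataclass


class RateLimited(Exception):
    """Stand-in for whatever error a provider client raises on a rate limit."""


@dataclass(frozen=True)
class ProviderConstraints:
    """Hypothetical per-provider limits; field names are illustrative."""
    requests_per_minute: int
    base_delay_s: float = 1.0
    backoff_factor: float = 2.0
    max_delay_s: float = 60.0
    max_retries: int = 5

    def backoff_delays(self):
        """Delay before each retry: base * factor**attempt, capped at max_delay_s."""
        return [
            min(self.base_delay_s * self.backoff_factor ** attempt, self.max_delay_s)
            for attempt in range(self.max_retries)
        ]


def call_with_backoff(send, constraints, sleep=time.sleep):
    """Call send(); on RateLimited, wait with growing delays and try again."""
    for delay in constraints.backoff_delays():
        try:
            return send()
        except RateLimited:
            sleep(delay)
    return send()  # final attempt; any error now propagates to the caller


constraints = ProviderConstraints(requests_per_minute=60, max_delay_s=10.0)
print(constraints.backoff_delays())
```

Passing `sleep` as a parameter keeps the retry loop testable without real waiting, and separate `ProviderConstraints` instances can describe providers with different limits.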

We want to recognize our partners at Prefect for helping us build a best-in-class in-process job orchestration framework. Thank you for your support, and we hope to continue contributing back to the Prefect open source project over the next couple of releases!