(5 of 6) CHRT.com - Untrusted Code Execution for Agentic Analytics

Untrusted Code

One hairy technical problem with having an AI-powered analytics agent is that you end up running lots of untrusted code.
To do analytics, we feed an LLM some prompts and ask it to write code, which we then run on our servers.
Unfortunately, no amount of prompt scanning or static code scanning can verify that the code isn’t going to do something malicious.
- Plus, we started implementing the “code edits” feature, which would allow users to write code directly and send it to our servers for execution, so…yeah.
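To make the static-scanning point concrete, here is a minimal sketch of how easily a simple check can be sidestepped. The scanner below is a hypothetical stand-in (not anything CHRT used) that only flags plain `import` statements for a few assumed-dangerous modules:

```python
import ast

# Hypothetical blocklist, chosen only for illustration.
BLOCKED_MODULES = {"os", "subprocess", "socket"}


def naive_static_scan(source: str) -> list[str]:
    """Return blocked modules that appear in plain import statements."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        findings += [n for n in names if n.split(".")[0] in BLOCKED_MODULES]
    return findings


# Flagged: the import is written out literally.
obvious = "import os\nos.listdir('.')"

# Not flagged: the module and attribute names are assembled at runtime,
# so no import statement ever appears in the syntax tree.
evasive = "mod = __import__('o' + 's')\ngetattr(mod, 'list' + 'dir')('.')"

if __name__ == "__main__":
    print(naive_static_scan(obvious))
    print(naive_static_scan(evasive))
```

A smarter scanner could catch this particular trick, but the same cat-and-mouse game repeats with every new rule, which is why the rest of this post leans on isolation instead of inspection.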

Good approach and hacky approach

I think the good approach would be to have a setup that sends
Well…something pretty close to that is the hacky approach I came up with: use Lambda functions. I think there can be (and will be, and maybe already is?) a great solution built on the idea of using a microVM like Firecracker to spin up many tiny execution environments.
But with Lambda as it is now, there’s a problem…

TODO - add diagram or code samples

user sign-up flow diagram
IAM Role code sample

The need for individual Lambda Functions

Lambda functions use the same runtime
But separate Lambda functions never share an execution environment
A Lambda function also has no way to change or narrow its own IAM permissions at runtime
- We need the Lambda to reach out to an S3 bucket to get the user’s data file
- If a Lambda can’t change or narrow its IAM permissions at runtime, then a single shared Lambda must have permission to access all user folders and files
- And we can’t trust the code being run
So we couldn’t use the same Lambda for every user
What we could do, and did, is spawn a new Lambda function + IAM role for each user, thanks to CloudFormation
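As a rough sketch of what a per-user IAM policy could look like, the helper below builds a policy document limited to one user's S3 prefix. The bucket name, the `users/<user_id>/` layout, and the function name are all assumptions made for illustration; the actual CHRT template may differ.

```python
import json
import re

# Hypothetical names: the bucket and the "users/<user_id>/" layout are assumptions.
DATA_BUCKET = "example-user-data"
USER_ID_PATTERN = re.compile(r"^[A-Za-z0-9_-]{1,128}$")


def user_scoped_policy(user_id: str, bucket: str = DATA_BUCKET) -> dict:
    """Build an IAM policy document limited to one user's S3 prefix."""
    # Reject anything that could widen the resource ARN (wildcards, slashes, ...).
    if not USER_ID_PATTERN.fullmatch(user_id):
        raise ValueError(f"unexpected user_id: {user_id!r}")
    prefix = f"users/{user_id}/"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects only under this user's prefix.
                "Sid": "ReadOwnObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
            {
                # Listing is a bucket-level action, so it is narrowed by a condition.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}*"]}},
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(user_scoped_policy("abc123"), indent=2))
```

With a policy like this attached to the function's role, untrusted code running in one user's Lambda has no credentials that reach another user's files, whatever the code tries to do.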

The workflow

Our auth provider is Firebase with Google Cloud Identity Platform
Like most auth providers, it has a function/hook interface for running code when a user signs up
- The Lambda function code is developed using the Lambda Python runtime Docker image
- The custom image is stored in ECR
- When a user signs up, we have our auth provider (Firebase) run a function which sends their user_id to our server
- The server then provisions an identical Lambda Function based on our existing CloudFormation template - but with an IAM policy allowing that function to access the user’s S3 directory
- Then, during CI/CD, GitHub Actions loops over the list of Lambda Function names and updates each of them to the latest image
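The two server-side steps above could be sketched roughly as follows. This is not the actual CHRT code: the stack naming scheme, the `UserId` template parameter, and both function names are assumptions. The clients are passed in as arguments so the sketch stays independent of how credentials are set up.

```python
from typing import Any, Iterable


def provision_user_function(cfn: Any, user_id: str, template_url: str) -> str:
    """Create one CloudFormation stack (Lambda + IAM role) for a new user."""
    # Hypothetical naming scheme; stack names allow only letters, digits, and hyphens.
    stack_name = f"chrt-user-fn-{user_id}"
    cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        # "UserId" is an assumed template parameter name.
        Parameters=[{"ParameterKey": "UserId", "ParameterValue": user_id}],
        # Needed because the template creates IAM resources.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    return stack_name


def roll_out_image(
    lambda_client: Any, function_names: Iterable[str], image_uri: str
) -> list[str]:
    """Point every per-user function at the newly pushed container image."""
    updated = []
    for name in function_names:
        lambda_client.update_function_code(FunctionName=name, ImageUri=image_uri)
        updated.append(name)
    return updated
```

With boto3, the clients would be `boto3.client("cloudformation")` and `boto3.client("lambda")`. The second function is the Python equivalent of the CI/CD loop; the same thing could be done from a GitHub Actions step with the AWS CLI.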

Concerns

Really just a pain to set up, but works fine
Cold starts weren’t so bad

Full Series: CHRT - Autopilot for Analytics (6 posts)
- Previous: (4 of 6) CHRT.com - IaC via CloudFormation
- Next Post: (6 of 6) CHRT.com - TBD

---
title: "(5 of 6) CHRT.com - Untrusted Code Execution for Agentic Analytics"
author: "Aaron Carver"
description: "Autopilot for Analytics"
date: "Nov 9 2024"
date-format: "MMMM YYYY"
# image: "img/image.jpg"
# image-alt: a great image
categories: [Projects, Full Stack, Code, CHRT]
lightbox: true
format:
  html:
    code-fold: true
    code-tools:
      source: true
      toggle: false
      caption: none
---