Untrusted Code
- One hairy technical problem with having an AI-powered analytics agent is that you end up running lots of untrusted code
- To do analytics, we feed an LLM some prompts and ask it to write code which we then run on our servers
- Unfortunately, no amount of prompt scanning or static code scanning can verify that the code isn’t going to do something malicious.
- Plus, we started implementing the “code edits” feature which would allow users to directly write code and send it to our servers for execution so…yeah.
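To make the static scanning point concrete, here is a minimal sketch using a hypothetical denylist scanner. `naive_scan` and its patterns are invented for this example and are not something described in this post. The snippet never executes the submitted code; it only shows that a lightly obfuscated string passes a textual check.

```python
# Hypothetical denylist scanner, for illustration only.
DENYLIST = ("import os", "os.system", "subprocess", "eval(", "exec(")


def naive_scan(source: str) -> bool:
    """Return True if none of the denylisted substrings appear in the source."""
    return not any(pattern in source for pattern in DENYLIST)


# Obvious version: the scanner flags it.
obvious = "import os\nos.system('cat /etc/passwd')"

# Same intent, but the module and attribute names are assembled from
# string fragments, so no denylisted substring appears literally.
obfuscated = (
    "mod = __import__('o' + 's')\n"
    "getattr(mod, 'sys' + 'tem')('cat /etc/passwd')"
)

print(naive_scan(obvious))     # False: flagged
print(naive_scan(obfuscated))  # True: passes the scan
```

A smarter scanner can catch this particular trick, but the general problem stays the same: code can always be rearranged faster than patterns can be added, which is why isolation matters more than scanning.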
Good approach and hacky approach
- I think the good approach would be to have a setup that sends
- Well…something pretty close to that is the hacky approach I came up with: use Lambda functions. I think there can be (and will be, and maybe already is?) a great solution built on the idea of using a microVM like Firecracker to spin up many tiny execution environments
- But with Lambda as it is now, there’s a problem…
The need for individual Lambda Functions
- Invocations of the same Lambda function can reuse the same execution environment
- But separate Lambda functions never share an environment
- A Lambda function also has no way to change or limit its IAM permissions
- We need the Lambda to reach out to an S3 bucket to get the user’s data file
- If a Lambda can’t change or limit its IAM permissions during its runtime, then it must have permission to access all user folders and files
- And we can’t trust the code being run
- So we couldn’t use the same Lambda for every user
- But what we could do, and did, is spawn a new Lambda function + IAM role for each user, thanks to CloudFormation
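As a rough sketch of what "one IAM role per user" can look like, the function below builds a policy document that only allows reads under a single user's S3 prefix. The bucket name, the `users/<user_id>/` key layout, and the function name `build_user_policy` are all assumptions made for illustration, not the actual setup.

```python
import json


def build_user_policy(bucket: str, user_id: str) -> dict:
    """Build an IAM policy document limited to one user's S3 prefix."""
    prefix = f"users/{user_id}"  # assumed key layout, not from the post
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects only under this user's prefix.
                "Sid": "ReadOwnObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is granted on the bucket ARN, so it is narrowed
                # with a condition on the requested prefix.
                "Sid": "ListOwnPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


print(json.dumps(build_user_policy("example-data-bucket", "user-123"), indent=2))
```

With a policy like this attached to each function's role, untrusted code running for one user has no credentials that reach another user's files, even if it tries.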
The workflow
- Our auth provider is Firebase with Google Cloud Identity Platform
- Like most auth providers, they have a function/hook interface for running code when a user signs up
- The Lambda function code is developed using the Lambda Python runtime Docker image
- The custom image is stored in ECR
- When a user signs up, we have our auth provider (Firebase) run a function which sends their user_id to our server
- The server then provisions an identical Lambda Function based on our existing CloudFormation template - but with an IAM policy allowing that function to access the user’s S3 directory
- Then, during CI/CD, GitHub Actions loops over the list of Lambda Function names and updates each of them to the latest image
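The two automated steps above (provision on signup, update on deploy) could be sketched roughly as follows. This is an assumption-heavy outline, not the real server code: the stack naming scheme, the `UserId` template parameter, and both function names are hypothetical. The AWS clients are passed in as arguments so the logic can be exercised without credentials; `create_stack` and `update_function_code` are the boto3 CloudFormation and Lambda client calls.

```python
from typing import Any, Iterable, List


def provision_user_stack(cfn: Any, user_id: str, template_url: str) -> str:
    """Create one CloudFormation stack (Lambda + IAM role) for a new user."""
    stack_name = f"user-fn-{user_id}"  # hypothetical naming scheme
    cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        # "UserId" is an assumed parameter name in the template.
        Parameters=[{"ParameterKey": "UserId", "ParameterValue": user_id}],
        # Acknowledges that the template creates IAM resources.
        Capabilities=["CAPABILITY_NAMED_IAM"],
    )
    return stack_name


def update_all_functions(
    lam: Any, function_names: Iterable[str], image_uri: str
) -> List[str]:
    """Point every per-user function at the newly pushed container image."""
    updated = []
    for name in function_names:
        lam.update_function_code(FunctionName=name, ImageUri=image_uri)
        updated.append(name)
    return updated
```

In a real run the clients would be `boto3.client("cloudformation")` and `boto3.client("lambda")`, with the first function called from the signup handler and the second from the CI/CD job after the image is pushed to ECR.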
Concerns
- Really just a pain to set up, but works fine
- Cold starts weren’t so bad