Sub-issues
#6339 - Review SDD for #6315 [P&S Initiative] Per-user cost reporting with Grafana on AWS
#6396 - Establish collaboration on VEDA cost reporting with Tarashish (Development Seed)
#5344 - Work with NASA VEDA admin to enable cost allocation tags
#6391 - Cost monitoring backend can be installed as a standalone helm-chart repository (phase 1)
#6519 - Cost monitoring backend can be installed as a standalone helm-chart repository (phase 2)
#6280 - [Cloud costs backend] Home directory storage cost monitoring unavailable for EBS volumes
#6564 - Create a submodule to query usage metrics from Prometheus
#6565 - Obtain per-user compute usage with the cost monitoring backend
#6393 - Per-user costs can be calculated in the backend
#6589 - Write table test for pure functions to validate per user cost logic
#6567 - Deploy prototype on a test hub
#6568 - Design a Grafana dashboard panel to visualise per-user costs
#6658 - Write jsonnet to encapsulate Grafana dashboard design as code
#6649 - Iterate given lessons learned from User cost monitoring prototype
#6778 - Cost Monitoring MVP improvements
#6395 - Rollout per-user cost reporting to production
#492 - Document new feature: cloud cost reporting and monitoring (2i2c-org/2i2c-org.github.io)
Participate in the issue: github
Context
Cloud cost monitoring is an important tool for giving hub administrators transparent operational oversight of usage and budgets. This allows them to demonstrate value to funders and make informed decisions on impact and cost. Currently, we do not have a clear way to report the per-user costs of running a hub, which makes it difficult for hub administrators to assess value and impact on a per-user basis.
Goal
The goal of this initiative is to provide a per-user cost reporting system for JupyterHubs running on AWS, using Grafana as the visualization tool. This will enable hub administrators to monitor and report the costs associated with each user, thereby enhancing transparency and accountability.
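As an illustration only, per-user reporting could rest on a pure function that splits a known total cost across users in proportion to their measured usage. The function name `allocate_costs`, the input shape, and the proportional-split rule are all assumptions made for this sketch; the Software Design Document linked below is the authority on the actual logic.

```python
from typing import Dict


def allocate_costs(total_cost: float, usage_by_user: Dict[str, float]) -> Dict[str, float]:
    """Split total_cost across users in proportion to their usage.

    Hypothetical helper: usage_by_user maps a username to any non-negative
    usage measure (for example, CPU-hours taken from a metrics backend).
    """
    total_usage = sum(usage_by_user.values())
    if total_usage <= 0:
        # No recorded usage: attribute nothing to anyone rather than divide by zero.
        return {user: 0.0 for user in usage_by_user}
    return {
        user: total_cost * usage / total_usage
        for user, usage in usage_by_user.items()
    }


# Example: a 100.0 bill shared by two users with a 1:3 usage ratio.
shares = allocate_costs(100.0, {"alice": 1.0, "bob": 3.0})
```

Keeping the calculation free of I/O like this makes it straightforward to cover with table-driven tests, in the spirit of sub-issue #6589.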
Out of scope
measuring “idle usage” (assigned or reserved but unused nodes) separately. Reducing idle usage is an important way to lower overall cost, but it should be tracked on its own.
trying to reduce cloud costs. The goal here is simply to do reporting. Reducing cloud costs can be its own separate initiative.
GCP: enabling cost monitoring for GCP will be followed up in separate initiatives
workshop hubs and BinderHubs: the concept of a user does not exist for these types of hubs
GPU compute: usage monitoring for this type of compute is not currently in place
object storage: as described in Section Other components, this is not practically feasible for a flat storage environment
Software Design Document
https://hackmd.io/@jnywong/HJB8ewSree
Status: Done