GitLake is a distributed data lake management framework based on Git. It defines a file system optimized for ETLM tasks within a data lake environment. It also provides a CLI tool, gitlake, which offers users a Git-like experience for managing and sharing raw data files and running massively parallel compute tasks.

  • Documentation: https://gitlake.readthedocs.io
  • Website: https://www.gitlake.com