platformOS Community

Downloading Assets and Media Upload

Patrick - Combinate Jul 30 2021 at 02:16

Hello platformOS'ers!

We have a client asking for a backup of their site.

The pos-cli pull does not include the asset folder.
I also cannot see a way to download all S3 files related to media uploads.

Is there a process for this?

I am considering using a script to crawl the site and generate a static version
https://github.com/tj/staticgen

What we can do so far:

  • Code base with full change history (git)
  • Export of all data from the databases as a JSON file
  • Export of all modules folder code (excluding assets)
  • Pull of the App folder (excluding assets)

Pawel Aug 11 2021 at 14:34

We don't have a one-button solution for this yet, but the database backup contains all the URLs of user media uploads. It should therefore be much easier and faster to run a script that simply iterates over the JSON and downloads each URL than to crawl or scrape the website, with all the performance and security implications of that (most likely the scraper's IP will get throttled and subsequently blocked by our anti-DDoS protection within Amazon WAF).

The JSON contains a lot of data, so I would recommend using jq to limit the data that the downloading software has to work on.
Once you have JSON with only the URLs and the ID of the row/record each one is associated with, a simple Node.js script, or bash with curl, can be run on a server with a fast connection. If the backup is meant to be done periodically, it can be run from crontab to automate things.
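
As a rough illustration of the approach described above, here is a minimal sketch in Python (chosen only for the example; Node.js or bash with curl would work equally well). It makes several assumptions that are not confirmed anywhere in this thread: that the export is a single JSON file, that each record stores its identifier under a key named "id", and that every string starting with http:// or https:// is a file worth downloading. The function names and the output layout are hypothetical. Check the real structure of your export and adjust the filtering before relying on it.

```python
import json
import sys
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def iter_upload_urls(node, record_id=None):
    """Yield (record_id, url) for every http(s) string found in the export."""
    if isinstance(node, dict):
        # Assumption: a record's identifier is stored under the key "id".
        record_id = node.get("id", record_id)
        for value in node.values():
            yield from iter_upload_urls(value, record_id)
    elif isinstance(node, list):
        for item in node:
            yield from iter_upload_urls(item, record_id)
    elif isinstance(node, str) and node.startswith(("http://", "https://")):
        yield record_id, node


def target_path(out_dir, record_id, url):
    """Build a local path that keeps the record ID next to the file name."""
    name = Path(urlparse(url).path).name or "unnamed"
    return Path(out_dir) / f"{record_id}_{name}"


def download_all(export_file, out_dir):
    """Download every URL in the export, skipping files already on disk."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with open(export_file, encoding="utf-8") as fh:
        data = json.load(fh)
    for record_id, url in iter_upload_urls(data):
        dest = target_path(out_dir, record_id, url)
        if dest.exists():
            continue  # keeps repeated (e.g. crontab) runs cheap
        try:
            urllib.request.urlretrieve(url, str(dest))
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print(f"failed: {url}: {err}", file=sys.stderr)


# Usage (hypothetical file names): python backup_uploads.py export.json backup/
if __name__ == "__main__" and len(sys.argv) == 3:
    download_all(sys.argv[1], sys.argv[2])
```

Prefixing each file with the record ID keeps the link between a downloaded file and its row in the export, and skipping existing files means a periodic run only fetches new uploads. Note that this sketch picks up any URL-like string, so narrowing the export first (for example with jq) still helps.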

Of course this is not ideal, and we are thinking about an ideal (secure AND easy to use, which often don't go hand in hand) way of providing backups for both assets and user uploads. It is on our radar, and we internally have backups guaranteed, but exposing sensitive data (which user uploads can contain, including medical data) has to be done with the utmost care for security - there is no room for error.
