What discord data do you store

Updated November 19, 2021

At Discord, the Data Platform team empowers the organization to analyze, understand, and leverage data to help Discord create space for everyone to find belonging. Discord uses data for a number of reasons: to identify bad actors and harmful communities; to develop insights that inform critical product and strategy decisions; and to train and assess the effectiveness of machine learning models (check out our Privacy Policy for more information on what information we collect and how we use it!). Without regular and rigorous analysis of how our product is used, our ability to make informed decisions about company strategy at scale would be severely lacking.

Raw data comes to us as production datastore exports and product telemetry data (over 15 trillion records so far and billions generated daily). When Discord was a smaller company and data use cases were simpler, it was somewhat tenable, if not ideal, to manually compute useful datasets as needed. Today, we process petabytes of data with 30,000 vCPUs in the cloud. To be useful, the raw data must be cleaned, privatized according to our data governance policies, and then transformed into a complex schema of thousands of precomputed tables in our 30+ petabyte data warehouse (we use Google BigQuery).

As of this writing, the part of the Data Platform team responsible for ingesting raw data and making it accessible consists of eight people and we were even fewer in number during the history described below. Given the team’s size relative to the rest of Discord, it was important to build a system that was self-serve and as automated as possible. This is the story of how we turned petabytes of raw data into a structured data warehouse and the system we built to maintain it, internally referred to as Derived.

Requirements and Approach

What we needed was a system for maintaining a complex Directed Acyclic Graph (DAG) of precomputed data—in our case, this meant a DAG of derived tables in our BigQuery data warehouse:

Though the system would be broken down into a series of deliverable milestones, we wanted the eventual system to meet the following requirements:

While existing solutions such as dbt, Airflow, and Looker solve for some of the above, we ultimately decided that we wanted a more custom solution that would integrate nicely with our existing systems and give us the flexibility to extend to use cases beyond analytics.

We were already using Airflow to schedule batch jobs and to process simpler datasets, but we found the following limitations:

Taking into account our requirements, observing existing pain points, and drawing some inspiration from existing solutions, we made the following design choices:

Version One: The Minimum Viable Product

For the initial deliverable, the highest priority goals were to get data transformations into git, ensure that data was consistent across the warehouse, and simplify data operations. We built the following:

Table build behavior specified using one of three different strategies would instruct how tables are built, incremented, and backfilled:

Thus, Derived was born and fit into our architecture as illustrated below:

Version Two: Ergonomics

The MVP proved the technology of constructing the DAG, building tables, and managing the data warehouse, but people internally struggled to create new Derived Tables without the help of Data Engineers because the process was still too complicated and obscure. So for the next iteration, we focused on creating a simple user interface for people to easily create new tables and write documentation right alongside their code.

Another benefit of adopting this standardized interface is that it provides an abstraction layer for us to rapidly iterate on the systems underlying the configuration without impacting teams.

Version Three: Automation

Version Two successfully unlocked our Data Science teams to create tables without assistance, and they created hundreds of tables within the first year. With this success emerged a new set of problems:

Version Three therefore focused on improving the reliability of deployments and automating the rebuilding/repairing of Derived tables. To accomplish this, we focused on ergonomics, testing, and general automation:

Testing:

We wanted people to be able to test while developing new tables, so we implemented the following:

Automation:

In Version Two of Derived, the table’s metadata was tracked in Airflow, resulting in a number of manual steps during data maintenance operations (e.g. a backfill required pausing the DAG, running the operation, and then syncing the actual state of the table with Airflow metadata).

To automate data operations, we moved table state tracking out of Airflow and into a metadata log so that Derived could independently decide when to repair, rebuild, and add data to tables.

More detailed state tracking at the table level also unlocks parallel computation: instead of a parent process blocking while it sequences and schedules 900+ tables, all tables can run concurrently and as frequently as desired to keep derived insights consistent across the data warehouse and up to date with data sources. Each table updater is deployed as its own Kubernetes Pod. When a pod starts up, it runs through the following steps:

The metadata log is available in BigQuery and enables detailed monitoring, performance analysis, and data lineage. It answers monitoring questions like: When was the table last updated? How recent is the data in the table? For performance analysis, we join the metadata log to the BigQuery information_schema for query execution details and to report on metrics for each table. Data lineage can be obtained from the metadata log by tracking predecessor dependencies when tables are updated, so the entire lineage can be reconstructed by traversing the metadata log.
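
The lineage traversal described above can be illustrated with a minimal sketch. This is not Derived's actual implementation: the log layout (one row per table update, listing the predecessor tables it read) and all table names are hypothetical, chosen only to show how walking predecessor entries recovers the full upstream lineage.

```python
from collections import defaultdict

# Hypothetical metadata log rows: (updated table, predecessor tables it read).
# Neither the row shape nor the table names come from the original post.
METADATA_LOG = [
    ("daily_active_users", ["raw_events"]),
    ("weekly_retention", ["daily_active_users", "raw_signups"]),
    ("exec_dashboard", ["weekly_retention"]),
]


def build_parent_map(log):
    """Collapse log rows into a mapping of table -> set of direct predecessors."""
    parents = defaultdict(set)
    for table, predecessors in log:
        parents[table].update(predecessors)
    return parents


def lineage(table, parents):
    """Return every upstream table reachable from `table` via predecessor links."""
    seen = set()
    stack = [table]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


parents = build_parent_map(METADATA_LOG)
print(sorted(lineage("exec_dashboard", parents)))
```

A table with no logged predecessors (a raw source, in this sketch) simply has an empty lineage.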

Powering Discord Features:

Up until now, Derived operated only on BigQuery, a data warehouse designed for big data processing whose query response times are frequently greater than one second. To power application features, response times needed to be much faster, especially for machine learning features where the application flow is: receive a user request, query multiple Derived datasets to create a feature set, make a prediction, and respond to the user within one second. For this, we added a new configuration option to Derived that automatically exports from BigQuery to Scylla, so that the Derived dataset is available in a database designed for high-performance queries in online systems.

Conclusion

We’ve been running Version Three in production for over a year now and have accomplished the original seven goals we set out to achieve.

✔️ Table updates should run as soon as new data is available (but no sooner!)
✔️ Maintains an audit trail of mutations to derived datasets.
✔️ Includes primitives for powering data lineage and data catalog tooling.
✔️ Modifications to the DAG should be self-serve and intuitive for stakeholder teams like engineering, data science, and machine learning.
✔️ Aware of data access controls and provides scalable data governance policy enforcement.
✔️ Able to automatically export derived data to production datastores for use in Discord’s user-facing product.
✔️ Simple and easy to operate in the context of Discord’s environment.

Whew! That was a lot of information and quite the adventure for the team! If working with massive data sets strikes a chord with you, we invite you to check out our jobs page and apply!

Discord Data Usage: How Much?

Here’s how much data Discord uses:

On your desktop or PC, Discord will use more data than on your phone.

If you want to learn all about how much data Discord uses and how to reduce the usage, then this article is for you.

Let’s jump right in!

How Much Data Does Discord Use?

Discord allows you to connect with friends, family, and colleagues, just like Zoom, but in a more organized way.

It also links with third-party apps like Spotify so that you can bring music and video-streaming to your experience.

Many users have found Discord especially helpful for making international and long-distance calls because it acts as a VOIP.

VOIP stands for Voice Over Internet Protocol.

A VOIP allows you to make phone calls without an analog phone line; it runs on data instead.

In fact, Discord, like all messaging apps, requires data to run all of its features.

To share music, send video, start a voice chat, or simply text message another user, you’ll need access to a decent amount of data.

So, how much data does Discord use?

Well, the answer isn’t all that straightforward. There are multiple factors at play.

To understand, let’s zoom out just a bit and talk about data and how it works on your phone versus your personal computer.

Discord Data on PCs vs. Phones

Cellular or mobile data uses radio waves to connect to the internet just like Wi-Fi does.

However, Wi-Fi covers a limited area, like your house or office building.

Mobile data relies on cell towers to connect you to the internet no matter where you are.

So, it has a broad range and can connect you with the internet even when you’re far from your home Wi-Fi network.

Data plans are priced based on speed and the number of gigabytes they offer per month.

Though unlimited plans are becoming more popular, many stick with less expensive plans that charge by data usage.

Often, such plans cap the number of gigabytes you can use and charge any overage at a premium.

Your computer uses data too, but unless you’re using your phone as a hotspot, it’s probably not connected to a limited data plan.

Most websites recognize whether you’re browsing via your desktop or a smartphone.

So, they’ll customize the data packages that are sent accordingly.

So, how much data does Discord use on a PC?

It’s hard to be exact, but we can say Discord will use more data when running its desktop or PC version than it would on your phone.

When you use a desktop, more data comes through to display better quality images, video, and text.

When you use a mobile device, many websites tailor the amount of data you receive down to a minimum out of respect for mobile plans’ data caps.

How Much Data Does Discord Use an Hour?

How much data a particular app, like Discord, uses in an hour can be difficult to pinpoint.

The amount of data you use will depend on what you’re trying to do, what’s going on in the background, and how fast your internet connection speed is.

It makes sense that sending a picture or video requires more data than a text message.

Larger group chats and screen sharing add to data usage as well.

So, what you’re attempting to do on Discord has the biggest effect on data usage.

Phones, tablets, and computers also tend to be running multiple applications at once.

Even if they’re not open, applications may be retrieving location data or synchronizing content.

Background applications running while you use Discord will significantly affect the total amount of data you burn through.

You can shut down background applications by adjusting your phone settings, which may be ideal if data usage is becoming a problem.

With faster internet connection speeds, you end up using more data because you can complete tasks faster.

Essentially, you won’t send as many pictures to a friend if they take hours to download.

With good internet speeds, you’re more likely to use more data sending additional photos and video.

Discord Data Usage on Phones

Now that we understand data a little better, let’s take a look at Discord data usage on phones for voice calls, video calls, and texts.

In general, it tends to be lower than other messaging app options.

But keep in mind, this is highly variable, and your data usage may differ from the amounts we’re guesstimating.

How Much Data Does Discord Text Use?

Each time you send a text message through Discord, you send a full HTTP POST request.

That doesn’t mean much to most of us, but know that it’s more data than you might think.

Data usage for text messaging in Discord depends on how many people are chatting at once.

For simplicity’s sake, let’s stick to a conversation between two people.

How Much Data Does a Discord Voice Call Use?

In general, Discord uses less data than many other apps when it comes to audio calls.

If you’re asking how much mobile data does Discord voice chat use, the answer seems to be around 28 MB per hour.

Keep in mind that it could vary significantly based on background apps your phone may be running.

You can rest assured that Discord uses less data than other apps like Skype because of the way it transmits information.

While applications like Skype are continuously transmitting user data, Discord only transmits data for the person speaking, significantly reducing the amount of data transfer overall.

How Much Data Does Discord Use for Video Calls?

In general, a video call will run about 270 MB per hour.

As noted above, though, that number will vary significantly depending on the other applications you’re running on your device.

Again, since Discord only transmits audio data while you’re speaking, rather than throughout the call, you save a little bit of data using Discord over other messaging applications.

Discord Data Usage for Music and Streaming

If you’re wondering how much data does Discord music use, or how much data does Discord streaming use, the answer depends on the service you’re using.

Discord works well with several third-party apps, but which 3rd-party application you use plays a huge role in data usage.

Spotify, for example, uses about 43.2 MB per hour of music. Streaming music and video on YouTube, though, takes 150MB per hour at the lowest quality setting.

At higher quality settings, you could use over 1 GB per hour, which is a lot of data!
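
To turn the per-hour figures quoted in this article into a rough monthly estimate, a small calculation helps. The sketch below uses only the approximate numbers given above (28 MB/hour for voice, 270 MB/hour for video, 43.2 MB/hour for Spotify, 150 MB/hour for low-quality YouTube); the function name and the example hours are made up for illustration, and real usage will vary with background apps and quality settings.

```python
# Approximate per-hour figures quoted in the article, in megabytes.
MB_PER_HOUR = {
    "voice": 28.0,
    "video": 270.0,
    "spotify": 43.2,
    "youtube_low": 150.0,
}


def estimate_usage_mb(hours_by_activity):
    """Multiply hours spent on each activity by its per-hour rate and sum."""
    return sum(MB_PER_HOUR[activity] * hours
               for activity, hours in hours_by_activity.items())


# Example month (hypothetical): 20 hours of voice chat and 5 hours of video calls.
monthly_mb = estimate_usage_mb({"voice": 20, "video": 5})
print(f"{monthly_mb:.0f} MB (about {monthly_mb / 1000:.2f} GB)")
```

With these example hours, video calls account for most of the total even though they make up a fraction of the time, which is why cutting video first has the largest effect on a capped plan.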

How to Minimize Discord Data Usage?

If you keep running out of data but rely on Discord to reach friends, family, and internet-based groups, learning how to minimize your data usage is helpful.

As mentioned, you can start by pausing or disabling unnecessary background apps.

How to do that varies based on your phone, but both Apple and Android allow it, typically from the settings menu.

In group chats, you can also disable link previews, at least on the messages you send.

Link previews force your phone to download images and text you may have no interest in reading.

It can add up to a lot of data in group chats where people are sending links back and forth.

Currently, there’s no way to disable the link previews when you’re on the receiving end, but users have requested that Discord consider that for future updates.

Does Discord Use a Lot of Data?

So, does Discord use a lot of data on a phone?

Does it use a lot of data on tablets?

Does Discord use a lot of data in general?

The answer to all of the above is that it depends.

Discord uses less data than other popular messaging apps, like Skype. But, it still uses a lot of data, depending on what you use the app for.

If you’re using it to send text messages between two people, the data usage is negligible.

However, if you’re in a group chat where everyone is sending pictures back and forth, the amount of data required could be much higher.

It will be higher still if you’re trying to use voice chat, video chat, or screen sharing.

And sharing music, images, and video will use the most data of all.

Discord is a great way to keep in touch and a fun way to interact with new communities.

But if you’re asking how much data does Discord use, the answer isn’t straightforward.

Your best bet is to track the data usage on your particular device while using the app.

Then you’ll have an idea of how much data Discord uses on your specific device for future reference.

Androz2091/discord-data-package-explorer

What’s really in your Discord Data package? And how can this data be useful? Discord Data Package Explorer does the job for you!

Discord Data Package Explorer is built with Svelte, and is quite easy to install.

Note: for testing purposes, you may use this link to try the app with the mocked data.

This project was created after a discussion with Kaki87, an awesome developer who taught me a lot of things. He had in mind to create this app but didn’t have the time for it. I decided to learn Svelte and build it as a side project 🙂 This project would never have existed without him! 🙏

About

🌀 What’s really in your Discord Data package?

What does your application do? Please be as detailed as possible, and feel free to include links to image or video examples.

AutoDelete is a Discord bot that will automatically delete messages from a designated channel.

After adding the bot to a server, users with the ‘Manage Messages’ permission can interact with and configure the bot. The bot is configured by chat commands, consisting of an @-mention of the bot followed by the command. Configuration is per-channel, not per-guild.

The configuration calculates an expiration condition for every message sent in the channel, either through time or through a message count limit being reached. When the oldest message in a channel expires, it and any other messages expiring in the next few seconds are bulk-deleted from Discord.
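
The expiration logic described above can be sketched roughly as follows. This is not the bot's actual code: the class, function, and parameter names are hypothetical, and the 5-second lookahead is an assumed stand-in for "the next few seconds". The sketch keeps only message IDs and timestamps, oldest first, and returns the batch of IDs that would be bulk-deleted.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class LiveMessage:
    message_id: int
    posted_at: float  # Unix timestamp in seconds


def expired_batch(live, now, max_age=None, max_count=None, lookahead=5.0):
    """Pop and return the IDs that should be bulk-deleted at time `now`.

    `live` is a deque of LiveMessage ordered oldest first. `max_age` (seconds)
    and `max_count` are the two hypothetical per-channel limits; `lookahead`
    is an assumed value for "the next few seconds".
    """
    batch = []
    # Count limit: the oldest messages beyond the limit expire immediately.
    while max_count is not None and len(live) > max_count:
        batch.append(live.popleft().message_id)
    # Time limit: once the oldest message has expired, also take any
    # messages that would expire within the lookahead window.
    if max_age is not None and live and live[0].posted_at + max_age <= now:
        while live and live[0].posted_at + max_age <= now + lookahead:
            batch.append(live.popleft().message_id)
    return batch


channel = deque([LiveMessage(1, 0.0), LiveMessage(2, 3.0), LiveMessage(3, 100.0)])
print(expired_batch(channel, now=60.0, max_age=60.0))
```

Batching messages that are about to expire alongside the oldest one means fewer delete requests than removing each message at its exact expiry time.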

The end. That’s all it does 🙂 The rest is just details!

Tell us more about the data you store and process from Discord.

What Discord data do you store?

The text of all messages @-mentioning the bot is retained for a limited time for support and product improvement purposes.

The text of all messages using the adminhelp command is forwarded to another Discord channel for human inspection. Discord storage policies then take over.

The bot-specific configuration that a user sets is retained indefinitely, in connection with the channel and guild ID the configuration is for. This data also carries ext4 file modification timestamps.

Role membership in the support guild is interrogated to determine if a user is a donor to the bot via Patreon. The results of this query (yes or no) are saved to the configuration file.

No other message content is ever retained. Messages in channels configured for automatic deletion are stripped down to just their ID and timestamp after being processed.

The list of «live» message IDs and precise timestamps in a channel that will be deleted by the bot at their expiration date are retained in memory only for the duration that those messages remain undeleted on Discord.

The list of pinned messages in a channel configured for automatic deletion is requested by the bot and maintained in memory for as long as the channel remains configured for automatic deletion.

Aggregated statistics are exported to a Prometheus monitoring server. Disaggregated statistics that identify particular guild and channel IDs are exported to a Prometheus monitoring server when, and only when, per-channel usage thresholds are exceeded. (This protects both the privacy of low-volume users, as well as my RAM.) All Prometheus statistics are collected at an effective time granularity no less than 1 minute apart (it’s 5 minutes).

For what purpose(s) do you store it?

See earlier answers for detailed descriptions of the data categories.

The following purposes are used below:

Message IDs, etc: Essential
Log lines: Operational and debugging
Structured logging: Product improvements, usage analysis
@-mention content: Support
Metrics: Operational, debugging, product improvements, usage analysis, support
Configuration: Essential, usage analysis
Patreon integration query results: Same as Configuration
adminhelp content: Support

For how long do you store it?

See earlier answers for detailed descriptions.

Message IDs, etc: Volatile, process lifetime
Log lines: Ephemeral,

What is the process for users to request deletion of their data?

Feels a bit odd, as deleting data is the entire purpose of the bot!

A server administrator with ‘Manage Messages’ can delete the configuration for a channel by using the @AutoDelete set 0 invocation.

A server administrator with ‘Manage Server’ can delete all configuration data for a guild by kicking the bot. After an indeterminate amount of time (hint: it’s the next websocket gateway reconnect), the bot will notice it no longer has access to the channel, and automatically delete the configuration data.

No process is implemented for deletion of log data, as this data is retained for less than 30 days.

For deletion of ‘adminhelp’ command content, users can make a request through the same channels as listed in the ‘security issues’ section.

Tell us more about your application’s infrastructure and your team’s security practices.

What systems and infrastructure do you use?

The bot runs on one (or more) droplets on DigitalOcean.

(looks at recommended shard count again) Yup, the multi-droplet split is coming up real soon now.

How have you secured access to your systems and infrastructure?

Access to the droplet is only possible through key-authenticated SSH. SSH private key files are protected using passphrases of at least 15 characters in length, or other mechanisms derived in the future that are at least as secure.

Code deploys are performed over SSH and always pull the integrity-checked code from GitHub; no manual uploads of binaries or source to the server are performed.

(quickly goes to delete the localhost:8000 OAuth redirect url. I have a test bot instance for that stuff now)

How can users contact you with security issues?

Does your application utilize other third-party auth services or connections? If so, which, and why?

The Patreon integration is used exclusively via querying the Discord side of the integration.

No third-party services are actively contacted during bot operations, and no other third-party services are used.

Privileged Gateway Intents

Maintaining a stateful application can be difficult when it comes to the amount of data you’re expected to process, especially at scale. Gateway Intents are a system to help you lower that computational burden. Some of these gateway intents are defined as «Privileged» due to the sensitive nature of the data they grant, and access can be enabled below.

Which intents are you applying for, if any? (Leave blank if you do not need any of these)

jakobbouchard / discord-verification-rundown.md

Quick rundown of the Discord bot verification situation.

Remember, there’s no rush to get verified before October 7th, since verification takes about 5 days anyway. If you are in fewer than 100 servers, no need to worry: your bot will still work as it does currently. Keep in mind some of this is NOT official, so unless there’s a source, take it with a grain of salt.

What are Privileged Intents? Source

GUILD_MEMBERS lets you catch events when people update themselves or join/leave a server. GUILD_PRESENCES lets you catch status updates such as online, idle & games/custom statuses.
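
For readers unfamiliar with how intents are requested, here is a minimal sketch of the bitfield a bot sends when identifying with the gateway. The bit positions are the ones I understand Discord's gateway documentation to list (check them against the current docs before relying on them); the `Intents` class and `privileged_requested` helper are hypothetical names, not part of any Discord library.

```python
from enum import IntFlag


class Intents(IntFlag):
    # Bit positions as I understand them from Discord's gateway docs;
    # only a subset is shown here.
    GUILDS = 1 << 0
    GUILD_MEMBERS = 1 << 1     # privileged
    GUILD_PRESENCES = 1 << 8   # privileged
    GUILD_MESSAGES = 1 << 9


PRIVILEGED = Intents.GUILD_MEMBERS | Intents.GUILD_PRESENCES


def privileged_requested(intents):
    """Return the subset of requested intents that are privileged."""
    return intents & PRIVILEGED


wanted = Intents.GUILDS | Intents.GUILD_MESSAGES | Intents.GUILD_MEMBERS
print(int(wanted))                          # integer value sent to the gateway
print(bool(privileged_requested(wanted)))   # True if any privileged intent is set
```

If the helper returns a non-empty value, the bot is asking for at least one intent that has to be enabled (and, for verified bots, approved) before the gateway will deliver those events.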

Non-exhaustive list of accepted IDs Source

It varies from country to country, but in general, you’ll want a valid, government-issued photo ID. This can be a driver’s license, passport, federal ID, or something else in that category. The specifics for where you are located may be different. If you are required to get verified, Stripe will let you know during the process if your ID is OK or not. For a list for countries other than the US, see here: https://stripe.com/docs/connect/identity-verification-api#acceptable-id-types. However, this is NOT EXHAUSTIVE OF ALL THE COUNTRIES THEY SUPPORT.

If you don’t have an ID, contact support; they’ll help. Also, apparently, in the U.S. you can get a state ID with parental consent.

Questions about the verification form

WHAT SYSTEMS AND INFRASTRUCTURE DO YOU USE?

How do you run your bot?

HAVE YOU SECURED ACCESS TO YOUR SYSTEMS AND INFRASTRUCTURE?

Is your server secure? Can anybody access that data?

DOES YOUR APPLICATION UTILIZE OTHER THIRD-PARTY AUTH SERVICES OR CONNECTIONS? IF SO, WHICH, AND WHY?

The third party services are more like «Does your bot also do Twitch OAuth so you can connect Discord to Twitch?» Source

All the stuff about storing data

If you do not store any data, just write I don’t store data. If you store data about e.g. levels and currencies, you should write that. Since data is anonymized, it’s most probably fine. Anyways, lying would get you in even more trouble. When in doubt include it.

From what I understand, end user data is data generated by users (like their username, message content, etc). Snowflakes aren’t because they are generated by Discord.

Does storing data count if it’s logging to a text channel?

No, it’s literally just sending Discord’s data back to Discord. It’s fine.

WHY DOES DISCORD GET ACCESS TO MY ID.

Probably because they need to verify that Stripe is not lying to them about the verification. They specifically said that they don’t keep any data.

Any suggestions or questions are appreciated!
