Stats modeling the world versus the practice of statistics

In the current data model, I can easily gather stats about likes, dislikes, views, etc., for a given video ID, but I will have to use $lookup to join the data with the youtube_videos collection in order to tell you anything more. Even something that seems relatively simple, like listing the video's name alongside the stats, requires the use of $lookup. The $lookup operation required to join the data in the two collections isn't that complicated, but best practices suggest limiting $lookup as these operations can negatively impact performance. While we were developing our minimum viable product, I weighed the ease of development by avoiding data duplication against the potential performance impact of splitting our data. Now that I know I need information like the video's name and publication date with the stats, I can implement the Extended Reference Pattern.
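To make the trade-off concrete, here is a minimal Python sketch of both approaches. The first function builds an aggregation pipeline that joins stats with video metadata at read time using $lookup; the second illustrates the idea behind the Extended Reference Pattern by copying a few frequently needed video fields into a stats document at write time. Only the youtube_videos collection name comes from this post. The field names (videoId, title, publishedAt), the function names, and the assumption that a video's _id is its YouTube video ID are hypothetical, since the actual schema isn't shown here.

```python
from typing import Any, Dict, List, Tuple

Document = Dict[str, Any]


def build_stats_with_video_pipeline(video_id: str) -> List[Document]:
    """Build a pipeline that joins stats documents with their video metadata.

    Assumes (hypothetically) that each stats document stores the video's ID
    in `videoId` and that `youtube_videos` documents use that ID as `_id`.
    """
    return [
        # Keep only the stats for the requested video.
        {"$match": {"videoId": video_id}},
        # Join at read time: this is the step best practices suggest limiting.
        {
            "$lookup": {
                "from": "youtube_videos",
                "localField": "videoId",
                "foreignField": "_id",
                "as": "video",
            }
        },
        # $lookup yields an array; flatten it to a single embedded document.
        {"$unwind": "$video"},
    ]


def with_extended_reference(
    stats: Document,
    video: Document,
    fields: Tuple[str, ...] = ("title", "publishedAt"),
) -> Document:
    """Return a copy of `stats` with selected video fields duplicated into it.

    Only the fields that are read together with the stats are copied, which
    keeps the duplicated data small and easier to keep consistent.
    """
    embedded = {name: video[name] for name in fields if name in video}
    return {**stats, "video": embedded}
```

With a driver such as PyMongo, the pipeline could be passed to a collection's aggregate() method, while the second function would run before inserting each stats document. Documents written that way can be listed with the video's name and publication date without any join, at the cost of updating the copies if a video's title ever changes.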

Without further ado, let's jump into the seven things I learned while rapidly modeling YouTube data.

Duplicating data is scary, even for those of us who have been coaching others to do so

One of the rules of thumb when modeling data for MongoDB is that data that is accessed together should be stored together. We teach developers that duplicating data is OK, especially if you won't be updating it often. When I began figuring out how I was going to use the YouTube API and what data I could retrieve, I realized I would need to make two API calls: one to retrieve a list of videos with all of their metadata and another to retrieve the stats for those videos. For ease of development, I decided to store the information from those two API calls in separate collections. I wasn't sure what data was going to need to be displayed alongside the stats (put another way, I wasn't sure what data was going to be accessed together), so I duplicated none of the data. I knew that if I were to duplicate the data, I would need to maintain the consistency of that duplicate data. And, to be completely honest, maintaining duplicate data was a little scary given the time crunch we were under and the lack of software development process we were following.

To be clear, I'm not saying this was the best way to model our data; this is the data model we ended up with after two weeks of rapid development. I'll discuss some of the pros and cons of our data model throughout the rest of this post. If you'd like to take a peek at our code and learn more about our app, visit.

In this post, I'll share seven things I learned while modeling the data for this app. But, before I jump into what I learned, I'll share a bit of context about how I modeled the data.

Mark Smith, Maxime Beugnet, and I recently embarked on a project to automatically retrieve daily stats about videos on the MongoDB YouTube channel. Our management team had been painfully pulling these stats every month in a complicated spreadsheet. In an effort to win brownie points with our management team and get in a little programming time, we worked together as a team of three over two weeks to rapidly develop an app that pulls daily stats from the YouTube API, stores them in a MongoDB Atlas database, and displays them in a MongoDB Charts dashboard. Mark, Max, and I each owned a piece of the project. Mark handled the OAuth authentication, Max created the charts in the dashboard, and I was responsible for figuring out how to retrieve and store the YouTube stats.

Figure: Screenshot of the MongoDB Charts dashboard that contains charts about the videos our team has posted on YouTube

This course describes Bayesian statistics, in which one's inferences about parameters or hypotheses are updated as evidence accumulates. You will learn to use Bayes' rule to transform prior probabilities into posterior probabilities, and be introduced to the underlying theory and perspective of the Bayesian paradigm. The course will apply Bayesian methods to several practical problems, to show end-to-end Bayesian analyses that move from framing the question to building models to eliciting prior probabilities to implementing the final posterior distribution in R (free statistical software). Additionally, the course will introduce credible regions, Bayesian comparisons of means and proportions, Bayesian regression and inference using multiple models, and discussion of Bayesian prediction. We assume learners in this course have background knowledge equivalent to what is covered in the earlier three courses in this specialization: "Introduction to Probability and Data," "Inferential Statistics," and "Linear Regression and Modeling."
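As a small illustration of updating beliefs as evidence accumulates, the following Python sketch applies Bayes' rule over a discrete set of hypotheses. The coin example, the hypothesis names, and the function name are assumptions made for illustration only; they are not taken from the course materials, which use R.

```python
from typing import Dict


def posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Apply Bayes' rule: posterior is proportional to prior times likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    # The evidence (marginal likelihood) rescales the result to sum to 1.
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical question: is a coin fair, or biased toward heads (P(heads) = 0.75)?
belief = {"fair": 0.5, "biased": 0.5}
heads_likelihood = {"fair": 0.5, "biased": 0.75}

# Observe two heads in a row; each posterior becomes the prior for the next update.
for _ in range(2):
    belief = posterior(belief, heads_likelihood)
```

After the two updates, the probability assigned to the biased hypothesis has grown relative to the starting 50/50 prior, which is the sense in which inferences are updated as evidence accumulates.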