User 5195 | 5/13/2016, 9:15:23 PM

I am using a scatter plot to view the home_data dataset from the ML course offered through Coursera, with the following command:

graphlab.canvas.set_target('ipynb')
sales.show(view="Scatter Plot", x="sqft_living", y="price")

I use Windows 10. When I execute the same command multiple times, I get plots that are not consistent with the previous ones. I am attaching two such plots; each one is different whenever I execute the command again. What might be the reason? A scatter plot of the same data should not change over time, right?


User 5179 | 5/14/2016, 2:00:59 PM

They are the same, but the scaling is different. The top plot's maximum price is 3 million; the bottom plot's maximum price is 6 million.

User 4 | 5/15/2016, 6:41:58 PM

Hi @kowshik0808 and @Demir_Tonchev,

The reason the plot renders differently each time is actually because it's sampling a different random subset of data points each time. We implemented scatter plot this way for performance reasons -- with a large dataset, rendering a circle for each individual point of data is extremely slow or even impossible. The technique currently being used is a random streaming sample (reservoir sample). The plot is displayed based on a random 1,000 points of data, and each call to show selects a different random 1,000 points, so the ranges (min/max) may be different.
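To illustrate the idea, here is a minimal sketch of reservoir sampling in plain Python. It is not GraphLab Create's actual implementation; the function name `reservoir_sample`, the synthetic `prices` list, and the seeds are all made up for illustration. It shows how two samples of 1,000 points drawn from the same data can end up with different maximum values, which is what changes the axis range from one plot to the next.

```python
import random


def reservoir_sample(stream, k, rng):
    """Pick k items uniformly at random from an iterable in a single pass."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1),
            # replacing a uniformly chosen existing entry.
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Synthetic stand-in for a price column (assumption: not the real home_data).
data_rng = random.Random(0)
prices = [data_rng.lognormvariate(13, 0.5) for _ in range(20000)]

# Two independent samples, as if the plot had been rendered twice.
sample_a = reservoir_sample(prices, 1000, random.Random(1))
sample_b = reservoir_sample(prices, 1000, random.Random(2))

print("max of full data:", max(prices))
print("max of sample A: ", max(sample_a))
print("max of sample B: ", max(sample_b))
```

Because the largest values are rare, a given sample may or may not include them, so the plotted min/max can differ between runs even though the underlying data is unchanged.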

We have received feedback that this type of random sampling is unintuitive and doesn't provide the expected results, so we're currently working on a different technique (which won't discard data points) to solve the performance issues, which will be available in a future release. In the meantime, if you need a consistent scatter plot (especially one that preserves all original data points), please use another plotting library like matplotlib or seaborn.

User 5179 | 5/15/2016, 9:03:17 PM

@Zach Thanks for the info! I thought @kowshik0808 was just resampling a new training set and plotting it. Thanks for elaborating on how the current plotting works.