Startups Stack Exchange Archive

Collecting big data from distributed users' software and re-centralizing it to yourself

Let's say a software vendor wants to publish a data-analysis program or app that requires a lot of online data for machine learning to achieve good results for its users. Is it okay for the software to let users scrape small amounts of Google/Reddit/Stack Exchange search-result data (with their own IPs), which is then uploaded and centralized back to the vendor's server through their installed software (given explicit user agreements), in order to bypass, for example, the Google API terms of use (e.g. request limits)?
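Setting the legal question aside, the consent-gated upload step described above can be sketched as follows. This is a minimal illustration only: the endpoint URL, payload shape, and function name are all assumptions, not part of any real vendor's service.

```python
import json
from typing import Optional

# Hypothetical endpoint; the actual vendor server is not specified in the question.
UPLOAD_ENDPOINT = "https://example.com/collect"

def build_upload_payload(user_consented: bool, records: list) -> Optional[bytes]:
    """Package locally collected records for upload, gated on explicit user consent."""
    if not user_consented:
        # Without an explicit user agreement, nothing leaves the machine.
        return None
    payload = {"version": 1, "records": records}
    return json.dumps(payload).encode("utf-8")

# The actual upload might then use, e.g.:
#   urllib.request.urlopen(UPLOAD_ENDPOINT, data=build_upload_payload(True, records))
```

The key design point the question hinges on is the consent gate: the software only ever uploads data the user has explicitly agreed to share.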

Would the company, or the users running the software, violate the Terms of Use of the Google API or similar services?

Alternatively, if the software lets users browse manually and merely records the browser output automatically, would that be a legitimate way to scrape the data without violating the Google (etc.) terms of service?

If these methods are not advisable, what is a good way for a startup to obtain big data for commercial use (perhaps for a particular category such as Books or Cars)?

No Answers

There were no answers to this question.


All content is licensed under CC BY-SA 3.0.