- add rate limiting
- update readme
This commit is contained in:
parent
eef58c9c1b
commit
30c093eaf9
40
README.md
40
README.md
@ -17,14 +17,12 @@ Quickly scrape Google Search Results.
|
|||||||
package main
|
package main
|
||||||
|
|
||||||
import (
|
import (
|
||||||
"context"
|
|
||||||
"fmt"
|
"fmt"
|
||||||
"github.com/rocketlaunchr/google-search"
|
"github.com/rocketlaunchr/google-search"
|
||||||
)
|
)
|
||||||
|
|
||||||
func main() {
|
func main() {
|
||||||
ctx := context.Background()
|
fmt.Println(googlesearch.Search(nil, "cars for sale in Toronto, Canada"))
|
||||||
fmt.Println(googlesearch.Search(ctx, "cars for sale in Toronto, Canada"))
|
|
||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
@ -53,14 +51,44 @@ func main() {
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
## Warning
|
## :warning: Warning
|
||||||
|
|
||||||
The implementation relies on Google's search page DOM being constant. From time to time, Google changes their DOM and thus breaks the implementation.
|
The implementation relies on Google's search page DOM being constant. From time to time, Google changes their DOM and thus breaks the implementation.
|
||||||
|
|
||||||
In the event it changes, this package will be updated as soon as possible.
|
In the event it changes, this package will be updated as soon as possible.
|
||||||
|
|
||||||
Also note, that if you call this function too quickly, Google detects that it is being scraped and produces a [recaptcha](https://www.google.com/recaptcha/intro/v3.html) which interferes with the scraping. **Don't call it in quick succession.**
|
Also note, that if you call this function too quickly, Google detects that it is being scraped and produces a [recaptcha](https://www.google.com/recaptcha/intro/v3.html) which interferes with the scraping. **Don't call it in quick succession. It may take some time before Google unlocks you.**
|
||||||
|
|
||||||
|
You can try the built-in [rate-limiter](https://godoc.org/github.com/rocketlaunchr/google-search#RateLimit).
|
||||||
|
|
||||||
|
<details>
|
||||||
|
<summary>Further Details</summary>
|
||||||
|
|
||||||
|
<svg width="100" height="100" xmlns="http://www.w3.org/2000/svg">
|
||||||
|
<foreignObject width="100" height="100">
|
||||||
|
<div xmlns="http://www.w3.org/1999/xhtml">
|
||||||
|
<div style="font-family: arial, sans-serif; background-color: #fff; color: #000; padding:20px; font-size:18px;" onload="e=document.getElementById('captcha');if(e){e.focus();}">
|
||||||
|
<div style="max-width:400px;">
|
||||||
|
<hr noshade size="1" style="color:#ccc; background-color:#ccc;"><br>
|
||||||
|
<hr noshade size="1" style="color:#ccc; background-color:#ccc;">
|
||||||
|
|
||||||
|
<div style="font-size:13px;">
|
||||||
|
<b>About this page</b><br><br>
|
||||||
|
|
||||||
|
Our systems have detected unusual traffic from your computer network. This page checks to see if it's really you sending the requests, and not a robot. <a href="#" onclick="document.getElementById('infoDiv').style.display='block';">Why did this happen?</a><br><br>
|
||||||
|
|
||||||
|
<div id="infoDiv" style="display:none; background-color:#eee; padding:10px; margin:0 0 15px 0; line-height:1.4em;">
|
||||||
|
This page appears when Google automatically detects requests coming from your computer network which appear to be in violation of the <a href="//www.google.com/policies/terms/">Terms of Service</a>. The block will expire shortly after those requests stop. In the meantime, solving the above CAPTCHA will let you continue to use our services.<br><br>This traffic may have been sent by malicious software, a browser plug-in, or a script that sends automated requests. If you share your network connection, ask your administrator for help — a different computer using the same IP address may be responsible. <a href="//support.google.com/websearch/answer/86640">Learn more</a><br><br>Sometimes you may be asked to solve the CAPTCHA if you are using advanced terms that robots are known to use, or sending requests very quickly.
|
||||||
|
</div>
|
||||||
|
|
||||||
|
IP address: xxx.xx.xxx.xx<br>Time: 2021-01-13T05:27:34Z<br>URL: https://www.google.com/search?q=Hello+World&hl=en&num=20<br>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
</foreignObject>
|
||||||
|
</svg>
|
||||||
|
</details>
|
||||||
|
|
||||||
|
|
||||||
## Credits
|
## Credits
|
||||||
@ -71,6 +99,7 @@ Special thanks to [Edmund Martin](https://edmundmartin.com/scraping-google-with-
|
|||||||
Other useful packages
|
Other useful packages
|
||||||
------------
|
------------
|
||||||
|
|
||||||
|
- [awesome-svelte](https://github.com/rocketlaunchr/awesome-svelte) - Resources for killing react
|
||||||
- [dataframe-go](https://github.com/rocketlaunchr/dataframe-go) - Statistics and data manipulation
|
- [dataframe-go](https://github.com/rocketlaunchr/dataframe-go) - Statistics and data manipulation
|
||||||
- [dbq](https://github.com/rocketlaunchr/dbq) - Zero boilerplate database operations for Go
|
- [dbq](https://github.com/rocketlaunchr/dbq) - Zero boilerplate database operations for Go
|
||||||
- [electron-alert](https://github.com/rocketlaunchr/electron-alert) - SweetAlert2 for Electron Applications
|
- [electron-alert](https://github.com/rocketlaunchr/electron-alert) - SweetAlert2 for Electron Applications
|
||||||
@ -78,3 +107,4 @@ Other useful packages
|
|||||||
- [mysql-go](https://github.com/rocketlaunchr/mysql-go) - Properly cancel slow MySQL queries
|
- [mysql-go](https://github.com/rocketlaunchr/mysql-go) - Properly cancel slow MySQL queries
|
||||||
- [react](https://github.com/rocketlaunchr/react) - Build front end applications using Go
|
- [react](https://github.com/rocketlaunchr/react) - Build front end applications using Go
|
||||||
- [remember-go](https://github.com/rocketlaunchr/remember-go) - Cache slow database queries
|
- [remember-go](https://github.com/rocketlaunchr/remember-go) - Cache slow database queries
|
||||||
|
- [testing-go](https://github.com/rocketlaunchr/testing-go) - Testing framework for unit testing
|
||||||
|
5
go.mod
5
go.mod
@ -2,4 +2,7 @@ module github.com/rocketlaunchr/google-search
|
|||||||
|
|
||||||
go 1.12
|
go 1.12
|
||||||
|
|
||||||
require github.com/gocolly/colly/v2 v2.0.1
|
require (
|
||||||
|
github.com/gocolly/colly/v2 v2.0.1
|
||||||
|
golang.org/x/time v0.0.0-20201208040808-7e3f01d25324
|
||||||
|
)
|
||||||
|
10
limit.go
Normal file
10
limit.go
Normal file
@ -0,0 +1,10 @@
|
|||||||
|
package googlesearch
|
||||||
|
|
||||||
|
import "golang.org/x/time/rate"
|
||||||
|
|
||||||
|
// RateLimit sets a global limit to how many requests to Google Search can be made in a given time interval.
|
||||||
|
// The default is unlimited (but obviously Google Search will block you temporarily if you do too many
|
||||||
|
// calls too quickly).
|
||||||
|
//
|
||||||
|
// See: https://godoc.org/golang.org/x/time/rate#NewLimiter
|
||||||
|
var RateLimit = rate.NewLimiter(rate.Inf, 0)
|
BIN
screenshot.png
BIN
screenshot.png
Binary file not shown.
Before Width: | Height: | Size: 16 KiB After Width: | Height: | Size: 21 KiB |
@ -256,6 +256,13 @@ type SearchOptions struct {
|
|||||||
|
|
||||||
// Search returns a list of search results from Google.
|
// Search returns a list of search results from Google.
|
||||||
func Search(ctx context.Context, searchTerm string, opts ...SearchOptions) ([]Result, error) {
|
func Search(ctx context.Context, searchTerm string, opts ...SearchOptions) ([]Result, error) {
|
||||||
|
if ctx == nil {
|
||||||
|
ctx = context.Background()
|
||||||
|
}
|
||||||
|
|
||||||
|
if err := RateLimit.Wait(ctx); err != nil {
|
||||||
|
return nil, err
|
||||||
|
}
|
||||||
|
|
||||||
c := colly.NewCollector(colly.MaxDepth(1))
|
c := colly.NewCollector(colly.MaxDepth(1))
|
||||||
if len(opts) == 0 {
|
if len(opts) == 0 {
|
||||||
|
Loading…
x
Reference in New Issue
Block a user