| Package | Description |
|---|---|
| us.codecraft.webmagic | Main class `Spider` and models. |
| us.codecraft.webmagic.downloader | The downloader fetches web pages and stores them in `Page` objects. |
| us.codecraft.webmagic.pipeline | The pipeline handles persistence and offline processing of crawl results. |
| us.codecraft.webmagic.processor | `PageProcessor` is the part of a crawler that is customized for a specific site. |
| us.codecraft.webmagic.processor.example | |
| us.codecraft.webmagic.proxy | |
| us.codecraft.webmagic.scheduler | The scheduler manages the URLs to crawl. |
| us.codecraft.webmagic.scheduler.component | Components of the scheduler. |
| us.codecraft.webmagic.utils | Static utilities for WebMagic. |

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results for use in a `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Request | Object containing the URL to crawl, plus some additional information. |
| ResultItems | Object containing the extracted results. It is held by `Page` and processed in the pipeline. |
| Site | Object containing the settings for a crawler. |
| Spider | Entry point of a crawler. A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Each module is a field of `Spider`. |
| Spider.Status | |
| SpiderListener | Listener of `Spider` for page processing. |
| Task | Interface for identifying different tasks. |

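
The way the four modules listed above hand a page to one another can be sketched in plain Java. This is a minimal illustration under stated assumptions, not WebMagic's implementation: `SketchPage` is a simplified stand-in that only borrows the method names documented for `Page`, and the in-memory `SITE` map, the `Title;link1,link2` body format, and the `crawl` helper are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Simplified stand-in for the documented Page contract; NOT the WebMagic class.
class SketchPage {
    private final String url;
    private final String html;
    private final Map<String, Object> resultItems = new LinkedHashMap<>();
    private final List<String> targetRequests = new ArrayList<>();

    SketchPage(String url, String html) {
        this.url = url;
        this.html = html;
    }

    String getUrl() { return url; }
    String getHtml() { return html; }
    void putField(String key, Object value) { resultItems.put(key, value); }
    Map<String, Object> getResultItems() { return resultItems; }
    void addTargetRequest(String target) { targetRequests.add(target); }
    List<String> getTargetRequests() { return targetRequests; }
}

class CrawlSketch {
    // Hypothetical in-memory "web": URL -> body in the form "Title;link1,link2".
    static final Map<String, String> SITE = Map.of(
            "/index", "Home;/a,/b",
            "/a", "Page A;/b",
            "/b", "Page B;/index");

    // Single-threaded crawl loop; returns the stored results keyed by URL.
    static Map<String, Map<String, Object>> crawl(String startUrl) {
        // Scheduler role: a queue of pending URLs plus a set of URLs already seen.
        Queue<String> pending = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        pending.add(startUrl);
        seen.add(startUrl);

        // Pipeline role: here the results are simply kept in memory.
        Map<String, Map<String, Object>> stored = new LinkedHashMap<>();

        while (!pending.isEmpty()) {
            String url = pending.poll();

            // Downloader role: fetch the body and wrap it in a page object.
            String body = SITE.get(url);
            if (body == null) {
                continue; // nothing to download for this URL
            }
            SketchPage page = new SketchPage(url, body);

            // PageProcessor role: extract a field and discover further URLs.
            String[] parts = page.getHtml().split(";", 2);
            page.putField("title", parts[0]);
            if (parts.length > 1) {
                for (String link : parts[1].split(",")) {
                    if (!link.isEmpty()) {
                        page.addTargetRequest(link);
                    }
                }
            }

            // Pipeline role: persist the extracted results.
            stored.put(page.getUrl(), page.getResultItems());

            // Scheduler role: enqueue only URLs that have not been seen yet.
            for (String next : page.getTargetRequests()) {
                if (seen.add(next)) {
                    pending.add(next);
                }
            }
        }
        return stored;
    }

    public static void main(String[] args) {
        crawl("/index").forEach((url, items) -> System.out.println(url + " -> " + items));
    }
}
```

The page object is created, filled, and read within one loop iteration, so it is never shared between threads, which is in keeping with the "not thread safe" note on `Page`.
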
| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results for use in a `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Request | Object containing the URL to crawl, plus some additional information. |
| Site | Object containing the settings for a crawler. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| ResultItems | Object containing the extracted results. It is held by `Page` and processed in the pipeline. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results for use in a `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results for use in a `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Site | Object containing the settings for a crawler. |

| Class | Description |
|---|---|
| Page | Object storing the extracted results and the URLs to fetch. Not thread safe. Main methods: `Page.getUrl()` gets the URL of the current page; `Page.getHtml()` gets the content of the current page; `Page.putField(String, Object)` saves an extracted result; `Page.getResultItems()` gets the extracted results for use in a `Pipeline`; `Page.addTargetRequests(java.util.List)` and `Page.addTargetRequest(String)` add URLs to fetch. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Request | Object containing the URL to crawl, plus some additional information. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Request | Object containing the URL to crawl, plus some additional information. |
| Task | Interface for identifying different tasks. |

| Class | Description |
|---|---|
| Request | Object containing the URL to crawl, plus some additional information. |

Copyright © 2017. All rights reserved.