mirror of
https://github.com/LCTT/TranslateProject.git
synced 2024-12-26 21:30:55 +08:00
commit
8ef11661cb
@ -1,100 +0,0 @@
[#]: collector: (lujun9972)
[#]: translator: (HankChow)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Top hacks for the YaCy open source search engine)
[#]: via: (https://opensource.com/article/20/2/yacy-search-engine-hacks)
[#]: author: (Seth Kenlon https://opensource.com/users/seth)

Top hacks for the YaCy open source search engine
======
Rather than adapting to someone else's vision, customize your search engine for the internet you want with YaCy.
![Browser of things][1]

In my article about [getting started with YaCy][2], I explained how to install and start using the [YaCy][3] peer-to-peer search engine. One of the most exciting things about YaCy, however, is the fact that it's a local client. Each user owns and operates a node in a globally distributed search engine infrastructure, which means each user is in full control of how they navigate and experience the World Wide Web.

For instance, Google used to provide the URL google.com/linux as a shortcut to filter searches for Linux-related topics. It was a small feature that many people found useful, but [topical shortcuts were dropped][4] in 2011.

YaCy makes it possible to customize your search experience.

### Customize YaCy

Once you've installed YaCy, navigate to your search page at **localhost:8090**. To customize your search engine, click the **Administration** button in the top-right corner (it may be concealed in a menu icon on small screens).
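
Because the node answers on localhost:8090, the same instance can also be queried from a script. The Python sketch below only builds the request URL. It assumes that your YaCy release exposes a JSON search endpoint at `yacysearch.json` with `query` and `maximumRecords` parameters, so check that against the API documentation for your version. The function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(query, host="localhost", port=8090, max_records=10):
    """Return a search URL for a local YaCy node.

    Assumption: the node exposes a JSON search endpoint at /yacysearch.json
    that accepts `query` and `maximumRecords`. Verify this against your
    YaCy release before relying on it.
    """
    # urlencode escapes spaces and reserved characters in the query string.
    params = urlencode({"query": query, "maximumRecords": max_records})
    return f"http://{host}:{port}/yacysearch.json?{params}"


if __name__ == "__main__":
    print(build_search_url("linux kernel"))
```

Keeping URL construction in its own function makes it easy to point the same script at an alternative port if you change it in the admin panel.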

The admin panel allows you to configure how YaCy uses your system resources and how it interacts with other YaCy clients.

![YaCy profile selector][5]

For instance, to configure an alternative port and set RAM and disk usage, use the **First steps** menu in the sidebar. To monitor YaCy activity, use the **Monitoring** panel. Most features are discoverable by clicking through the panels, but here are some of my favorites.

### Search appliance

Several companies have offered [intranet search appliances][6], but with YaCy, you can implement one for free. Whether you want to search through your own data or to implement a search system for local file shares at your business, you can choose to run YaCy as an internal indexer for files accessible over HTTP, FTP, and SMB (Samba). People in your local network can use your personalized instance of YaCy to find shared files, and none of the data is shared with users outside your network.

### Network configuration

YaCy favors isolation and privacy by default. You can adjust how you connect to the peer-to-peer network in the **Network Configuration** panel, which is revealed by clicking the link located at the top of the **Use Case & Account** configuration screen.

![YaCy network configuration][7]

### Crawl a site

Peer-to-peer indexing is user-driven. There's no mega-corporation initiating searches on every accessible page on the internet, so a site isn't indexed until someone deliberately crawls it with YaCy.

The YaCy client provides two options to help you crawl the web: you can perform a manual crawl, and you can make YaCy available for suggested crawls.

![YaCy advanced crawler][8]

#### Start a manual crawling job

A manual crawl is when you enter the URL of a site you want to index and start a YaCy crawl job. To do this, click the **Advanced Crawler** link in the **Production** sidebar. Enter one or more URLs, then scroll to the bottom of the page and enable the **Do remote indexing** option. This enables your client to broadcast the URLs it is indexing, so clients that have opted to accept requests can help you perform the crawl.

To start the crawl, click the **Start New Crawl Job** button at the bottom of the page. I use this method to index sites I use frequently or find useful.
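
If you keep a list of sites to crawl, it can help to tidy that list before pasting it into the **Advanced Crawler** form. The following Python sketch is not part of YaCy. It is a hypothetical helper, `clean_start_urls`, that trims whitespace, drops duplicates, and sets aside entries without a recognizable scheme and host. The allowed schemes are an assumption based on the protocols this article mentions (HTTP, FTP, and SMB), plus HTTPS.

```python
from urllib.parse import urlsplit

# Assumption: schemes taken from the protocols named in this article, plus HTTPS.
ALLOWED_SCHEMES = {"http", "https", "ftp", "smb"}


def clean_start_urls(raw_lines):
    """Split raw lines into (accepted, rejected) start URLs.

    Blank lines and exact duplicates are dropped. A URL is accepted only if
    its scheme is in ALLOWED_SCHEMES and it names a host.
    """
    seen = set()
    accepted, rejected = [], []
    for line in raw_lines:
        url = line.strip()
        if not url or url in seen:
            continue
        seen.add(url)
        parts = urlsplit(url)
        if parts.scheme.lower() in ALLOWED_SCHEMES and parts.netloc:
            accepted.append(url)
        else:
            rejected.append(url)
    return accepted, rejected
```

Returning the rejected entries separately, rather than silently discarding them, lets you fix a missing `https://` prefix before starting the crawl job.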

Once the crawl job starts, YaCy indexes the URLs you enter and stores the index on your local machine. As long as you are running in senior mode (meaning your firewall permits incoming and outgoing traffic on port 8090), your index is available to YaCy users all over the globe.
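
A quick way to confirm that something is listening on the YaCy port is a plain TCP connection test. The Python sketch below uses only the standard library, and the function name `port_accepts_connections` is hypothetical. Note the limitation: run against `127.0.0.1`, it only shows that the local node is up. To learn whether your firewall admits outside traffic, run it from another machine against your public address.

```python
import socket


def port_accepts_connections(host="127.0.0.1", port=8090, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "open" if port_accepts_connections() else "closed or unreachable"
    print(f"Port 8090 on 127.0.0.1 is {state}")
```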

#### Join in on a crawl

While some very dedicated YaCy senior users may crawl the internet compulsively, there are a _lot_ of sites out there in the world. It might seem impossible to match the resources of popular spiders and bots, but because YaCy has so many users, they can band together as a community to index more of the internet than any one user could do alone. If you activate YaCy to broadcast requests for site crawls, participating clients can work together to crawl sites you might not otherwise think to crawl manually.

To configure your client to accept jobs from others, click the **Advanced Crawler** link in the left sidebar menu. In the **Advanced Crawler** panel, click the **Remote Crawling** link under the **Network Harvesting** heading at the top of the page. Enable remote crawls by placing a tick in the checkbox next to the **Load** setting.

![YaCy remote crawling][9]

### YaCy monitoring and more

YaCy is a surprisingly robust search engine, providing you with the opportunity to theme and refine your experience in nearly any way you could want. You can monitor the activity of your YaCy client in the **Monitoring** panel, so you can get an idea of how many people are benefiting from the work of the YaCy community and also see what kind of activity it's generating for your computer and network.

![YaCy monitoring screen][10]

### Search engines make a difference

The more time you spend with the Administration screen, the more fun it becomes to ponder how the search engine you use can change your perspective. Your experience of the internet is shaped by the results you get back for even the simplest of queries. You might notice, in fact, how different one person's "internet" is from another person's when you talk to computer users from a different industry. For some people, the web is littered with ads and promoted searches and suffers from the tunnel vision of learned responses to queries. For instance, if someone consistently searches for answers about X, most commercial search engines will give weight to query responses that concern X. That's a useful feature on the one hand, but it occludes answers that require Y, even though that might be the better solution for a specific task.

As in real life, stepping outside a manufactured view of the world can be healthy and enlightening. Try YaCy, and see what you discover.

--------------------------------------------------------------------------------

via: https://opensource.com/article/20/2/yacy-search-engine-hacks

Author: [Seth Kenlon][a]
Topic selection: [lujun9972][b]
Translator: [translator ID](https://github.com/译者ID)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/).

[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/browser_desktop_website_checklist_metrics.png?itok=OKKbl1UR (Browser of things)
[2]: https://opensource.com/article/20/2/open-source-search-engine
[3]: https://yacy.net/
[4]: https://www.linuxquestions.org/questions/linux-news-59/is-there-no-more-linux-google-884306/
[5]: https://opensource.com/sites/default/files/uploads/yacy-profiles.jpg (YaCy profile selector)
[6]: https://en.wikipedia.org/wiki/Vivisimo
[7]: https://opensource.com/sites/default/files/uploads/yacy-network-config.jpg (YaCy network configuration)
[8]: https://opensource.com/sites/default/files/uploads/yacy-advanced-crawler.jpg (YaCy advanced crawler)
[9]: https://opensource.com/sites/default/files/uploads/yacy-remote-crawl-accept.jpg (YaCy remote crawling)
[10]: https://opensource.com/sites/default/files/uploads/yacy-monitor.jpg (YaCy monitoring screen)
@ -0,0 +1,99 @@
[#]: collector: (lujun9972)
[#]: translator: (HankChow)
[#]: reviewer: ( )
[#]: publisher: ( )
[#]: url: ( )
[#]: subject: (Top hacks for the YaCy open source search engine)
[#]: via: (https://opensource.com/article/20/2/yacy-search-engine-hacks)
[#]: author: (Seth Kenlon https://opensource.com/users/seth)

Tips for using the open source search engine YaCy
======
> Tired of being tied to other people's versions of a search engine? Customize your own with YaCy.
![Browser of things][1]
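
In my earlier [introduction to YaCy][2], I described how to install and use [YaCy][3], a peer-to-peer search engine. One of YaCy's most distinctive features is that it can be deployed locally: every YaCy user worldwide is a node in the distributed search engine infrastructure, so each user controls their own internet search experience.

Google once offered `google.com/linux` as a shortcut for quickly filtering search results related to Linux. Many people liked this feature, but Google eventually [took it offline][4] in 2011.

YaCy, by contrast, makes a customized search engine possible.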

### Customize YaCy

Once YaCy is installed, visit `localhost:8090` to start using it. To begin customizing your search engine, click the **Administration** button in the top-right corner. If you cannot find it, click the menu icon to open the menu.

In the admin panel, you can configure how YaCy uses system resources and how it interacts with other YaCy clients.

![YaCy profile selector][5]

For example, click **First steps** in the sidebar to configure an alternative port and set how much memory and disk space YaCy uses, while the **Monitoring** panel lets you monitor how YaCy is running. Most features take only a few clicks in these panels, such as the commonly used ones below.
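
### Search appliance

A number of companies sell [intranet search appliances][6], but YaCy has the advantage of being free. YaCy can index any file accessible over HTTP, FTP, Samba, and similar protocols, so it works both as a personal file search and as a search system for local file shares inside a business. Users on the internal network can find shared files with your customized YaCy instance, while the data stays invisible to users outside the network.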

### Network configuration

YaCy supports privacy and isolation well by default. Click the **Network Configuration** link at the top of the **Use Case & Account** page to open the network configuration panel and set up the peer-to-peer network.

![YaCy network configuration][7]
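
### Crawl a site

YaCy's peer-to-peer, distributed design means that page crawling is user-driven. No single company's crawler can reach every page on the internet, and the same is true of YaCy: a site is crawled and added to the index only after a user asks for it to be crawled.

The YaCy client provides two ways to crawl pages: a custom crawl, and crawls suggested through YaCy.

![YaCy advanced crawler][8]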
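
#### Start a custom crawl job

A custom crawl means that you enter the URL of a site and start a YaCy crawl job. Click **Advanced Crawler**, enter the URLs you plan to crawl, and then select the **Do Remote indexing** option at the bottom of the page. This option makes your client broadcast the URLs you entered, and remote clients that receive the broadcast start crawling the pages behind those URLs.

Click the **Start New Crawl Job** button at the bottom of the page to begin crawling. This is how I crawl and index sites that I use often or find useful.

After the crawl job starts, YaCy builds and stores an index of the pages at these URLs on your local machine. In senior mode, meaning that the local computer allows traffic in and out on port 8090, YaCy users across the whole network can use this index.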
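
#### Join a crawl job

Although YaCy users have already crawled many pages, that is only a drop in the ocean compared with the vast number of pages on the web. A single user has far fewer resources than the crawlers run by large companies, but a large number of YaCy users working together as a community can achieve much more. Once YaCy's crawl request broadcasting is enabled, other clients can join in and crawl more pages.

In the **Advanced Crawler** panel, click **Remote Crawling** at the top of the page and tick the **Load** checkbox to let your client accept crawl job requests sent by other people.

![YaCy remote crawling][9]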

### YaCy monitoring and more

Besides being a very powerful search engine, YaCy offers a rich user experience. In the **Monitoring** panel, you can watch the network activity of your YaCy client and even see how many people are getting what they need from the YaCy community.

![YaCy monitoring screen][10]
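
### Search engines make a difference

The longer you use YaCy, the more you think about how a search engine changes your outlook, because much of your experience of the internet comes from the results of one simple query after another. In fact, when you talk with people from different industries, you may notice that each person's idea of "the internet" is different. Some people find that search engines are full of ads and promoted results, and that search results yield only limited information. For example, if someone keeps searching for content about keyword X, most commercial search engines raise the weight of keyword X in the results, while the weight of another keyword Y drops correspondingly, so that keyword Y gets buried in the results.

As in real life, stepping out of your comfort zone lets you see a wider world. Try YaCy, and see what you discover.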

--------------------------------------------------------------------------------

via: https://opensource.com/article/20/2/yacy-search-engine-hacks

Author: [Seth Kenlon][a]
Topic selection: [lujun9972][b]
Translator: [HankChow](https://github.com/HankChow)
Proofreader: [proofreader ID](https://github.com/校对者ID)

This article was originally compiled by [LCTT](https://github.com/LCTT/TranslateProject) and is proudly presented by [Linux China](https://linux.cn/).

[a]: https://opensource.com/users/seth
[b]: https://github.com/lujun9972
[1]: https://opensource.com/sites/default/files/styles/image-full-size/public/lead-images/browser_desktop_website_checklist_metrics.png?itok=OKKbl1UR (Browser of things)
[2]: https://opensource.com/article/20/2/open-source-search-engine
[3]: https://yacy.net/
[4]: https://www.linuxquestions.org/questions/linux-news-59/is-there-no-more-linux-google-884306/
[5]: https://opensource.com/sites/default/files/uploads/yacy-profiles.jpg (YaCy profile selector)
[6]: https://en.wikipedia.org/wiki/Vivisimo
[7]: https://opensource.com/sites/default/files/uploads/yacy-network-config.jpg (YaCy network configuration)
[8]: https://opensource.com/sites/default/files/uploads/yacy-advanced-crawler.jpg (YaCy advanced crawler)
[9]: https://opensource.com/sites/default/files/uploads/yacy-remote-crawl-accept.jpg (YaCy remote crawling)
[10]: https://opensource.com/sites/default/files/uploads/yacy-monitor.jpg (YaCy monitoring screen)