Searching del.icio.us with a Google Custom Search Engine (CSE)

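It took me two days to put together this del.icio.us custom search engine. It searches within the sites you have bookmarked under a given tag on del.icio.us. For example, you can search directly under the tag job, across sites you have already collected there such as 51job and chinahr.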
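
The first step of such a search is working out which sites belong to a tag. Here is a minimal sketch of that step only. The bookmark list is a made-up in-memory structure, not the del.icio.us API, and the exact URLs are my assumptions for illustration.

```python
from urllib.parse import urlsplit


def sites_for_tag(bookmarks, tag):
    """Return the sorted, de-duplicated host names of bookmarks carrying `tag`.

    `bookmarks` is a hypothetical list of (url, set_of_tags) pairs.
    """
    hosts = set()
    for url, tags in bookmarks:
        if tag in tags:
            host = urlsplit(url).netloc.lower()
            if host:
                hosts.add(host)
    return sorted(hosts)


# Hypothetical sample data; only the site names come from the post.
BOOKMARKS = [
    ("http://www.51job.com/", {"job"}),
    ("http://www.chinahr.com/jobs", {"job", "china"}),
    ("http://www.51job.com/search", {"job"}),
    ("http://del.icio.us/", {"tools"}),
]

if __name__ == "__main__":
    print(sites_for_tag(BOOKMARKS, "job"))
```

The resulting host list is what would then be handed to the search engine definition; how that definition is written is not covered here.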

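You need to enter your del.icio.us username and password. I can only give my word that I do not peek at or store your password, so please think carefully before using it. Because the username and password are not stored, the site list of the CSE is not updated when your del.icio.us bookmarks change. To refresh it, simply visit this program again.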

The program calls the del.icio.us API, but the API must not be called too often, otherwise it throws back a 503.
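
One common way to live with such a limit is to wait and retry with a growing delay. The sketch below is not what my program does; it is only an illustration. `fetch` is a hypothetical zero-argument callable returning `(status_code, body)`, and the delays are assumed values, not documented del.icio.us limits.

```python
import time


def fetch_with_backoff(fetch, max_attempts=4, first_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops answering 503, doubling the wait each time.

    Returns the last (status_code, body) pair, which is still a 503
    if every attempt was throttled.
    """
    delay = first_delay
    for attempt in range(1, max_attempts + 1):
        status, body = fetch()
        if status != 503:
            return status, body
        if attempt < max_attempts:
            sleep(delay)  # back off before the next try
            delay *= 2
    return status, body
```

Passing `sleep` in as a parameter keeps the function easy to try out without actually waiting.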

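zooie did the same kind of work long ago, and offers more elaborate features. My program uses Linked CSEs, a fairly new Google CSE feature: first, there is no need to work with the annotations XML file; second, it can generate a snippet of code to embed in any page you like.

Related documents and links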

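Built a search engine for free e-book downloads

I built a free e-book download search with Google Custom Search and put in the sites I used most often. They include illegal e-book exchange markets such as the csdn download channel, file-hosting sites such as RapidShare, and a number of e-book sharing sites large and small. From now on I will keep sorting the similar sites I use into vertical categories like this.

In fact others started doing this long ago, and more diligently. A search among custom search engines will find them.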

coopfreeebookdownload.png

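Fixing ACS paper PDF downloads through a library account

Downloading the PDF of an ACS paper through a library account has long been troublesome, and it has been discussed on SMTH (水木) as well [1][2][3]. I reach ACS through the library's off-campus access control system for electronic resources, so I do not have the “overseas access must be opened” problem, but PDFs still fail to download properly.

A normal PDF download goes roughly like this: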

1. Abstract page
http://infosource.lib.tsinghua.edu.cn:8080/~/WACS/pubs.acs.org/cgi-bin/abstract.cgi/jcisd8/asap/abs/ci7002472.html
2. PDF link
http://infosource.lib.tsinghua.edu.cn:8080/~/WACS/pubs.acs.org/cgi-bin/asap.cgi/jcisd8/asap/pdf/ci7002472.pdf
3. PDF link with session id, the real download link
http://infosource.lib.tsinghua.edu.cn:8080/~/WACS/pubs.acs.org/cgi-bin/asap.cgi/jcisd8/asap/pdf/ci7002472.pdf?sessid=3771

The problem lies in the second page. It is supposed to jump automatically to the third link, but in practice it always jumps to

http://infosource.lib.tsinghua.edu.cn:8080/cgi-bin/asap.cgi/jcisd8/asap/pdf/ci7002472.pdf?sessid=3771

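The cause is a script like the one below in the second page, which breaks this kind of web-proxy access: it rebuilds the URL from the host plus an absolute path, so the proxy prefix /~/WACS/pubs.acs.org is dropped.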

dest="/cgi-bin/asap.cgi/jcisd8/asap/pdf/ci700120v.pdf?sessid=8836";
window.location.replace(location.protocol + '//' + location.host + dest);
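
To make the failure concrete, here is a small Python sketch that rebuilds the URL the same way the script does, and then the way it would need to be built behind the proxy. The function names are mine, and treating /~/WACS/pubs.acs.org as a fixed prefix is an assumption read off the URLs in this post, not something the proxy documents.

```python
from urllib.parse import urlsplit


def script_redirect(current_url, dest):
    """Mimic location.protocol + '//' + location.host + dest."""
    parts = urlsplit(current_url)
    return parts.scheme + "://" + parts.netloc + dest


def proxied_redirect(current_url, dest, proxy_prefix):
    """Same as above, but keep the web proxy's path prefix in front of dest."""
    parts = urlsplit(current_url)
    return parts.scheme + "://" + parts.netloc + proxy_prefix + dest


PAGE = ("http://infosource.lib.tsinghua.edu.cn:8080/~/WACS/pubs.acs.org"
        "/cgi-bin/asap.cgi/jcisd8/asap/pdf/ci7002472.pdf")
DEST = "/cgi-bin/asap.cgi/jcisd8/asap/pdf/ci7002472.pdf?sessid=3771"
PREFIX = "/~/WACS/pubs.acs.org"  # assumed proxy prefix, taken from the URLs above

if __name__ == "__main__":
    print(script_redirect(PAGE, DEST))           # the broken target
    print(proxied_redirect(PAGE, DEST, PREFIX))  # the real download link
```

The first result has lost the prefix, which is exactly the wrong address the browser ends up at.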

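To trace the page requests and inspect the source returned at each step, I used the Firefox extension Tamper Data.

So the fix is simple: disable scripts. This is easy to set in Maxthon. The only catch is that when the link below appears, you have to click it yourself.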

The page you have requested is loading. Click here if your browser does not automatically redirect you.

This small nuisance wasted a lot of my time over several months, so I had to write it down.

Set up web.py and flup on Windows + Apache (WAMP)

  • web.py is installed with “Easy Install” as a Python egg.
  • To install “Easy Install” (setuptools), download ez_setup.py and run it; the setuptools egg for the right Python version is installed automatically.
  • Run “ez_setup.py web.py”; web.py is installed automatically.
  • web.py implements WSGI. flup is needed to provide the WSGI interface between web.py and the web server over CGI, FastCGI or SCGI.
  • Configure Apache to allow CGI in the web directory. The most common problem is permission settings, but that is fairly easy on Windows.
  • flup has a problem running in this environment. The following entries show up in the Apache error log:

[Thu Sep 27 09:21:34 2007] [error] [client 127.0.0.1] File "D:\Python25\lib\site-packages\flup-1.0-py2.5.egg\flup\server\fcgi_base.py", line 976, in _setupSocket
[Thu Sep 27 09:21:34 2007] [error] [client 127.0.0.1] AttributeError: 'module' object has no attribute 'fromfd'
[Thu Sep 27 09:21:34 2007] [error] [client 127.0.0.1] Unhandled exception in thread started by
[Thu Sep 27 09:21:34 2007] [error] [client 127.0.0.1] Error in sys.excepthook:

I solved it by following http://groups.google.com/group/webpy/browse_thread/thread/67a8cfa5fdb1882b/722acab404514de4
The flup code has to be modified and the egg repackaged.

Download the flup source code (rather than installing it with ez_setup).
Modify fcgi_base.py as follows:

    def _setupSocket(self):
        if self._bindAddress is None: # Run as a normal FastCGI?
            isFCGI = True

            #@ commented by charlie, ref: http://groups.google.com/group/webpy/browse_thread/thread/67a8cfa5fdb1882b/722acab404514de4
##             sock = socket.fromfd(FCGI_LISTENSOCK_FILENO, socket.AF_INET,
##                                  socket.SOCK_STREAM)
##             try:
##                 sock.getpeername()
##             except socket.error, e:
##                 if e[0] == errno.ENOTSOCK:
##                     # Not a socket, assume CGI context.
##                     isFCGI = False
##                 elif e[0] != errno.ENOTCONN:
##                     raise
            isFCGI = False

            # FastCGI/CGI discrimination is broken on Mac OS X.

Rebuild the flup egg; the instructions are easy: run “setup.py bdist_egg”.
Copy the new egg to the site-packages directory.
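
The patch above simply forces CGI mode, so FastCGI no longer works with the modified egg. A less drastic variant would skip the probe only when socket.fromfd is missing. The sketch below is not flup's code and not what I deployed. It is written for present-day Python 3 and has not been tested on Windows; the function name and the `socket_module` parameter are mine.

```python
import errno
import socket

FCGI_LISTENSOCK_FILENO = 0  # FastCGI hands over the listening socket as fd 0


def looks_like_fastcgi(socket_module=socket):
    """Guess whether the process was started as FastCGI rather than plain CGI.

    Falls back to "CGI" when the platform's socket module has no fromfd,
    which is the situation behind the AttributeError in the log above.
    """
    if not hasattr(socket_module, "fromfd"):
        return False
    try:
        sock = socket_module.fromfd(FCGI_LISTENSOCK_FILENO,
                                    socket.AF_INET, socket.SOCK_STREAM)
    except OSError:
        return False  # fd 0 could not even be duplicated
    try:
        sock.getpeername()
    except OSError as exc:
        if exc.errno == errno.ENOTSOCK:
            return False  # fd 0 is not a socket, assume CGI
        if exc.errno != errno.ENOTCONN:
            raise
    finally:
        sock.close()  # closes only the duplicate, not fd 0 itself
    return True
```

With a check like this the same egg could in principle serve both CGI on Windows and FastCGI elsewhere, at the cost of a slightly larger change.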

That got things working for me.

Difference between Redirect and CNAME

I am no longer confused about the difference between a name alias set with a CNAME record and the Redirect settings offered by web hosts.

CName

Setting

Set a CNAME DNS record that makes google.charliezhu.com an alias of www.google.com.

IP of google.charliezhu.com

E:\Documents and Settings\Charliezhu>ping google.charliezhu.com
Pinging www.l.google.com [64.233.189.104] with 32 bytes of data:

IP of google.com

E:\Documents and Settings\Charliezhu>ping google.com
Pinging www.l.google.com [64.233.189.104] with 32 bytes of data:
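
The same check can be done from a script. A minimal sketch using Python's socket.gethostbyname_ex, which returns the primary host name, the alias list and the address list for a name. The helper names are mine, and the `resolver` parameter exists only so the logic can be tried without a network; the stub simply replays the ping result shown above.

```python
import socket


def describe(name, resolver=socket.gethostbyname_ex):
    """Return what the resolver reports for `name` as a small dict."""
    canonical, aliases, addresses = resolver(name)
    return {"asked": name, "canonical": canonical,
            "aliases": aliases, "addresses": addresses}


def same_target(name_a, name_b, resolver=socket.gethostbyname_ex):
    """True when both names resolve to the same primary host name."""
    return resolver(name_a)[0] == resolver(name_b)[0]


def stub_resolver(name):
    """Offline stand-in replaying the ping output from this post."""
    return ("www.l.google.com", [name], ["64.233.189.104"])


if __name__ == "__main__":
    print(describe("google.charliezhu.com", resolver=stub_resolver))
    print(same_target("google.charliezhu.com", "google.com", resolver=stub_resolver))
```

Dropping the `resolver=stub_resolver` argument makes the functions query the real resolver instead; the answers will of course differ from the 2006 capture.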

The HTTP headers

GET / HTTP/1.1
Host: google.charliezhu.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html
Set-Cookie: PREF=ID=18f0202fad0f0342:NW=1:TM=1166005857:LM=1166005857:S=XPXvoWJWL4FJ2qOF; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com
Content-Encoding: gzip
Server: GWS/2.1
Content-Length: 1430
Date: Wed, 13 Dec 2006 10:30:57 GMT

Packets captured

The DNS server resolves the alias itself and answers with a proper CNAME-type DNS record.

dns.cname

Browser address bar

The browser never knows that the address is an alias.

dns.cname.address.bar

Redirect

Setting

Add a sub-domain google-re.charliezhu.com as Redirect to http://www.google.com.

The HTTP Headers

The HTTP server returns a response that tells the browser to go to another URL. That means settings such as “redirect”, “auto forward” and so on are all handled by web servers, not by DNS servers. The server responds to the browser and expects it to make another request to the target host. That second request is a completely separate page request, just as if we had typed the address into the address bar and pressed Go.

Some providers, such as domainbank.com, offer a URL forwarding service. It is implemented by an HTTP server too, which simply responds with a web page whose frame contains the target URL you specified.

GET / HTTP/1.1
Host: google-re.charliezhu.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
… …

HTTP/1.x 301 Moved Permanently
Date: Wed, 13 Dec 2006 11:12:57 GMT
Server: Apache/2.0.54 (Unix) PHP/4.4.2 mod_ssl/2.0.54 OpenSSL/0.9.7e mod_fastcgi/2.4.2 DAV/2 SVN/1.1.4
Location: http://www.google.com/
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 190
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1
———————————————————-

GET / HTTP/1.1
Host: www.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
… …

HTTP/1.x 200 OK
Cache-Control: private
Content-Type: text/html
Content-Encoding: gzip
Server: GWS/2.1
Content-Length: 1977
Date: Wed, 13 Dec 2006 11:12:56 GMT
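
The first half of that exchange is easy to reproduce locally. The sketch below starts a tiny HTTP server from Python's standard library that answers every GET with a 301 and a Location header, then fetches it once without following the redirect. It only illustrates the mechanism; it is not how my host implements its Redirect setting, and the handler and helper names are mine.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

TARGET = "http://www.google.com/"  # same target as in the capture above


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The redirect lives entirely in the HTTP response; DNS is not involved.
        self.send_response(301)
        self.send_header("Location", TARGET)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch_once(host, port, path="/"):
    """Issue one GET and return (status, Location) without following it."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def demo():
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch_once("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

A browser receiving this response would then open a new connection to the host named in Location, which is the second request in the capture.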

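Related post: Basic concepts of domain names

Basic concepts of domain names

Service providers

Registering and using a domain name means obtaining two separate services: domain registration, and domain name (resolution) service.

Registration and management

Domain registration establishes your ownership of the domain. Registrars are accredited by ICANN, and you register and renew with a registrar.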

A registrar is an organization that has been accredited by ICANN to register domain names. Yahoo!’s domain registration partner, Melbourne IT, is the registrar for all new domains registered through Yahoo!

Name servers

A name server resolves your domain. It covers many functions, such as resolving subdomains and MX records.

A name server, DNS server is similar to a telephone switchboard. It holds the information that tells the Internet where to find your web site and where to deliver your email. Name servers look something like this: yns1.yahoo.com. You can change your name servers in the Advanced DNS area of your Domain Control Panel; however, Yahoo!’s name servers must be listed as the primary and secondary name servers in order for Yahoo! to host your domain services properly.

A lookup at www.internic.net/whois.html shows which DNS servers a domain uses.

A registrar transfer is different from changing where a domain points (domain forwarding)

A registrar transfer is the movement of a domain name’s records and management from one registrar to another. When you transfer your domain to a different registrar, the new registrar becomes responsible for maintaining your domain registration records, managing your domain renewals, and other administrative details.

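When you buy shared hosting (from StartLogic, for example), you only need to point the DNS servers to StartLogic from your current registrar. This is usually free.
Registrars usually provide a web control panel for setting the DNS servers. At worst, you can phone them and ask for the change to be made by hand.

Some DNS basics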

A host name is an Internet address or domain name with a prefix. For example, a host name of the domain name yourdomainname.com may be "example.yourdomainname.com."
An A (address) record is a DNS record that can be used to point your domain name and host names to a static IP address.
The DNS (Domain Name System) records control the functionality of domain names. Each registered domain name has a DNS record that includes MX, A, and CNAME records and name servers.

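Each domain's DNS record contains MX, A and CNAME records and name server addresses. What these records hold is the mapping from host names to addresses.

DNS answers are cached: 1) on the visitor's computer; 2) at your ISP.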
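
As a toy picture of that mapping, the sketch below keeps a handful of records in a list and resolves a host name to an address, following CNAME aliases on the way. Everything in it is made up for illustration: the domain example.com, the 192.0.2.x addresses and the function name. Real name servers do far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    name: str   # host name the record is about
    rtype: str  # "A", "CNAME", "MX" or "NS"
    value: str  # address or target host name


# Hypothetical zone data for illustration only.
ZONE = [
    Record("example.com", "NS", "ns1.example.net"),
    Record("example.com", "A", "192.0.2.10"),
    Record("example.com", "MX", "mail.example.com"),
    Record("mail.example.com", "A", "192.0.2.20"),
    Record("www.example.com", "CNAME", "example.com"),
]


def resolve_address(name, zone, max_hops=8):
    """Return the A-record address for `name`, following CNAME aliases."""
    for _ in range(max_hops):
        for record in zone:
            if record.name == name and record.rtype == "A":
                return record.value
        for record in zone:
            if record.name == name and record.rtype == "CNAME":
                name = record.value  # continue with the alias target
                break
        else:
            return None  # neither an A nor a CNAME record for this name
    return None  # alias chain too long


if __name__ == "__main__":
    print(resolve_address("www.example.com", ZONE))
```

The `max_hops` limit is there only so that a loop of aliases cannot run forever.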

ref 1 , ref 2
