Trying the Python-izm Basics on Lambda

I found an instructive site called Python-izm, so I tried its examples on AWS Lambda.

Basics

Strings

def lambda_handler(event, context):
    testnum = "123"
    testnum = testnum + "456"
    testnum = testnum + "789"
    testnum += "0"
    print(testnum)
    print(testnum.replace('1234567890','0123456789'))
    print(testnum.rjust(20,'0'))
    print(testnum.zfill(20))

    teststr = 'this-message'
    print(teststr)
    print(teststr.startswith('this'))
    print(teststr.split('-'))
==========================================================================
Response:
null

Request ID:
"ebe00f2f-3fda-11e8-b0be-112860d9c610"

Function Logs:
START RequestId: ebe00f2f-3fda-11e8-b0be-112860d9c610 Version: $LATEST
1234567890
0123456789
00000000001234567890
00000000001234567890
this-message
True
['this', 'message']
END RequestId: ebe00f2f-3fda-11e8-b0be-112860d9c610
REPORT RequestId: ebe00f2f-3fda-11e8-b0be-112860d9c610	Duration: 0.83 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Numbers

def lambda_handler(event, context):
    testint = 100.5
    print(float(testint) + 100)

    testcomp = 100 + 5j
    print(testcomp.real)
    print(testcomp.imag)
==========================================================================
Response:
null

Request ID:
"98ecdea0-3fde-11e8-84a6-6331306e40e3"

Function Logs:
START RequestId: 98ecdea0-3fde-11e8-84a6-6331306e40e3 Version: $LATEST
200.5
100.0
5.0
END RequestId: 98ecdea0-3fde-11e8-84a6-6331306e40e3
REPORT RequestId: 98ecdea0-3fde-11e8-84a6-6331306e40e3	Duration: 0.42 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Dates and Times

import datetime
import calendar

def lambda_handler(event, context):
    today = datetime.date.today()
    todaydetail = datetime.datetime.today()
    print(today)
    print(todaydetail)
    print(todaydetail + datetime.timedelta(hours=9))
    print(todaydetail.year)
    print(todaydetail.strftime("%Y/%m/%d %H:%M:%S"))
    print(calendar.isleap(2018))
==========================================================================
Response:
null

Request ID:
"cb3365b8-3fdf-11e8-9ef3-f786a1acc5e8"

Function Logs:
START RequestId: cb3365b8-3fdf-11e8-9ef3-f786a1acc5e8 Version: $LATEST
2018-04-14
2018-04-14 12:31:45.698769
2018-04-14 21:31:45.698769
2018
2018/04/14 12:31:45
False
END RequestId: cb3365b8-3fdf-11e8-9ef3-f786a1acc5e8
REPORT RequestId: cb3365b8-3fdf-11e8-9ef3-f786a1acc5e8	Duration: 9.95 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB
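
The + datetime.timedelta(hours=9) line above looks like a manual shift from UTC to Japan time (the log shows 12:31 becoming 21:31). Below is a minimal sketch of the same shift with a timezone-aware datetime, assuming the Lambda clock is UTC. JST and to_jst are names made up for this illustration.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC+9 offset; "JST" is just a label chosen for this sketch
JST = timezone(timedelta(hours=9), "JST")

def to_jst(naive_utc):
    """Treat a naive datetime as UTC and convert it to UTC+9."""
    return naive_utc.replace(tzinfo=timezone.utc).astimezone(JST)

# Same timestamp as in the log above, without the microseconds
sample = datetime(2018, 4, 14, 12, 31, 45)
print(to_jst(sample))  # 2018-04-14 21:31:45+09:00
```

The aware value carries its offset with it, so it stays unambiguous if it is later logged or compared with other timestamps.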

Tuples

import datetime

def lambda_handler(event, context):
    test_tuple = get_today()
    print(test_tuple)
    print(test_tuple[0])

def get_today():
    today = datetime.datetime.today()
    value = (today.year, today.month, today.day)
    return value
==========================================================================
Response:
null

Request ID:
"2e249bf9-3fe2-11e8-ac44-514731084dd0"

Function Logs:
START RequestId: 2e249bf9-3fe2-11e8-ac44-514731084dd0 Version: $LATEST
(2018, 4, 14)
2018
END RequestId: 2e249bf9-3fe2-11e8-ac44-514731084dd0
REPORT RequestId: 2e249bf9-3fe2-11e8-ac44-514731084dd0	Duration: 28.05 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Lists


def lambda_handler(event, context):
    test_list = ['this','-','is','-','test']
    print(test_list)

    for i in test_list:
        print(i)

    test_list.append('.')
    print(test_list)
    print(test_list.index('test'))
==========================================================================
Response:
null

Request ID:
"126169e5-3fe3-11e8-8df7-4d4c2b06e81b"

Function Logs:
START RequestId: 126169e5-3fe3-11e8-8df7-4d4c2b06e81b Version: $LATEST
['this', '-', 'is', '-', 'test']
this
-
is
-
test
['this', '-', 'is', '-', 'test', '.']
4
END RequestId: 126169e5-3fe3-11e8-8df7-4d4c2b06e81b
REPORT RequestId: 126169e5-3fe3-11e8-8df7-4d4c2b06e81b	Duration: 0.48 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Dictionaries


def lambda_handler(event, context):
    test_dict = {'year':'2018','mon':'04','day':'14'}
    for i in test_dict:
        print(i)
        print(test_dict[i])

    print(test_dict.get('year','NOT FOUND'))
    print(test_dict.get('years','NOT FOUND'))

    print(test_dict.keys())
    print(test_dict.values())

    for key,value in test_dict.items():
        print(key,":",value)
==========================================================================
Response:
null

Request ID:
"be57689a-3fe5-11e8-a508-7926ec266421"

Function Logs:
START RequestId: be57689a-3fe5-11e8-a508-7926ec266421 Version: $LATEST
year
2018
mon
04
day
14
2018
NOT FOUND
dict_keys(['year', 'mon', 'day'])
dict_values(['2018', '04', '14'])
year : 2018
mon : 04
day : 14
END RequestId: be57689a-3fe5-11e8-a508-7926ec266421
REPORT RequestId: be57689a-3fe5-11e8-a508-7926ec266421	Duration: 0.58 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Sets


def lambda_handler(event, context):
    test_set = {'this','-','is','-','set'}
    print(test_set)

    for i in test_set:
        print(i)

    test_set.discard('-')
    print(test_set)

    # A frozenset is immutable: it has no add() or discard()
    test_set_frozen = frozenset({'can','not','add','and','discard'})
    print(test_set_frozen)
==========================================================================
Response:
null

Request ID:
"84c22b12-3fe8-11e8-b96a-af1c8c10d84a"

Function Logs:
START RequestId: 84c22b12-3fe8-11e8-b96a-af1c8c10d84a Version: $LATEST
{'is', 'set', '-', 'this'}
is
set
-
this
{'is', 'set', 'this'}
frozenset({'discard', 'can', 'not', 'add', 'and'})
END RequestId: 84c22b12-3fe8-11e8-b96a-af1c8c10d84a
REPORT RequestId: 84c22b12-3fe8-11e8-b96a-af1c8c10d84a	Duration: 0.43 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Slices


def lambda_handler(event, context):
    test_list = ['this','is','python','slice','set']
    print(test_list[:])
    print(test_list[:4])
    print(test_list[1:])
    print(test_list[::2])
    print(test_list[-1::])
    print(test_list[::-1])
==========================================================================
Response:
null

Request ID:
"1467b80a-3ffc-11e8-8f0b-d9cd7250656b"

Function Logs:
START RequestId: 1467b80a-3ffc-11e8-8f0b-d9cd7250656b Version: $LATEST
['this', 'is', 'python', 'slice', 'set']
['this', 'is', 'python', 'slice']
['is', 'python', 'slice', 'set']
['this', 'python', 'set']
['set']
['set', 'slice', 'python', 'is', 'this']
END RequestId: 1467b80a-3ffc-11e8-8f0b-d9cd7250656b
REPORT RequestId: 1467b80a-3ffc-11e8-8f0b-d9cd7250656b	Duration: 0.40 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Imports

# testmod.py (the module imported by the handler below)
class TestClass:
    def __init__(self):
        print('Create TestClass')
        
    def test_method(self,val):
        print('call test_method')
        print(val)
==========================================================================
import testmod

def lambda_handler(event, context):
    test_class = testmod.TestClass()
    test_class.test_method('1')
==========================================================================
Response:
null

Request ID:
"da5fbad7-3ffe-11e8-99e4-f124716fdbac"

Function Logs:
START RequestId: da5fbad7-3ffe-11e8-99e4-f124716fdbac Version: $LATEST
Create TestClass
call test_method
1
END RequestId: da5fbad7-3ffe-11e8-99e4-f124716fdbac
REPORT RequestId: da5fbad7-3ffe-11e8-99e4-f124716fdbac	Duration: 0.34 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

Command-Line Arguments
Lambda has no command-line arguments, so use event instead.

def lambda_handler(event, context):
    print(event)
    print(event['msg'])
==========================================================================
Response:
null

Request ID:
"bd98380a-3fff-11e8-b8af-798937c861d6"

Function Logs:
START RequestId: bd98380a-3fff-11e8-b8af-798937c861d6 Version: $LATEST
{'msg': 'Hello World'}
Hello World
END RequestId: bd98380a-3fff-11e8-b8af-798937c861d6
REPORT RequestId: bd98380a-3fff-11e8-b8af-798937c861d6	Duration: 0.38 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB
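
Since the handler is an ordinary function, the event can also be passed in directly when trying it outside Lambda. Below is a minimal sketch, assuming the same {'msg': ...} test event as above. The .get fallback and the return value are my additions; the examples above return nothing, which is why Response shows null.

```python
def lambda_handler(event, context):
    # .get avoids a KeyError when the test event has no 'msg' key
    msg = event.get("msg", "NOT FOUND")
    print(msg)
    # Whatever is returned here is what the console shows under "Response"
    return msg

if __name__ == "__main__":
    # Local stand-in for the console test event; context is unused here
    lambda_handler({"msg": "Hello World"}, None)
```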

Joining Paths

import os

def lambda_handler(event, context):
    directory = "/etc/python"
    file = "setting.ini"
    print(os.path.join(directory,file))
    print(os.path.join(directory,"3.6",file))
==========================================================================
Response:
null

Request ID:
"9176f4fd-4000-11e8-aa7d-f996117ce1ed"

Function Logs:
START RequestId: 9176f4fd-4000-11e8-aa7d-f996117ce1ed Version: $LATEST
/etc/python/setting.ini
/etc/python/3.6/setting.ini
END RequestId: 9176f4fd-4000-11e8-aa7d-f996117ce1ed
REPORT RequestId: 9176f4fd-4000-11e8-aa7d-f996117ce1ed	Duration: 0.35 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

if Statements

def lambda_handler(event, context):
    value = 1
    if value == 1:
        print("value is 1")
    elif value == 2:
        print("value is 2")
    elif value == 3:
        pass
    else:
        print("value is not 1,2,3")
==========================================================================
Response:
null

Request ID:
"7c158508-4001-11e8-90fc-11102c6f7f98"

Function Logs:
START RequestId: 7c158508-4001-11e8-90fc-11102c6f7f98 Version: $LATEST
value is 1
END RequestId: 7c158508-4001-11e8-90fc-11102c6f7f98
REPORT RequestId: 7c158508-4001-11e8-90fc-11102c6f7f98	Duration: 0.39 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB

while Statements

def lambda_handler(event, context):
    counter = 0
    while counter < 10:
        counter += 1
        print(counter)

    # This loop ran until the function timed out, and then an error message was output
    while True:
        print("test")
==========================================================================
Response:
{
  "errorMessage": "2018-04-14T16:39:04.666Z 460c8d7e-4002-11e8-80d8-d3914a20416b Task timed out after 30.01 seconds"
}

Request ID:
"460c8d7e-4002-11e8-80d8-d3914a20416b"

Function Logs:

test
....
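
One way to avoid running into the timeout is to check the remaining time inside the loop. The Lambda context object provides get_remaining_time_in_millis() for this. Below is a minimal sketch; FakeContext and SAFETY_MARGIN_MS are hypothetical names for trying it locally and are not part of Lambda.

```python
class FakeContext:
    """Hypothetical stand-in for the Lambda context object, for local runs only."""

    def __init__(self, polls_before_low):
        self.polls_left = polls_before_low

    def get_remaining_time_in_millis(self):
        # Report plenty of time until the polls run out, then report very little
        self.polls_left -= 1
        return 5000 if self.polls_left > 0 else 500


SAFETY_MARGIN_MS = 1000  # assumed margin; stop looping when less than this remains

def lambda_handler(event, context):
    loops = 0
    while context.get_remaining_time_in_millis() > SAFETY_MARGIN_MS:
        loops += 1
    return loops

print(lambda_handler({}, FakeContext(4)))  # 3
```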

continue

def lambda_handler(event, context):
    for num in range(100):
        if num % 10:
            continue

        print(num)
==========================================================================
Response:
null

Request ID:
"1504b6c1-4003-11e8-a3ab-67ba43bfb4bb"

Function Logs:
START RequestId: 1504b6c1-4003-11e8-a3ab-67ba43bfb4bb Version: $LATEST
0
10
20
30
40
50
60
70
80
90
END RequestId: 1504b6c1-4003-11e8-a3ab-67ba43bfb4bb
REPORT RequestId: 1504b6c1-4003-11e8-a3ab-67ba43bfb4bb	Duration: 0.40 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 22 MB

Exception Handling

import sys
import traceback

def lambda_handler(event, context):

    def except_test(value1, value2):
        result = 0
        try:
            result = value1 + value2
        except:
            print("計算できません")  # "Cannot calculate"
            raise
        finally:
            print("計算終了")  # "Calculation finished"

        return result

    print(except_test(200, 100))

    try:
        print(except_test(200, '100'))
    except:
        print("Error")
==========================================================================
Response:
null

Request ID:
"deb4f5cb-4007-11e8-8bc4-fde8b1a9daf5"

Function Logs:
START RequestId: deb4f5cb-4007-11e8-8bc4-fde8b1a9daf5 Version: $LATEST
計算終了
300
計算できません
計算終了
Error
END RequestId: deb4f5cb-4007-11e8-8bc4-fde8b1a9daf5
REPORT RequestId: deb4f5cb-4007-11e8-8bc4-fde8b1a9daf5	Duration: 0.31 ms	Billed Duration: 100 ms 	Memory Size: 128 MB	Max Memory Used: 21 MB
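
The bare except: clauses above catch every exception, including ones unrelated to the addition. Below is a minimal sketch of the same try / except / finally flow that catches only TypeError. add_values and safe_add are names made up for this illustration.

```python
def add_values(value1, value2):
    """Add two values, logging and re-raising if the types do not match."""
    try:
        return value1 + value2
    except TypeError as exc:
        print("cannot add:", exc)
        raise
    finally:
        # Runs whether or not the addition succeeded
        print("finished")

def safe_add(value1, value2):
    """Return the sum, or None when the values cannot be added."""
    try:
        return add_values(value1, value2)
    except TypeError:
        return None

print(safe_add(200, 100))
print(safe_add(200, "100"))
```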

TypeError: 'module' object is not callable

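When I tried to use the random function, I got the error in the title. The error went away after I:
・renamed the file (I had created it as random.py)
・created a symbolic link
The likely cause is that a script named random.py shadows the standard library's random module. In the transcript below, renaming alone did not help, probably because a stale compiled random.pyc was still left in the directory.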

root@hostname:/home/shimizu/python# cat random.py
import random
if __name__ == "__main__":
    print random.random()
root@hostname:/home/shimizu/python# python random.py
Traceback (most recent call last):
  File "random.py", line 3, in <module>
    print random.random()
TypeError: 'module' object is not callable
root@hostname:/home/shimizu/python# mv random.py mkrandom.py
root@hostname:/home/shimizu/python# python mkrandom.py
Traceback (most recent call last):
  File "mkrandom.py", line 3, in <module>
    print random.random()
TypeError: 'module' object is not callable
root@hostname:/home/shimizu/python# ln -s /usr/lib/python2.7/random.py ./
root@hostname:/home/shimizu/python# python mkrandom.py
0.228268881195
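
To see which file an import would actually pick up, the module's origin can be checked. Below is a minimal sketch in Python 3 (the transcript above is Python 2.7); module_origin is a helper name made up for this illustration.

```python
import importlib.util

def module_origin(name):
    """Return the path (or origin label) a top-level import of name would load, or None."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# If this prints a path in the current directory instead of the
# standard library, a local file is shadowing the real module.
print(module_origin("random"))
```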

Reference URLs

help with python urllib2 import error
http://stackoverflow.com/questions/1933928/help-with-python-urllib2-import-error#comment1842908_1933953

python

Overview

Notes on what I studied on dotinstall.
2015-08-17_225517
http://dotinstall.com/lessons/basic_python_v2

What is Python?

A simple, easy-to-learn object-oriented language

Hello World

root@hostname:/home/shimizu/python# python --version
Python 2.7.9

root@hostname:/home/shimizu/python# cat hello.py
# coding: UTF-8
# Without the line above, using Japanese in the file raises an error
print "hello world"

root@hostname:/home/shimizu/python# python hello.py
hello world

Variable Types and More

root@hostname:/home/shimizu/python# cat hensu.py
# coding: UTF-8
# Variable names are case-sensitive
msg = "hello world"
print msg

# Prefix Japanese strings with u; without it, string operations apparently do not work properly
msg2 = u"こんにちは"
print msg2

# For text that needs line breaks, enclose it in """
print """
this
is
pen
"""
# Operators: + - * / // % **
# Arithmetic on an integer and a float gives a float
# Dividing two integers gives an integer (Python 2)
print 10 * 5
print 10 // 4
print 10 % 4 # remainder
print 2 ** 3
print 10 / 4

# Lists
sales = [25, 50, 70]
print len(sales)
sales[2] = 100
print sales[2]
print range(10)

# List operations
sales.sort()
sales.reverse()
print sales

d = "2015/8/15"
print d.split("/")
ds = d.split("/")
print "-".join(ds)

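# Tuples (immutable) are enclosed in "()"
# They can be processed faster in calculations and lead to fewer mistakes
# Trying to modify one raises the following error:
# TypeError: 'tuple' object does not support item assignment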
print "tuple"
a = (2,4,8)

# Sets (collection type) - duplicates are not allowed
a = set([1,2,3,4,5,6])
print a

# Dictionary data: key and value
fruit = {"apple":100, "banana":200, "pearch":600}
print fruit
print fruit["apple"]
print "banana" in fruit # True or False
print fruit.keys()
print fruit.values()
print fruit.items()

# Embedding values in a string
a = 10
print "age: %d" % a
# Format options are also available
print "age: %10d" % a
print "age: %010d" % a

root@hostname:/home/shimizu/python# python hensu.py
hello world
こんにちは

this
is
pen

50
2
2
8
2
3
100
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[100, 50, 25]
['2015', '8', '15']
2015-8-15
tuple
set([1, 2, 3, 4, 5, 6])
{'pearch': 600, 'apple': 100, 'banana': 200}
100
True
['pearch', 'apple', 'banana']
[600, 100, 200]
[('pearch', 600), ('apple', 100), ('banana', 200)]
age: 10
age:         10
age: 0000000010
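
Two of the behaviors noted in hensu.py are specific to Python 2. Below is a minimal sketch of how the same expressions behave in Python 3 (the version used in the Lambda examples). The variable names are made up for this illustration.

```python
# Division: / is true division in Python 3, // still floors
true_quotient = 10 / 4       # 2.5
floor_quotient = 10 // 4     # 2
remainder = 10 % 4           # 2

# Text: str is Unicode by default, so no u prefix is needed
greeting = "こんにちは"
char_count = len(greeting)                  # 5 characters
byte_count = len(greeting.encode("utf-8"))  # 15 bytes

print(true_quotient, floor_quotient, remainder, char_count, byte_count)
```

The gap between the character count and the byte count is the kind of mismatch the u prefix was guarding against in Python 2, where a plain string literal holds bytes.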

Watch Out for Type Conversion

root@hostname:/home/shimizu/python# cat henkan.py
# coding: UTF-8
# Numbers: int, float
# Strings: str
print 5 + int("5")
age = 20
print "i am " + str(age) + " years old"

root@hostname:/home/shimizu/python# python henkan.py
10
i am 20 years old

String Operations

root@hostname:/home/shimizu/python# cat mojisousa.py
# coding: UTF-8
word = "abcdefghijklm"
print len(word)
print word.find("b")
# Show indexes 2 through 4 (the 3rd to 5th characters); the end index 5 is not included
print word[2:5]

root@hostname:/home/shimizu/python# python mojisousa.py
13
1
cde

if Statements

root@hostname:/home/shimizu/python# cat if.py
# coding: UTF-8
score = 50
# Indent the lines under if
if score > 60:
    print "OK!"
elif score > 40:
    print "SOSO.."
else:
    print "NG!"

root@hostname:/home/shimizu/python# python if.py
SOSO..

Loops

root@hostname:/home/shimizu/python# cat loop.py
# coding: UTF-8

fruits = ["apple","banana","orange"]
for fruit in fruits:
    print fruit
# Runs once the for loop has finished
else:
    print "Finish"

for i in range(4):
    if i ==2:
        continue
    print "number"
    print i
print "Finish"

vegis = {"cabbage":200,"carrot":100,"radish":150}
for key,value in vegis.iteritems():
    print "key:%s value:%d" % (key,value)

for key in vegis.iterkeys():
    print key
for value in vegis.itervalues():
    print value

# while statement
print "while"
n = 0
while n < 10:
    print n
    n += 1
else:
    print "Finish"

root@hostname:/home/shimizu/python# python loop.py
apple
banana
orange
Finish
number
0
number
1
number
3
Finish
key:radish value:150
key:cabbage value:200
key:carrot value:100
radish
cabbage
carrot
150
200
100
while
0
1
2
3
4
5
6
7
8
9
Finish
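
The else on a loop runs only when the loop finishes without hitting break, which loop.py does not show because it never breaks. Below is a minimal sketch in Python 3, where items() replaces iteritems(). find_price is a name made up for this illustration.

```python
def find_price(prices, name):
    """Return the price for name, or None if the loop finishes without a match."""
    for key, value in prices.items():  # items() replaces iteritems() in Python 3
        if key == name:
            found = value
            break
    else:
        # Reached only when the loop ended without break
        found = None
    return found

vegis = {"cabbage": 200, "carrot": 100, "radish": 150}
print(find_price(vegis, "carrot"))
print(find_price(vegis, "potato"))
```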

Date Operations

root@hostname:/home/shimizu/python# cat datetime.py
# -*- coding: utf-8 -*-
import datetime

if __name__ == "__main__":
    today = datetime.date.today()
    todaydetail = datetime.datetime.today()

    print today
    print todaydetail

    print todaydetail + datetime.timedelta(days=1)

    newyear = datetime.datetime(2016,1,1)
    print newyear

    lastday = newyear - todaydetail
    print lastday.days
root@hostname:/home/shimizu/python# python datetime.py
2015-08-19
2015-08-19 23:07:22.489915
2015-08-20 23:07:22.489915
2016-01-01 00:00:00
134

Using Modules

root@hostname:/home/shimizu/python# cat module.py
# coding: UTF-8
import math, random
# Round up
print math.ceil(5.2)
print random.random()

root@hostname:/home/shimizu/python# python module.py
6.0
0.655057714917

Functions

root@hostname:/home/shimizu/python# cat function.py
# coding: UTF-8
def hello(name):
    print "hello %s" % name

hello("John")
hello("Agrini")

root@hostname:/home/shimizu/python# python function.py
hello John
hello Agrini

Objects

root@hostname:/home/shimizu/python# cat object.py
# coding: UTF-8
# Class: the blueprint of an object
# Instance: a concrete realization of a class
class User(object):
    def __init__(self,name):
        self.name = name
    def greet(self):
        print "my name is %s!" % self.name

bob = User("Bob")
print bob.name
bob.greet()

# Inheritance
class SuperUser(User):
    def shout(self):
        print "%s is SUPER!!" % self.name

tom = SuperUser("Tom")
tom.greet()
tom.shout()

root@hostname:/home/shimizu/python# python object.py
Bob
my name is Bob!
my name is Tom!
Tom is SUPER!!
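
SuperUser above inherits __init__ unchanged. Below is a minimal sketch in Python 3 of a subclass that extends the initializer with super(). The level attribute is a hypothetical addition for illustration, and the methods return strings instead of printing so the results are easy to check.

```python
class User:
    def __init__(self, name):
        self.name = name

    def greet(self):
        return "my name is %s!" % self.name


class SuperUser(User):
    def __init__(self, name, level):
        super().__init__(name)  # reuse User's initializer
        self.level = level      # hypothetical extra attribute

    def shout(self):
        return "%s is SUPER!! (level %d)" % (self.name, self.level)


tom = SuperUser("Tom", 3)
print(tom.greet())
print(tom.shout())
```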

Scraping with Beautiful Soup

I tried scraping with the Python library Beautiful Soup.

Installation and Tag Operations

root@hostname:/home/shimizu/python# aptitude install python-bs4
The following NEW packages will be installed:
  python-bs4 python-chardet{a} python-lxml{a}
...

root@hostname:/home/shimizu/python# cat scraping-bs4.py
# coding: UTF-8
import urllib2
from bs4 import BeautifulSoup
res = urllib2.urlopen("http://ll.jus.or.jp/2014/program.html")
# Returns a <class 'bs4.BeautifulSoup'> object; results of find_all() are a <class 'bs4.element.ResultSet'>, which can be handled like a list
soup = BeautifulSoup(res.read())


# Show the first matching tag
print soup.find('a')
# Show an attribute of the tag
print soup.a.get('href')
# Get the text inside the tag
print soup.a.string

root@hostname:/home/shimizu/python# python scraping-bs4.py
<a href="index.html" rel="home">LL Diver</a>
index.html
LL Diver

Displaying All a Tag Elements

root@hostname:/home/shimizu/python# cat scraping-bs4-2.py
# coding: UTF-8
import urllib2,sys
reload(sys)
sys.setdefaultencoding('utf-8')

from bs4 import BeautifulSoup
res = urllib2.urlopen("http://ll.jus.or.jp/2014/program.html")
# Returns a <class 'bs4.BeautifulSoup'> object; results of find_all() are a <class 'bs4.element.ResultSet'>, which can be handled like a list
soup = BeautifulSoup(res.read())

# Extract links
for link in soup.find_all('a'):
    if link.string is not None:
        print link.string + ": " + link.get('href')

root@hostname:/home/shimizu/python# python scraping-bs4-2.py
LL Diver: index.html
検索: index.html%3Fp=49.html#search-container
コンテンツへ移動: index.html%3Fp=49.html#content
LL Diver: index.html
LL Diver開催案内: index.html%3Fp=9.html
プログラム: index.html%3Fp=49.html
タイムテーブル: index.html%3Fp=198.html
アンケート: index.html%3Fp=260.html
アーカイブ: index.html%3Fp=77.html
昼の部: index.html%3Fp=49.html#day
...
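
The scripts in this section are Python 2 (urllib2, print statement, reload(sys)). Below is a minimal sketch of the same link extraction in Python 3, assuming the bs4 package is installed. The HTML string is a made-up sample so the code runs without network access, and extract_links is a name chosen for this illustration.

```python
from bs4 import BeautifulSoup

# Made-up sample markup, loosely modeled on the output above
HTML = """
<html><body>
<a href="index.html" rel="home">LL Diver</a>
<a href="index.html%3Fp=49.html"><img src="logo.png"></a>
<a href="index.html%3Fp=198.html">Timetable</a>
</body></html>
"""

def extract_links(html):
    """Return (text, href) pairs for every a tag that has a single text string."""
    # Naming the parser explicitly keeps the result the same across environments
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for link in soup.find_all("a"):
        # .string is None for the image-only link, so it is skipped
        if link.string is not None:
            links.append((link.string, link.get("href")))
    return links

for text, href in extract_links(HTML):
    print(text + ": " + href)
```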

Other Operations

root@hostname:/home/shimizu/python# python
Python 2.7.9 (default, Mar  1 2015, 12:57:24)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib2,sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> from bs4 import BeautifulSoup
>>> res = urllib2.urlopen("http://ll.jus.or.jp/2014/program.html")
>>> soup = BeautifulSoup(res.read())
>>> print soup.title
<title>プログラム | LL Diver</title>
>>> print soup.title.name
title
>>> print soup.title.string
プログラム | LL Diver
>>> soup.p
<p>LL Diverのプログラムは、<a href="index.html%3Fp=49.html#day">昼の部</a>・<a href="index.html%3Fp=49.html#night">夜の部</a>に分けて行います。<br/>
<a href="index.html%3Fp=198.html" target="_blank" title="タイムテーブル">タイムテーブル</a>もあわせてご覧ください。</p>
>>> print soup.find(id="genericons-css")
<link href="wp-content/themes/twentyfourteen/genericons/genericons.css%3Fver=3.0.2.css" id="genericons-css" media="all" rel="stylesheet" type="text/css"/>
### Strip the tags and extract all text ###
>>> print(soup.get_text())
...
### About soup ###
>>> print type(soup)
<class 'bs4.BeautifulSoup'>
>>> print soup.prettify()
...

### Tag obj ###
>>> tag = soup.a
>>> type(soup)
<class 'bs4.BeautifulSoup'>
>>> type(tag)
<class 'bs4.element.Tag'>
>>> tag.name
'a'
### Access the attributes as a dictionary ###
>>> tag.attrs
{'href': 'index.html', 'rel': ['home']}

### class is a reserved word, so search for CSS classes with the class_ keyword argument ###
>>> soup.find_all(class_="page_item page-item-9")
[<li class="page_item page-item-9"><a href="index.html%3Fp=9.html">LL Diver開催案内</a></li>]

Fetching Real Data: Weather Data

Borrowed from:
Python で簡単なテキスト処理 (3) – Beautiful Soup を使ってスクレイピング
http://jutememo.blogspot.jp/2008/06/python-3-beautiful-soup.html

root@hostname:/home/shimizu/python# cat scraping-bs4-3.py
# coding: UTF-8
import urllib2,sys
reload(sys)
sys.setdefaultencoding('utf-8')

from bs4 import BeautifulSoup
URL = "http://www.data.jma.go.jp/obd/stats/etrn/view/daily_s1.php?prec_no=44&prec_ch=%93%8C%8B%9E%93s&block_no=47662&block_ch=%93%8C%8B%9E&year=2008&month=01&day=12&view=p1"

# Days of the month to output
DATE_FROM = 1
DATE_TO = 5

# Fetch the data
res = urllib2.urlopen(URL)
soup = BeautifulSoup(res.read())

#print soup.prettify()
#print type(soup)

records = []
# For tr tags whose class attribute is mtx
for tr in soup('tr',{'class':'mtx'}):
    rec = []
    # For div tags whose class attribute is a_print
    for div in tr('div',{'class':'a_print'}):
        # For a tags
        for a in div('a'):
            # Get the day of the month
            rec.append(a.renderContents())
    # For td tags whose class attribute is data_0_0
    for td in tr('td',{'class':'data_0_0'}):
        # Get each data value
        data = td.renderContents().strip()
        rec.append(data)
    if rec != []: records.append(rec)

# Output the fetched data
for rec in records:
    # Skip days outside the specified range
    if int(rec[0]) not in range(DATE_FROM, DATE_TO + 1): continue
    for i,data in enumerate(rec):
        if i in [1,6]: print unicode(data, 'utf-8'), "\t",
    print

root@hostname:/home/shimizu/python# python scraping-bs4-3.py
998.7   6.0
1007.2  6.2
1011.6  5.9
1014.6  7.0
1014.0  6.0

Fetching Real Data: Parcel Delivery Status

Borrowed from:
Pythonでスクレイピング

root@hostname:/home/shimizu/python# cat scraping-bs4-4.py
# coding: UTF-8
import urllib,urllib2,sys,re
reload(sys)
sys.setdefaultencoding('utf-8')

from bs4 import BeautifulSoup

def kuroneko_check(n):
    r = re.compile(r'>([^<]+)<br')
    url = "http://toi.kuronekoyamato.co.jp/cgi-bin/tneko"
    data = urllib.urlencode({"number00": "1", "number01": n})
    #html = urllib2.urlopen(url,data).read().decode('shift_jis').encode('utf-8')
    html = urllib2.urlopen(url,data).read()
    soup = BeautifulSoup(html)
    #print soup.prettify
    s = str(soup.find_all("td",class_="ct")[0].font)
    return r.findall(s)[0]

print(kuroneko_check('4320-xxxx-xxxx'))

root@hostname:/home/shimizu/python# python scraping-bs4-4.py
配達完了
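
The pattern r'>([^<]+)<br' in kuroneko_check captures the text between a closing > and the next <br. Below is a minimal sketch in Python 3 of what it matches. The sample markup is made up for illustration and is not the actual page; STATUS_PATTERN and first_status are names chosen for this sketch.

```python
import re

# Same pattern as in kuroneko_check: text after '>' up to the next '<br'
STATUS_PATTERN = re.compile(r'>([^<]+)<br')

def first_status(markup):
    """Return the first captured text, or None when nothing matches."""
    matches = STATUS_PATTERN.findall(markup)
    return matches[0] if matches else None

# Made-up sample; the real page's markup may differ
sample = '<font size="4">delivered<br/></font>'
print(first_status(sample))
```

Returning None for no match avoids the IndexError that r.findall(s)[0] would raise if the page layout changed.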

Fetching Real Data: Extracting Text

Borrowed from:
第3回 スクレイピングにチャレンジ! (2/3)
http://itpro.nikkeibp.co.jp/article/COLUMN/20080407/298191/?ST=develop&P=2

root@hostname:/home/shimizu/python# cat scraping-bs4-5.py
# coding: UTF-8
import urllib2,sys
reload(sys)
sys.setdefaultencoding('utf-8')

from bs4 import BeautifulSoup,NavigableString

url = 'http://www.linux.or.jp/'

html = urllib2.urlopen(url).read()

soup = BeautifulSoup(html)

def printText(tags):
    for tag in tags:
        if tag.__class__ == NavigableString:
            print tag,
        else:
            printText(tag)
    print ""

printText(soup.findAll("p"))

root@hostname:/home/shimizu/python# python scraping-bs4-5.py
すぐに使えるリナックスチュートリアルをご紹介
最新のlinuxニュースやインタビュー記事をお届けします。

30週連続企画:今週はShuah Khan氏の仕事場に潜入!
  続きを読む...
...

Fetching Real Data: A News Site

Borrowed from:
BeautifulSoup実践 -pythonで超カンタンにスクレイピング
http://d.hatena.ne.jp/lolloo-htn/20090128/1233155734

root@hostname:/home/shimizu/python# cat scraping-bs4-6.py
# coding: UTF-8
import urllib2,sys
reload(sys)
sys.setdefaultencoding('utf-8')

from bs4 import BeautifulSoup,NavigableString
url = "http://www.yomiuri.co.jp/sports/"

# Fetch the data
res = urllib2.urlopen(url)
soup = BeautifulSoup(res.read())

def printText(tags):
    for tag in tags:
        if tag.__class__ == NavigableString:
            print tag
        else:
            printText(tag)

for data in soup('span',{'class':'headline'}):
    printText(data)

root@hostname:/home/shimizu/python# python scraping-bs4-6.py
国際陸連が不正の証拠隠蔽か…ドーピング疑惑
(2015年08月16日 21時52分)
興南、2点差から粘り…鳥羽に逆転勝ちで8強
(2015年08月16日 18時34分)
2か月ぶり先発のG・大竹、1失点も白星ならず
(2015年08月16日 18時27分)
...

Reference URLs

PythonとBeautiful Soupでスクレイピング
http://qiita.com/itkr/items/513318a9b5b92bd56185
Beautiful Soup
http://kondou.com/BS4/
Python で簡単なテキスト処理 (3) – Beautiful Soup を使ってスクレイピング
http://jutememo.blogspot.jp/2008/06/python-3-beautiful-soup.html
Pythonでスクレイピング

スクレイピングまとめ
http://matome.naver.jp/odai/2136441394137974201