1. Installing CentOS 7 in a VM and configuring a shared folder
2. Complete guide to setting up pseudo-distributed Hadoop on CentOS 7
3. Operating HDFS with Python from the host machine: setup and common problems
4. MapReduce setup
5. Mapper/Reducer programming setup

Mapper/Reducer programming setup
- I. Start Hadoop
- II. Create mapper.py, reducer.py, and the input file
  - 1. Create mapper.py
  - 2. Create reducer.py
  - 3. Create the input file
  - 4. Test map and reduce locally
- III. Testing
  - 1. Create a directory in HDFS
  - 2. Upload test00.txt to HDFS
  - 3. Run the example job
  - 4. Download the result file
 
一、Start Hadoop
Start the HDFS and YARN daemons configured in the pseudo-distributed tutorial, then verify them with jps:
start-dfs.sh

start-yarn.sh

jps

二、Create mapper.py, reducer.py, and the input file
1. Create mapper.py
cd /home/huangqifa/software/
 
touch mapper.py
 
Edit the file:
sudo gedit mapper.py
 
Paste the following:
#!/usr/bin/env python
import sys

# input comes from standard input
for line in sys.stdin:
    # remove leading and trailing whitespace
    line = line.strip()
    # split the line into words
    words = line.split()
    # write the results to STDOUT as "word\t1" pairs
    for word in words:
        print('%s\t%s' % (word, 1))
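Fed a line of text, the mapper above emits one tab-separated `word 1` pair per token. A minimal, Hadoop-free sketch of that logic (the `map_line` helper is illustrative, not part of the tutorial's scripts):

```python
# Hypothetical helper mirroring mapper.py's per-line logic.
def map_line(line):
    """Emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.strip().split()]

# the same sample line used in the local test later in this tutorial
pairs = map_line("foo foo quux labs foo bar quux")
for word, count in pairs:
    print('%s\t%s' % (word, count))
```

Note that the mapper does no counting at all; every occurrence of a word is emitted as a separate `1`, and aggregation is left entirely to the reducer.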
 
2. Create reducer.py
touch reducer.py
 
sudo gedit reducer.py
 
Paste the following:
#!/usr/bin/env python
import sys

current_word = None
current_count = 0
word = None

# input arrives sorted by key, so all counts for a word are adjacent
for line in sys.stdin:
    line = line.strip()
    # parse the "word\tcount" pairs emitted by mapper.py
    word, count = line.split('\t', 1)
    try:
        count = int(count)
    except ValueError:
        # silently skip malformed lines
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            # key changed: emit the finished word and its total
            print('%s\t%s' % (current_word, current_count))
        current_count = count
        current_word = word

# emit the last word, if any
if current_word == word:
    print('%s\t%s' % (current_word, current_count))
 
Make both scripts executable:
sudo chmod +x mapper.py
sudo chmod +x reducer.py 
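reducer.py keeps only one running total and flushes it whenever the key changes, so it is only correct when its input is sorted by key. A small sketch of that grouping logic (`reduce_sorted` is a hypothetical helper) showing what goes wrong without the sort:

```python
def reduce_sorted(pairs):
    """Sum counts for key-sorted (word, count) pairs, mirroring reducer.py."""
    totals = []
    current_word, current_count = None, 0
    for word, count in pairs:
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                # key changed: flush the finished word
                totals.append((current_word, current_count))
            current_word, current_count = word, count
    if current_word is not None:
        totals.append((current_word, current_count))
    return totals

mapped = [("foo", 1), ("bar", 1), ("foo", 1), ("foo", 1)]
ok = reduce_sorted(sorted(mapped))  # sorted input -> correct totals
bad = reduce_sorted(mapped)         # unsorted input -> "foo" split across two entries
print(ok)   # [('bar', 1), ('foo', 3)]
print(bad)  # [('foo', 1), ('bar', 1), ('foo', 2)]
```

This is exactly why the local test below pipes the mapper output through sort, and why Hadoop's shuffle phase sorts by key before invoking the reducer.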
 
3. Create the input file
touch test00.txt
 
Paste the following line into it (e.g. with sudo gedit test00.txt):
foo foo quux labs foo bar quux
 

4. Test map and reduce locally
Test mapper.py:
echo "foo foo quux labs foo bar quux" | ./mapper.py
 
Test the full pipeline, piping the mapper output through sort into reducer.py:
echo "foo foo quux labs foo bar quux" | ./mapper.py | sort -k1,1 | ./reducer.py
 
# sort -k1,1 orders the mapper output by key (the first field), playing the role of Hadoop's shuffle phase: -k, --key=POS1[,POS2]
 
三、Testing
1. Create a directory in HDFS
hdfs dfs -mkdir -p /user/input
 
2. Upload test00.txt to HDFS
Upload test00.txt to the /user/input directory in HDFS:
hdfs dfs -put /home/huangqifa/software/test00.txt /user/input
 

3. Run the example job
hadoop jar /usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.7.jar -files /home/huangqifa/software/mapper.py,/home/huangqifa/software/reducer.py -mapper "mapper.py" -reducer "reducer.py" -input /user/input/test00.txt -output /user/output
 
Change the mapper.py and reducer.py paths to your own.
If /user/output already exists, the job will fail; delete it first:
hdfs dfs -rm -r /user/output
 
View the output files:
hdfs dfs -cat /user/output/*
 

4. Download the result file
hadoop fs -ls /user/output/
 
hadoop fs -get /user/output/part-00000
 

Alternatively, download it through the HDFS web UI in a browser.
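The downloaded part-00000 file is plain text: one tab-separated word/count pair per line. For local post-processing, a small hypothetical parser (the sample string mirrors the output this job should produce for test00.txt):

```python
def parse_counts(text):
    """Parse tab-separated word/count lines (the part-00000 format) into a dict."""
    counts = {}
    for line in text.strip().splitlines():
        word, count = line.split('\t', 1)
        counts[word] = int(count)
    return counts

# sample in the shape of part-00000 for the test00.txt input
sample = "bar\t1\nfoo\t3\nlabs\t1\nquux\t2\n"
print(parse_counts(sample))
```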

References
https://blog.csdn.net/andy_wcl/article/details/104610931
https://blog.csdn.net/qq_39315740/article/details/98108912