Introduction: Hadoop is an open-source framework that allows big data to be stored and processed with simple programming models in a distributed environment across clusters of computers. It is designed to scale from a single server to thousands of machines, each offering local computation and storage.
I. Required software:
mysql-connector-java-5.1.17.jar
jdk-6u43-linux-x64-rpm.bin
hive-0.7.1-cdh3u6.tar.gz
hadoop-0.20.2-cdh3u6.tar.gz
Note: a CentOS 6 virtual machine must be installed in advance.
II. Installation steps:
1. Disable the firewall
service iptables stop
chkconfig iptables off
2. Configure a static IP and bring the network up at boot
vi /etc/sysconfig/network-scripts/ifcfg-eth0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.0.200
NETMASK=255.255.255.0
GATEWAY=192.168.0.1
DNS1=8.8.8.8
3. Set the hostname
vi /etc/sysconfig/network
HOSTNAME=master
4. Restart the network
service network restart
5. Install ssh and rsync
yum install ssh
yum install rsync
service sshd restart
rpm -qa | grep openssh
rpm -qa | grep rsync
6. Create the hadoop user
adduser hadoop
passwd hadoop
7. Edit the hosts file
vi /etc/hosts
192.168.0.200 master
192.168.0.201 slave1
8. Configure passwordless SSH access
As the hadoop user:
ssh-keygen -t rsa
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@master
ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@slave1
Verify: both ssh master and ssh slave1 should log in without a password prompt.
9. Install the JDK
Download jdk-6u43-linux-x64-rpm.bin and upload it to /webapps
mkdir /usr/java
cd /usr/java
cp /webapps/jdk-6u43-linux-x64-rpm.bin /usr/java
chmod +x jdk-6u43-linux-x64-rpm.bin
./jdk-6u43-linux-x64-rpm.bin
vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.6.0_43
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version
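Edits to /etc/profile only take effect in shells that have sourced it. Beyond java -version, a quick sanity check on the two variables themselves can be sketched like this (paths taken from this guide):

```shell
# Re-create the two exports from /etc/profile (paths as used in this guide)
export JAVA_HOME=/usr/java/jdk1.6.0_43
export PATH=$PATH:$JAVA_HOME/bin
# Each check prints "ok" when the variable is wired up correctly
[ -n "$JAVA_HOME" ] && echo ok
echo "$PATH" | grep -q "$JAVA_HOME/bin" && echo ok
```

If either check stays silent, re-run source /etc/profile in the current shell.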
10. Install Hadoop
chown -R hadoop /opt
cd /opt
wget http://archive.cloudera.com/cdh/3/hadoop-0.20.2-cdh3u6.tar.gz
(or, if the tarball was already uploaded: cp /webapps/hadoop-0.20.2-cdh3u6.tar.gz ./)
tar -zxvf hadoop-0.20.2-cdh3u6.tar.gz
As the hadoop user, edit the configuration files under conf/:
vi conf/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.6.0_43
export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u6
Edit core-site.xml:
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
Edit hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<!-- only slave1 runs a DataNode in this setup, so replication must be 1 -->
<value>1</value>
</property>
<property>
<name>dfs.name.dir</name>
<value>/opt/hdfs/name</value>
</property>
<property>
<name>dfs.data.dir</name>
<value>/opt/hdfs/data</value>
</property>
</configuration>
Edit mapred-site.xml:
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>master:9001</value>
</property>
</configuration>
Edit masters and slaves:
vi conf/masters
vi conf/slaves
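For the two-host layout from step 7, the files would contain the following (an assumption based on the hosts file above: master doubles as the SecondaryNameNode host, slave1 is the only worker):

```
conf/masters:
master

conf/slaves:
slave1
```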
11. Configure Hadoop environment variables (append to /etc/profile):
export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u6
export PATH=$PATH:$HADOOP_HOME/bin
source /etc/profile
12. Format HDFS
hadoop namenode -format
13. Start and verify Hadoop
chmod +x -R /opt/hadoop-0.20.2-cdh3u6/bin
./start-all.sh
Run jps on each node. On master it should show:
2699 SecondaryNameNode
2845 Jps
2772 JobTracker
2547 NameNode
On slave1:
2615 DataNode
2719 TaskTracker
3060 Jps
14. WordCount demo:
cd /home/hadoop/
echo 'data mining on data warehouse' > words
hadoop dfs -mkdir /input
hadoop dfs -put /home/hadoop/words /input
hadoop jar /opt/hadoop-0.20.2-cdh3u6/hadoop-examples-0.20.2-cdh3u6.jar wordcount /input /output
hadoop dfs -cat /output/part-r-00000
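What the WordCount job computes can be sketched locally with a plain shell pipeline; this illustrates the map/shuffle/reduce idea on the same input, and is not the actual MapReduce code:

```shell
# map: tr splits the line into one word per line
# shuffle: sort groups identical words together
# reduce: uniq -c counts each group
echo 'data mining on data warehouse' | tr ' ' '\n' | sort | uniq -c
```

The output lists each distinct word with its count, e.g. "data" appears with count 2, matching what the job writes to /output/part-r-00000.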
15. Install Hive
yum install mysql
yum install mysql-server
yum install mysql-devel
service mysqld start
mysql -uroot -p
create database hive;
grant all on hive.* to hadoop@'master' identified by 'hivepwd';
flush privileges;
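Before moving on, it is worth confirming the grant took effect; a quick check from the same mysql prompt (a sanity-check sketch, using the account created above):

```sql
-- should list the privileges granted to the hive metastore account
show grants for hadoop@'master';
```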
Download Hive:
wget http://archive.cloudera.com/cdh/3/hive-0.7.1-cdh3u6.tar.gz
Unpack it:
tar -zxvf hive-0.7.1-cdh3u6.tar.gz
Create conf/hive-site.xml with the following content:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hive.metastore.local</name>
<value>true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hadoop</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hivepwd</value>
</property>
</configuration>
Configure bin/hive-config.sh:
export JAVA_HOME=/usr/java/jdk1.6.0_43
export HADOOP_HOME=/opt/hadoop-0.20.2-cdh3u6
Copy mysql-connector-java-5.1.17.jar into the Hive lib directory, /opt/hive-0.7.1-cdh3u6/lib
vi /etc/profile
export HIVE_HOME=/opt/hive-0.7.1-cdh3u6
export PATH=$PATH:$HIVE_HOME/bin
Apply the changes:
source /etc/profile
Run Hive:
hive
create table test(id int);
select count(1) from test;
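The count over the empty test table only proves the metastore works. A slightly fuller smoke test reuses the words file from step 14 (the table name words_demo is hypothetical, chosen here for illustration):

```sql
-- a one-column table of raw lines
create table words_demo (line string);
-- load the local file created in step 14
load data local inpath '/home/hadoop/words' into table words_demo;
-- this runs as a MapReduce job on the cluster
select count(1) from words_demo;
```

If the count query returns 1 (the single line in the words file), both the metastore and the Hadoop job submission path are working.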