Configuring Hadoop 1.0.1 + HBase 0.92.0 on Ubuntu 10.04 LTS - 白红宇

Published: 2019-06-27

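(For single-node and pseudo-distributed Hadoop configuration, see: ) At the time of writing, the latest Hadoop release is 1.0.1. It supports many new features that make HBase persistence much less prone to data loss, so this article uses the new version.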

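1. Important system settings before configuration:

1) A distributed database such as HBase opens a large number of files at startup, while most systems set a fairly low default limit on the number of open files. Unless the system setting is changed, the Java virtual machine will throw an IOException.

Add one line to /etc/security/limits.conf: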

hadoop  -       nofile  32768

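Replace hadoop with the name of the user that will run Hadoop. If several users need to run Hadoop, add one line per user.

2) Set the maximum number of processes the user is allowed to run:

Add to the same file: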

hadoop  soft    nproc   32000
hadoop  hard    nproc   32000

Again, replace hadoop with the user name you will use.

3) Finally, append the following as the last line of /etc/pam.d/common-session:

session required  pam_limits.so

Otherwise the settings above will not take effect.
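After logging out and back in as that user, the limits can be checked from a shell. The following is a minimal sketch rather than part of the original article: the helper name check_limit is made up for illustration, and the thresholds are simply the values configured above.

```shell
#!/bin/bash
# check_limit NAME CURRENT REQUIRED
# Hypothetical helper: reports whether a limit meets the required minimum.
# A value of "unlimited" always passes.
check_limit() {
    name=$1; current=$2; required=$3
    if [ "$current" = "unlimited" ] || [ "$current" -ge "$required" ]; then
        echo "$name: $current (ok)"
    else
        echo "$name: $current (too low, need at least $required)"
    fi
}

# Run as the user that will start Hadoop and HBase, in a fresh login session.
check_limit "open files" "$(ulimit -n)" 32768
check_limit "max processes" "$(ulimit -u)" 32000
```

If either line reports "too low", the usual suspects are a session that was opened before the change, or the pam_limits.so line not being in place.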

2. Configure Hadoop (on master only for now)

1) Configure conf/hadoop-env.sh

At a minimum, set JAVA_HOME in this file to the path where the JDK is installed on your system.
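As an illustration, the setting and a quick sanity check might look like the sketch below. The JDK path is an assumption (a common location for OpenJDK 6 on Ubuntu), not a value from the original article; substitute the path of your own installation.

```shell
#!/bin/bash
# Line to put in conf/hadoop-env.sh.
# ASSUMPTION: example path only; point this at your own JDK.
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

# Sanity check: a usable JAVA_HOME contains an executable bin/java.
if [ -x "$JAVA_HOME/bin/java" ]; then
    echo "JAVA_HOME looks valid: $JAVA_HOME"
else
    echo "No executable found at $JAVA_HOME/bin/java"
fi
```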

2) Configure the Hadoop daemons (assuming Hadoop is installed in /opt/hadoop):


        
            
     
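conf/core-site.xml:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310/</value>
  </property>
</configuration>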


        
    	
     
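conf/hdfs-site.xml:

<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/name/</value>
    <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/data/</value>
    <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>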


        
    	 
     
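conf/mapred-site.xml:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://master:54311/</value>
    <description>Host or IP and port of JobTracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
    <description>The maximum number of MapReduce tasks, which are run simultaneously on a given TaskTracker, individually.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <description>The maximum number of MapReduce tasks, which are run simultaneously on a given TaskTracker, individually.</description>
  </property>
</configuration>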

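3) Configure conf/slaves

# List every slave host in this file, by host name or IP address.
# For example, with two hosts slave1 and slave2:
# (for host names to work, /etc/hosts on the hosts involved needs a line of the
#  form "<that machine's IP address>  slave1", and likewise for slave2)
slave1
slave2

4) Configure conf/masters

# This file lists all master hosts; here there is only master.
master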

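3. Prepare HDFS for HBase

1) Enable durable sync across the whole HDFS system so that HBase no longer loses data. Only Hadoop 0.20.205.x and the later 1.0.x releases support this setting. Add the following to hbase-site.xml on the client side and to hdfs-site.xml on the server side: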


        
     
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

2) Set the upper bound on the number of files HDFS can serve at any one time, in hadoop/conf/hdfs-site.xml:


            
     
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>

Otherwise you may see an error such as: 10/12/08 20:10:31 INFO hdfs.DFSClient: Could not obtain block blk_XXXXXXXXXXXXXXXXXXXXXX_YYYYYYYY from any node: java.io.IOException: No live nodes contain current block. Will get new block locations from namenode and retry...

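4. Configure HBase

1) In fully distributed mode there is more than one host. In hbase-site.xml, set hbase.cluster.distributed to true, and set hbase.rootdir to point to the HDFS NameNode and the location within HDFS where HBase writes its data.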


        
             
      
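<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:54310/hbase</value>
    <description>The directory shared by RegionServers.</description>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
    <description>The mode the cluster will be in. Possible values are
      false: standalone and pseudo-distributed setups with managed Zookeeper
      true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
    </description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>The replication count for HLog and HFile storage. Should not be greater than HDFS datanode count.</description>
  </property>
</configuration>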

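2) HDFS client configuration

Note that if you have made HDFS client configuration changes on your Hadoop cluster (that is, settings you want HDFS clients to use, as opposed to server-side settings), HBase will not see them unless you do one of the following:

(1) In hbase-env.sh, set the environment variable HBASE_CLASSPATH to your HADOOP_CONF_DIR.

(2) Add a copy of hdfs-site.xml (or hadoop-site.xml) under ${HBASE_HOME}/conf or, better, add symlinks to them there.

(3) If there are only a few HDFS client settings, add them to hbase-site.xml.

This article takes the second approach: in HBase's conf directory, use ln -s to create a symbolic link to hadoop/conf/hdfs-site.xml.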
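The symlink step can be sketched as follows. The function name link_hdfs_site is hypothetical, and the default paths are assumptions: /opt/hadoop is the install directory the article assumes for Hadoop, while /opt/hbase is only inferred from the ZooKeeper data directory used later.

```shell
#!/bin/bash
# link_hdfs_site HADOOP_CONF_DIR HBASE_CONF_DIR
# Hypothetical helper: exposes Hadoop's hdfs-site.xml to HBase via a symlink.
link_hdfs_site() {
    ln -s "$1/hdfs-site.xml" "$2/hdfs-site.xml"
}

# ASSUMPTION: default locations; override the variables if yours differ.
HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-/opt/hadoop/conf}
HBASE_CONF_DIR=${HBASE_CONF_DIR:-/opt/hbase/conf}

# Only create the link when the source exists and nothing is in the way.
if [ -f "$HADOOP_CONF_DIR/hdfs-site.xml" ] && [ -d "$HBASE_CONF_DIR" ] \
        && [ ! -e "$HBASE_CONF_DIR/hdfs-site.xml" ]; then
    link_hdfs_site "$HADOOP_CONF_DIR" "$HBASE_CONF_DIR"
fi
```

A symlink is preferable to a copy because later edits to Hadoop's hdfs-site.xml are picked up by HBase without a second change.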

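3) Configure ZooKeeper

The HBASE_MANAGES_ZK variable in conf/hbase-env.sh defaults to true. It tells HBase whether to start and stop the ZooKeeper servers as part of starting and stopping HBase itself. It is generally best to keep the number of machines in the ZooKeeper cluster odd, because ZooKeeper needs a majority of its nodes to be running: a 4-node cluster needs 3 running nodes and so tolerates only one node failure, while a 5-node cluster also needs 3 running nodes but tolerates two failures.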
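The arithmetic behind the odd-number advice can be written out as a short sketch. The function names majority and tolerated are made up for illustration.

```shell
#!/bin/bash
# majority N: smallest number of ZooKeeper nodes that must be running
# for a cluster of N nodes to keep a majority.
majority() { echo $(( $1 / 2 + 1 )); }

# tolerated N: how many of the N nodes may fail while a majority survives.
tolerated() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 3 4 5 6; do
    echo "nodes=$n majority=$(majority "$n") tolerated_failures=$(tolerated "$n")"
done
```

Going from 3 to 4 nodes, or from 5 to 6, adds a machine without adding any failure tolerance, which is why even sizes are avoided.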

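For example, to have HBase manage ZooKeeper on 3 nodes (master, slave1, slave2) bound to port 2222, make sure HBASE_MANAGES_ZK in conf/hbase-env.sh is either commented out (the default is true) or set to true. Then edit conf/hbase-site.xml and set hbase.zookeeper.property.clientPort and hbase.zookeeper.quorum. You must also point hbase.zookeeper.property.dataDir at another directory, because the default location under /tmp is cleared when the system restarts. In the example below, ZooKeeper keeps its data in /opt/hbase/zookeeper.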


            
     
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2222</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
  </description>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>master,slave1,slave2</value>
  <description>Comma separated list of servers in the ZooKeeper Quorum.
    For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com".
    By default this is set to localhost for local and pseudo-distributed modes
    of operation. For a fully-distributed setup, this should be set to a full
    list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
    this is the list of servers which we will start/stop ZooKeeper on.
  </description>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/opt/hbase/zookeeper</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
  </description>
</property>

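4) Configure conf/regionservers

To use the two slave hosts as region servers, add these lines to the file:

slave1
slave2

5) Set the Java installation path in conf/hbase-env.sh

The main configuration is now complete. Next, make sure the master host can log in to both slave hosts without a password: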

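ssh-keygen -t rsa
# press Enter at every prompt
ssh-copy-id -i ~/.ssh/id_rsa.pub user@slave1
ssh-copy-id -i ~/.ssh/id_rsa.pub user@slave2
# this adds the SSH public key to each of the two slave hosts

Then use scp to copy the two configured directories to the same paths on the slave hosts.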
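One way to approach the copy step is to print the scp commands first and run them only after checking them. This is a sketch under assumptions: print_copy_commands is a hypothetical helper, /opt/hbase as the HBase directory is inferred rather than stated in the article, and user stands for whatever account you use on the slaves (it needs write access to /opt there).

```shell
#!/bin/bash
# print_copy_commands USER HOST...
# Hypothetical helper: prints (does not run) one scp command per directory
# and host, so the list can be reviewed before anything is copied.
print_copy_commands() {
    user=$1; shift
    for host in "$@"; do
        # ASSUMPTION: both trees live under /opt on every machine.
        for dir in /opt/hadoop /opt/hbase; do
            echo "scp -r $dir $user@$host:/opt/"
        done
    done
}

print_copy_commands user slave1 slave2
```

Once the printed commands look right, piping the output to sh runs them.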

6) Format the NameNode

hadoop/bin/hadoop namenode -format

7) Start Hadoop

hadoop/bin/start-all.sh

8) Start HBase

hbase/bin/start-hbase.sh
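After startup, the JDK's jps tool lists the running Java processes, which gives a quick way to see whether the daemons came up. The sketch below is an illustration only: missing_daemons is a hypothetical helper, and the daemon names passed to it are what one would expect on master for the layout described in this article (adjust them for the slaves, where DataNode, TaskTracker and HRegionServer would be expected instead).

```shell
#!/bin/bash
# missing_daemons "JPS_OUTPUT" NAME...
# Hypothetical helper: prints every expected daemon name that does not
# appear in the given jps listing (one "<pid> <main class>" pair per line).
missing_daemons() {
    listing=$1; shift
    for name in "$@"; do
        echo "$listing" \
            | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
            || echo "$name"
    done
}

# ASSUMPTION: expected daemons on master for this article's setup.
if command -v jps >/dev/null 2>&1; then
    missing_daemons "$(jps)" NameNode SecondaryNameNode JobTracker HMaster HQuorumPeer
fi
```

No output means every listed daemon was found; any name printed is a daemon whose log file is worth checking.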

9) If HMaster does not run properly, or exceptions related to HDFS appear, you may need to turn off the firewall on master.

sudo ufw disable

-----------------------------Configuration complete----------------------------------

Reposted from: https://my.oschina.net/unclegeek/blog/42612
