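Requirements:
1. Run the Hadoop job with hadoop jar
2. Job parameters must be passable on the command line
3. The shell script must be schedulable by crontab
Approach:
1. Java parses the input arguments and defines a standard format for them
2. A shell script runs the hadoop jar command line; the scheduling script is also written in shell
3. crontab schedules the scheduling script
Implementation:
Parsing the input arguments in Java: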
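import java.util.HashMap;
import java.util.Map;

/**
 * Parses the command-line arguments. Job parameters are passed in the form:
 *   --param1 val1 \
 *   --param2 val2 \
 *
 * @param args command-line arguments
 * @return map of parameter names to values
 * @date 2013-11-13
 */
public Map<String, String> parseMRCommands(String[] args) {
    Map<String, String> commands = new HashMap<String, String>();
    String key = null;
    for (String cmdStr : args) {
        if (cmdStr.startsWith("--")) {
            // the previous key had no value: store it with an empty value
            if (key != null) {
                commands.put(key, "");
            }
            key = cmdStr.substring(2);
        } else if (key != null) {
            // add new command key:value
            commands.put(key, cmdStr);
            // clear key
            key = null;
        }
        // a value with no preceding --key is ignored
    }
    // a trailing key without a value is also stored with an empty value
    if (key != null) {
        commands.put(key, "");
    }
    return commands;
}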
The input parameters follow this format:
--param1 val1 \
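To make the format concrete, here is a minimal, self-contained sketch of how such an argument list might map to key/value pairs. The class name ParamFormatDemo and the --verbose flag are hypothetical and not part of the original code. The parsing rule mirrors parseMRCommands, under the assumption that a key without a value (including a trailing one) is stored as an empty string.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class (not from the original post) illustrating the
// "--key value" argument format.
public class ParamFormatDemo {

    // "--name" starts a key, the next token that does not start with "--"
    // is its value, and a key without a value maps to "".
    static Map<String, String> parse(String[] args) {
        // LinkedHashMap keeps insertion order so the printed output is stable
        Map<String, String> commands = new LinkedHashMap<>();
        String key = null;
        for (String arg : args) {
            if (arg.startsWith("--")) {
                if (key != null) {
                    commands.put(key, "");
                }
                key = arg.substring(2);
            } else if (key != null) {
                commands.put(key, arg);
                key = null;
            }
        }
        if (key != null) {
            commands.put(key, "");
        }
        return commands;
    }

    public static void main(String[] args) {
        // The shell consumes the trailing backslashes (line continuations),
        // so Java only ever sees the tokens themselves.
        String[] sample = {
            "--input.path.key", "/user/input/texts",
            "--output.path.key", "/user/output/texts.out",
            "--verbose" // hypothetical flag without a value
        };
        for (Map.Entry<String, String> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " = [" + e.getValue() + "]");
        }
    }
}
```

Run with the sample above, this prints one line per parameter, with the value of the hypothetical verbose flag shown as an empty pair of brackets.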
Shell execution script run.sh:
#!/bin/bash
hadoop jar ../lib/test-SNAPSHOT.jar com.test.TTask \
    --input.path.key /user/input/texts \
    --output.path.key /user/output/texts.out
Shell scheduling script cron-run.sh:
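#!/bin/bash
# File: cron-run.sh
# bash is used because "source" is not available in every /bin/sh
source /user/.bash_profile
cd "$DEV_WORKING/mapred/bin" || exit 1
# PID of a running TTask job; take only the first match so the test below sees a single number
process_id=`jps -m | grep "TTask" | awk '{print $1}' | head -n 1`
process_id=${process_id:=0}
date
if [ "$process_id" -gt 0 ]
then
    echo "job is running, pid = $process_id"
else
    echo "no running job found, starting..."
    nohup ./run.sh > run.log 2>&1 &
fi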
cron-run.sh can then be scheduled periodically straight from crontab.