#!/usr/bin/python
from subprocess import Popen, PIPE

print("Before Loop")
# Stream the HDFS file into a pipe.
cat = Popen(["hadoop", "fs", "-cat", "./sample.txt"], stdout=PIPE)
print("After Loop 1")
# Write the piped data back to HDFS; "-" makes put read from stdin.
put = Popen(["hadoop", "fs", "-put", "-", "./modifiedfile.txt"], stdin=cat.stdout)
put.communicate()

Read this also: The Hadoop Distributed File System (HDFS) is a Java-based, distributed, scalable, and portable filesystem designed to span large clusters of commodity servers. Its design is based on GFS, the Google File System, which is described in a paper published by Google. Like many other distributed filesystems, HDFS holds a large amount of data and provides transparent access to many clients distributed across a network. HDFS excels at storing very large files reliably and scalably. It is designed to store a lot of information, typically petabytes (for very l...
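The script above chains two processes the way a shell pipe would. A minimal sketch of the same pattern as a reusable function follows; the name `pipe_commands` and its return shape are hypothetical, not part of the original script, and it assumes Python 3.

```python
import subprocess
import sys


def pipe_commands(producer_cmd, consumer_cmd):
    """Run `producer_cmd | consumer_cmd`.

    Returns (producer exit code, consumer exit code, consumer stdout bytes).
    pipe_commands is a hypothetical helper name used for illustration only.
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close the parent's copy of the pipe so the producer can receive
    # SIGPIPE if the consumer exits before reading everything.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer_rc = producer.wait()
    return producer_rc, consumer.returncode, output
```

For the HDFS case in the script, the call would be pipe_commands(["hadoop", "fs", "-cat", "./sample.txt"], ["hadoop", "fs", "-put", "-", "./modifiedfile.txt"]). Checking both exit codes matters because a failed cat can otherwise leave a truncated or empty file behind without any visible error.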
Nyc
WTD