hadoop - Hive table reading from a GZIP file contains meta information like the file name in the first row
I have created an external table in Hive pointing to a gzip file:
create external table if not exists raw_cn (
  column1 string, column2 string, column3 string, column4 string, column5 string,
  column6 string, column7 string, column8 string, column9 string, column10 string
)
partitioned by (day_id string, file_type string)
row format delimited fields terminated by '|'
stored as textfile;
Then I added a partition:
alter table raw_cn add partition (day_id = '20140815' , file_type = 'daily' ) location '/mapr/mapr.cluster/cn/20140501/daily';
and placed the gzip file at the above location.
However, when I query the table, the first row gives me file-level information (there is no header in the file). How do I resolve this issue with the first row? The rest of the rows are fine:
vendor1_617_cn_daily.201408150000664000202600020260243475554512373676764017202 0ustar fworksfworks4f06c1a123456|82910|26|espn2|espn2|2014/08/15 01:09:42|2014/08/15 01:10:13|233|53066|jefferson-walworth (jefferson), wi
123456|82910|8|wmlw|wmlw|2014/08/15 03:16:53||233|53066|jefferson-walworth (jefferson), wi
123456|82910|3|witi|witi|2014/08/15 14:34:13|2014/08/15 14:35:20|233|53066|jefferson-walworth (jefferson), wi
123456|82910|43|hgtv|home & garden television (east)|2014/08/15 14:35:20|2014/08/15 14:37:00|233|53066|jefferson-walworth (jefferson), wi
That depends on the version of Hive you are using.
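For Hive 0.13 and above:
there is a table property, tblproperties ("skip.header.line.count"="1"), that you can use while creating the table (add it after stored as textfile). It skips that many lines at the top of the table's data files.
For an existing table, the same property can be set with alter table raw_cn set tblproperties ("skip.header.line.count"="1");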
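One caveat before relying on the property: the `ustar` string in the first row is the magic value of a POSIX tar header (the neighbouring `fworksfworks` looks like the owner and group fields), which suggests the uploaded file is a gzipped tar archive rather than a plain gzipped text file. That is an inference from the sample output, not something the question confirms. If it holds, the 512-byte tar header contains no newline, so it lands on the same line as the first data record, and skipping one line would discard that record too. The Python sketch below (standard library only; the member name and rows are made up for illustration) builds such an archive in memory and splits the decompressed bytes on newlines the way a line-oriented text reader would.

```python
import gzip
import io
import tarfile


def make_tar_gz(member_name, payload):
    """Build a gzipped tar archive in memory that holds a single file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def first_lines_as_text(gz_bytes, n):
    """Undo only the gzip layer (not the tar layer) and split on newlines."""
    raw = gzip.decompress(gz_bytes)
    return raw.split(b"\n")[:n]


# Hypothetical sample rows shaped like the question's data.
rows = b"123456|82910|26|espn2\n123456|82910|8|wmlw\n"
archive = make_tar_gz("vendor1_617_cn_daily.txt", rows)
first, second = first_lines_as_text(archive, 2)

# The first "line" is the tar header glued to the first record.
print(b"ustar" in first, first.endswith(b"123456|82910|26|espn2"))
print(second)
```

If both checks print True, the header and the first record share a line. The NUL padding at the end of a tar archive typically produces an extra garbage row at the bottom as well.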
For Hive 0.12 and below:
you need to remove the line/header manually or with a shell/Python script.
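A minimal Python sketch of the scripted cleanup could look like the following. It assumes the input is either a gzipped tar archive (which the `ustar` marker in the question's first row suggests) or a plain gzip file with real header lines; the function names and file names are hypothetical. Either way the output is a plain gzipped text file that can replace the original at the partition location.

```python
import gzip
import io
import os
import shutil
import tarfile
import tempfile


def tar_gz_to_plain_gz(src_path, dst_path):
    """Copy the regular files inside a .tar.gz into one plain .gz, dropping tar headers and padding."""
    with tarfile.open(src_path, mode="r:gz") as tar, gzip.open(dst_path, "wb") as out:
        for member in tar.getmembers():
            if member.isfile():
                # extractfile() yields only the member's bytes, without header or NUL padding.
                # Assumes each member ends with a newline so records from different members don't merge.
                shutil.copyfileobj(tar.extractfile(member), out)


def drop_first_lines(src_path, dst_path, n=1):
    """Rewrite a plain .gz text file without its first n lines (e.g. a real header row)."""
    with gzip.open(src_path, "rb") as src, gzip.open(dst_path, "wb") as out:
        for i, line in enumerate(src):
            if i >= n:
                out.write(line)


# Demo in a throwaway directory with made-up rows.
workdir = tempfile.mkdtemp()
rows = b"123456|82910|26|espn2\n123456|82910|8|wmlw\n"

# Case 1: a .tar.gz like the one the question appears to contain.
archive = os.path.join(workdir, "daily.tar.gz")
with tarfile.open(archive, mode="w:gz") as tar:
    info = tarfile.TarInfo(name="daily.txt")
    info.size = len(rows)
    tar.addfile(info, io.BytesIO(rows))

cleaned = os.path.join(workdir, "daily.gz")
tar_gz_to_plain_gz(archive, cleaned)
with gzip.open(cleaned, "rb") as f:
    cleaned_bytes = f.read()

# Case 2: a plain .gz that really does start with a header line.
with_header = os.path.join(workdir, "with_header.gz")
with gzip.open(with_header, "wb") as f:
    f.write(b"col1|col2|col3|col4\n" + rows)
stripped = os.path.join(workdir, "stripped.gz")
drop_first_lines(with_header, stripped, n=1)
with gzip.open(stripped, "rb") as f:
    stripped_bytes = f.read()

print(cleaned_bytes == rows, stripped_bytes == rows)
```

For the tar case, running `tar -xzf` on the archive and then `gzip` on the extracted file achieves the same result from the shell.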
Hope this helps!