hadoop - Hive table reading from GZIP contains meta information like file name in the first row

- February 15, 2014

I have created an external table in Hive pointing to a gzip file:

create external table if not exists raw_cn (
  column1 string,
  column2 string,
  column3 string,
  column4 string,
  column5 string,
  column6 string,
  column7 string,
  column8 string,
  column9 string,
  column10 string
)
partitioned by (day_id string, file_type string)
row format delimited fields terminated by '|'
stored as textfile;

I then added a partition:

alter table raw_cn add partition (day_id = '20140815', file_type = 'daily') location '/mapr/mapr.cluster/cn/20140501/daily';

and placed the gzip file at that location.

However, when I query the table, the first row gives me file-level information, even though there is no header in the file. The rest of the rows are fine. How can I resolve the issue with the first row? This is what the query returns:

vendor1_617_cn_daily.201408150000664000202600020260243475554512373676764017202 0ustar  fworksfworks4f06c1a123456|82910|26|espn2|espn2|2014/08/15 01:09:42|2014/08/15 01:10:13|233|53066|jefferson-walworth (jefferson), wi
123456|82910|8|wmlw|wmlw|2014/08/15 03:16:53||233|53066|jefferson-walworth (jefferson), wi
123456|82910|3|witi|witi|2014/08/15 14:34:13|2014/08/15 14:35:20|233|53066|jefferson-walworth (jefferson), wi
123456|82910|43|hgtv|home & garden television (east)|2014/08/15 14:35:20|2014/08/15 14:37:00|233|53066|jefferson-walworth (jefferson), wi
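The `ustar` token in that first row is the magic string the POSIX and GNU tar formats store at byte offset 257 of each 512-byte header block, which suggests the gzip file may actually contain a tar archive rather than plain text. A minimal Python check, assuming you can read the file locally (the function name and path are hypothetical):

```python
import gzip

TAR_BLOCK = 512      # each tar header occupies one 512-byte block
USTAR_OFFSET = 257   # offset of the "ustar" magic inside a tar header


def looks_like_tar_gz(path: str) -> bool:
    """Return True if the decompressed stream starts with a tar header block."""
    with gzip.open(path, "rb") as f:
        header = f.read(TAR_BLOCK)
    # POSIX writes b"ustar\x00" and GNU tar writes b"ustar ", so compare the
    # first five bytes only.
    return len(header) == TAR_BLOCK and header[USTAR_OFFSET:USTAR_OFFSET + 5] == b"ustar"
```

For example, `looks_like_tar_gz("raw_cn_daily.gz")` (placeholder name) returning True would mean Hive is reading the tar header as if it were data.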

That depends on the version of Hive you are using.

For Hive 0.13 and above:

There is a table property, tblproperties ("skip.header.line.count"="1"), that you can set when creating the table. Its value is the number of lines to skip at the top of each file.

For Hive 0.12 and below:

You need to remove the line/header manually or with a shell/Python script.

Hope this helps!
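A minimal Python sketch of the script approach, assuming the file is a gzipped tar archive (the `ustar` marker in the first row points that way). In that case deleting one line may not be enough: a tar header contains no newline, so it can share a line with the first data record. The sketch instead pulls the data out of the archive and re-compresses it as plain gzip, so no tar header reaches Hive. The function and file names are hypothetical.

```python
import gzip
import shutil
import tarfile


def tar_gz_to_plain_gz(src: str, dst: str) -> int:
    """Copy every regular file inside the .tar.gz `src` into one gzip file `dst`.

    Tar headers and block padding are dropped, leaving only the file contents.
    Returns the number of archive members copied.
    """
    copied = 0
    with tarfile.open(src, mode="r:gz") as archive, gzip.open(dst, "wb") as out:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and other non-regular entries
            fileobj = archive.extractfile(member)
            shutil.copyfileobj(fileobj, out)
            copied += 1
    return copied
```

For example, `tar_gz_to_plain_gz("raw_cn.tar.gz", "raw_cn.gz")` followed by replacing the original file in /mapr/mapr.cluster/cn/20140501/daily with raw_cn.gz. If the archive held several files and one lacked a trailing newline, its last row would be glued to the next file's first row; with a single member this does not arise.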
