Home > Enterprise >  how to fix error when using read.table function
how to fix error when using read.table function

Time:09-26

I am trying to read this data set into r using the read.table function read.table("https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt", ) I am getting this error

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 22 elements

Am I using the correct function to read in this data? If I am how do I fix this? the link to the data set im trying to read in is https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt

CodePudding user response:

That data is not delimited by space, tab, comma, or semi-colon. It appears to be fixed-width format. For that, you must know or infer the column widths. I've guessed based on a few rows, but you will need to verify that things have not been unnecessarily split or kept-together.

Additionally, line 10 has an embedded # character, so the default behavior of many R parsing functions to use comment.char="#" must be suppressed, otherwise we get Error: line 10 did not have 24 elements.

Try this,

widths <- diff(c(0,9,18,37,62,80,105,117,127,277,295,298,301,305,309,313,331,338,355,359,373,385,389,433,445))
dat <- read.fwf("https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt", widths = widths, comment.char = "")
head(dat, n = 20)
#           V1        V2                  V3                        V4                 V5                        V6           V7         V8                                                                                                                                                     V9                V10 V11 V12  V13  V14  V15          V16  V17               V18  V19            V20          V21 V22                                          V23          V24
# 1  014000000 00000S702 1610400700905300000 PERDIDO CREEK             IRR BIA RTE 4007   11.4KM NW OF ATMORE, AL   999900015000 0000000000 00310610940873413480033626209199902000000502019500970300111150000000000A15505000001000009700147000150000000009700979999N0000N0000008876N2445232777N588    000000021924N   N   N     NA   NA   NA 0.000000e 00 2019 0                 0N2  0100002601010  Y80000592039   0 Y62                                          G70000014550
# 2  014000000 00000S703 1610400300905303004 PERDIDO CREEK             IRR BIA RTE 4003   ON LYNN MCGEE ROAD        999900011000 0000000000 00310620200873408900033626209200202000001592019500650000111150000000000A15501000001000009700129000136003003009100989999N0000N0000008877N2848235176N578    000000021924N   N   N     NA   NA   NA 0.000000e 00 2019 0                 0N2  0100002608010  Y80002002039   0 Y62                                          G70000013328
# 3  014000000 0000M0022 1810000000011319000 UCHEE CREEK               101ST AIRBORNE RD  3.4 KM S OF SUNSHINE RD   999900000000            0032174330-0845837.0183747409194202000003752017401040000001150000000000A15104000003000007300183000430006006007300909999N0000N0000005566N2517237255N586 381000528101924N   Y60 N     NA 1017   NA 1.000001e 12 2019                   0N2  0000001100350  Y50004002039  NA Y74                                          F50000038700
# 4  014000000 883039900 1410008800205900000 LITTLE BEAR CR. DAM SPW   CO. RD 88          LITTLE BEAR CR DAM        999900000000            0034270600087581200016367670819740200000430201750116099011115N000000000A19505000005000008600140000655000000008701059999N0000N0000007777N1442126665N596          061824N   N   N     NA   NA   NA           NA   NA                   0N2  0000001000050  Y50006002037  NA Y67                                          G70000068775
# 5  014000001 014002450 1310010100207900000 TENNESSEE RIVER           SR 101             ON WHEELER DAM             45200394290            00344818000872254000423676706193702000055202017400790000100041017900335A55310302002014306000533019812009002006100780452N0000N0000005659N1400123952N588 311019812061924Y24 N   N    619   NA   NA 5.000000e 16   NA                   0N2  00196210001001 Y50090002037  NA Y67                                          F50001545336
# 6  014000001 331700710 1620000000203300000 TENNESSEE RIVER           Res Rd             ON WILSON DAM             999900000000            00344800000873730000063676717192402020036202018300790000111111017300335A56403120008007806000469015124017005006100839999H1219H6000005558N15841350529593 311015124061824Y24 N   N    618   NA   NA 3.500000e 16   NA                   0N2  00195815000801 Y50025002038  NA Y67                                          F50001255292
# 7  014000002           1410000001012700000 BLACKWATER CREEK          COUNTRY CLUB ROAD  1 MILE EAST HWY 195        45700000100 000CO00000 00335430300871717700023020209189801000004001991100520000000020000000000P15310302001000403700305000518000000003700370457N0000N0000006457N1027101822N072 311000621101924Y24 N   Y03 1019   NA  120 5.110001e 14 2020 0 _               0N3  0000006600010  Y50005602032   0 N01                                          P40000019166
# 8  014000004           1410000001012700000 LOST CREEK                KINGS MILL ROAD    1 MILE N OAKMAN HWY        40200000100 000CO00000 00334409500872134600133020209190001000001501991200490000000050000000000K15310302001000503700256000482000000003700370402N0000N0000004446N5000500000N082 311000583011324Y24 N   Y03  213   NA  113 4.800000e 14 2020 0 _               0N3  0000006800010  YU0002102033   0 N01                                          P40000017834
# 9  014000004 504800350 1680000000109500000 LITTLE PAINT ROCK CREEK   LOCK ACCESS ROAD   0.3 MI SOUTH US 431       999900056330            00342818000862230001993676709193902000003502018400610000100050000000000A15104000001000007300183000192003003007000829999N0000N0000007776N1477128874N568          121824N   N   N     NA   NA   NA           NA   NA                   0N2  0000001600050  Y50003752038  NA Y67                                          G70000015744
# 10 014000005           1500000001007548808 STREAM                    RAILROAD STREET    0.5 MI E SR#17            999900004990 000MU00040 00333357210880425781583040409190001000000152007100300000000050000000000P15410000002000004400079000155000000004400489999N0000N9990005445N1027101826N045 311000223061924Y24 N   Y12  619   NA  619 2.300000e 14 2020 0 _               0N3  0000008700000  Y40000592032   0 N01                                          P40000007440
# 11 014000007           1410001000104900000 COAL   CREEK              CO  RT  10         1.25  MI N ARONEY  T9SR5E 999900000100 000CO00100 00341403150860413920053020208199301000001802017500580000111150000000000A15122000003000007500073000219000000007500759999N0000N0000007675N1637138265N576    000300031924N   N   N     NA   NA   NA 4.930000e 14 2020 0 _               0N3  0000002000050  Y50002682037   0 N01                                          F60000016425
# 12 014000008           1410002700500700000 VIADUCT OVER SO.RR        COUNTY 27          OLD AL 5 AT WOODSTK       999900000100 000CO00270 0033122146087091361010302020919270200000513198920085000000002N000000000P12303104001000306100332000652000000006100709999R0914R183061777NN213620952390N7 311000756101924Y12 N   Y12 1019   NA 1019 1.192000e 15 2020 0 _               0N2  0000001000200  YN0007652030   0 N01                                          G70000045640
# 13 01400000B 253040000 1410003700205900000 BEAR CREEK DAM SPILLWAY   CO. RD 37          BEAR CREEK DAM            999900000000            0034235400087590600021367670819690200000430201750116000010115N000000000A19505000008000008500143001170000000008500969999N0000N0000007677N1489129265N596          061824N   N   N     NA   NA   NA           NA   NA                   0N2  0000002600050  Y50006002037  NA Y67                                          F60000112320
# 14 01400000B LN0BR0010 1880000000107100000 TOWN CREEK                BELLEFONTE ROAD    Bellefonte                999900000000            003443470008554570001136767091974020000025020184012800000N0N50000000000A15119000002000006400030000064000000000000009999N0000N000000NNN76337332886NN577          031924N   Y60 N     NA  716   NA 0.000000e 00   NA                   0N2  000000N6NN010  Y50005002038  NA Y67                                          F60000008192
# 15 01400000T 721700150 1680000000203300000 CANE CREEK                Access Road        Colbert Steam Plant       999900000000            003444120008750540019936767091953020000010020184006703001NN050000000000A15104000005000007300162000808006006007300919999N0000N0000006666N1464127966N598          061824N   Y60 N     NA  819   NA           NA   NA                   0N2  0000001000100  Y50001002038  NA Y67                                          F60000073528
# 16 014000012           1410009100107109328 SOUTHERNLNRAILROAD        JC91               1001E US 72               999900000100 000CO00910 0034563691086433691002304040919000200000495200710073000000002N000000000A12101101003000206100101000363000000006100699999R0678R320000566NN043503276345N6 351000455011924N   N   Y12   NA   NA  120 6.860001e 14 2020 0 _               0N2  0000001000150  YN0005002039   0 N01                                          F50000025047
# 17 014000013 771700120 1600000000203300000 BUS BAR TUNNEL- REMOVED   TVA RES. ROAD      NEAR NAT. FERT. DEV. CTR. 999900000000            0034465400087390000005367671919580200003620201860073000011115N000000000A10302000001000007900177000189006006007900989999N0000N000000677NN1551133073N5N7 331000189061824N   N   N     NA   NA   NA 1.800000e 14   NA                   0N2  0000001600070  YN0045002038  NA Y67                                          F60000018522
# 18 014000019           1410043800108900000 RODGERS BRANCH            JACKS RD           1 MI SE NEW MARKET        999900000100 000CO04380 00345334170862519520053020209190002000002501992200790450000050000000000A15319000002000007600073000146000000000000769999N0000N000000NNN55135712885NN555 311000214081824N   N   N     NA   NA   NA 4.070000e 14 2020 0 _               0N2  000000NNNN010  Y80002902035   0 N01                                          F50000011096
# 19 014000022           1410004500104900000 LAUREL  CREEK             CO RD 313          1.3MI S GILBERT XRDT8SR7E 999900000100 000CO00450 00342101790855522060053020209190201000000872017000400000000050000000000K15302000001000099900085000085000000003400379999N0000N0000006655N1000100003N048 311000152111724N   N   Y12   NA   NA 1117 1.250000e 14 2020 0 _               0N3  0000008700000  Y50001292037   0 N01                                          F50000003145
# 20 014000024           1510000000311707000 LITTLE CAHABA RIVER       CAHABA BEACH ROAD  1.8 MILES NORTH OF US 280  47500005000 000MU01630 00332623190864157360063040419190401000001001996000490000000030000000000K15310000001000003500335000335000000003500359999N0000N0000005254N2000200003N066 311000423101924Y24 N   Y03  990   NA  996 3.190000e 14 2020 0 _               0N3  0000008700010  Y60001382019   0 N01                                          P20000011725

There could easily be more to do to break this up into the requisite columns. For instance, V9 looks suspiciously long and has some pattern to it, so it could easily be several columns concatenated.

FYI, I created the numbers for my widths above by counting characters. I think there are tools out there that provide good hints at how to start with this, but with FWF files (where some columns have no space separating it from the next) it's never deterministic. The only way to know for sure is to know the data.

CodePudding user response:

We can use add fill = TRUE and there is also some duplicate row names which the data.frame (output structure) wouldn't allow. So, specify the rownames as NULL

df1 <- read.table("https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt",
      fill = TRUE, row.names = NULL)

-output

> dim(df1)
[1] 19044    22
  • Related