I am trying to read this data set into r using the read.table function
read.table("https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt", )
I am getting this error
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 1 did not have 22 elements
Am I using the correct function to read in this data? If I am how do I fix this? the link to the data set im trying to read in is https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt
CodePudding user response:
That data is not delimited by space, tab, comma, or semi-colon. It appears to be fixed-width format. For that, you must know or infer the column widths. I've guessed based on a few rows, but you will need to verify that things have not been unnecessarily split or kept-together.
Additionally, line 10 has an embedded #
character, so the default behavior of many R parsing functions to use comment.char="#"
must be suppressed, otherwise we get Error: line 10 did not have 24 elements
.
Try this,
widths <- diff(c(0,9,18,37,62,80,105,117,127,277,295,298,301,305,309,313,331,338,355,359,373,385,389,433,445))
dat <- read.fwf("https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt", widths = widths, comment.char = "")
head(dat, n = 20)
# V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24
# 1 014000000 00000S702 1610400700905300000 PERDIDO CREEK IRR BIA RTE 4007 11.4KM NW OF ATMORE, AL 999900015000 0000000000 00310610940873413480033626209199902000000502019500970300111150000000000A15505000001000009700147000150000000009700979999N0000N0000008876N2445232777N588 000000021924N N N NA NA NA 0.000000e 00 2019 0 0N2 0100002601010 Y80000592039 0 Y62 G70000014550
# 2 014000000 00000S703 1610400300905303004 PERDIDO CREEK IRR BIA RTE 4003 ON LYNN MCGEE ROAD 999900011000 0000000000 00310620200873408900033626209200202000001592019500650000111150000000000A15501000001000009700129000136003003009100989999N0000N0000008877N2848235176N578 000000021924N N N NA NA NA 0.000000e 00 2019 0 0N2 0100002608010 Y80002002039 0 Y62 G70000013328
# 3 014000000 0000M0022 1810000000011319000 UCHEE CREEK 101ST AIRBORNE RD 3.4 KM S OF SUNSHINE RD 999900000000 0032174330-0845837.0183747409194202000003752017401040000001150000000000A15104000003000007300183000430006006007300909999N0000N0000005566N2517237255N586 381000528101924N Y60 N NA 1017 NA 1.000001e 12 2019 0N2 0000001100350 Y50004002039 NA Y74 F50000038700
# 4 014000000 883039900 1410008800205900000 LITTLE BEAR CR. DAM SPW CO. RD 88 LITTLE BEAR CR DAM 999900000000 0034270600087581200016367670819740200000430201750116099011115N000000000A19505000005000008600140000655000000008701059999N0000N0000007777N1442126665N596 061824N N N NA NA NA NA NA 0N2 0000001000050 Y50006002037 NA Y67 G70000068775
# 5 014000001 014002450 1310010100207900000 TENNESSEE RIVER SR 101 ON WHEELER DAM 45200394290 00344818000872254000423676706193702000055202017400790000100041017900335A55310302002014306000533019812009002006100780452N0000N0000005659N1400123952N588 311019812061924Y24 N N 619 NA NA 5.000000e 16 NA 0N2 00196210001001 Y50090002037 NA Y67 F50001545336
# 6 014000001 331700710 1620000000203300000 TENNESSEE RIVER Res Rd ON WILSON DAM 999900000000 00344800000873730000063676717192402020036202018300790000111111017300335A56403120008007806000469015124017005006100839999H1219H6000005558N15841350529593 311015124061824Y24 N N 618 NA NA 3.500000e 16 NA 0N2 00195815000801 Y50025002038 NA Y67 F50001255292
# 7 014000002 1410000001012700000 BLACKWATER CREEK COUNTRY CLUB ROAD 1 MILE EAST HWY 195 45700000100 000CO00000 00335430300871717700023020209189801000004001991100520000000020000000000P15310302001000403700305000518000000003700370457N0000N0000006457N1027101822N072 311000621101924Y24 N Y03 1019 NA 120 5.110001e 14 2020 0 _ 0N3 0000006600010 Y50005602032 0 N01 P40000019166
# 8 014000004 1410000001012700000 LOST CREEK KINGS MILL ROAD 1 MILE N OAKMAN HWY 40200000100 000CO00000 00334409500872134600133020209190001000001501991200490000000050000000000K15310302001000503700256000482000000003700370402N0000N0000004446N5000500000N082 311000583011324Y24 N Y03 213 NA 113 4.800000e 14 2020 0 _ 0N3 0000006800010 YU0002102033 0 N01 P40000017834
# 9 014000004 504800350 1680000000109500000 LITTLE PAINT ROCK CREEK LOCK ACCESS ROAD 0.3 MI SOUTH US 431 999900056330 00342818000862230001993676709193902000003502018400610000100050000000000A15104000001000007300183000192003003007000829999N0000N0000007776N1477128874N568 121824N N N NA NA NA NA NA 0N2 0000001600050 Y50003752038 NA Y67 G70000015744
# 10 014000005 1500000001007548808 STREAM RAILROAD STREET 0.5 MI E SR#17 999900004990 000MU00040 00333357210880425781583040409190001000000152007100300000000050000000000P15410000002000004400079000155000000004400489999N0000N9990005445N1027101826N045 311000223061924Y24 N Y12 619 NA 619 2.300000e 14 2020 0 _ 0N3 0000008700000 Y40000592032 0 N01 P40000007440
# 11 014000007 1410001000104900000 COAL CREEK CO RT 10 1.25 MI N ARONEY T9SR5E 999900000100 000CO00100 00341403150860413920053020208199301000001802017500580000111150000000000A15122000003000007500073000219000000007500759999N0000N0000007675N1637138265N576 000300031924N N N NA NA NA 4.930000e 14 2020 0 _ 0N3 0000002000050 Y50002682037 0 N01 F60000016425
# 12 014000008 1410002700500700000 VIADUCT OVER SO.RR COUNTY 27 OLD AL 5 AT WOODSTK 999900000100 000CO00270 0033122146087091361010302020919270200000513198920085000000002N000000000P12303104001000306100332000652000000006100709999R0914R183061777NN213620952390N7 311000756101924Y12 N Y12 1019 NA 1019 1.192000e 15 2020 0 _ 0N2 0000001000200 YN0007652030 0 N01 G70000045640
# 13 01400000B 253040000 1410003700205900000 BEAR CREEK DAM SPILLWAY CO. RD 37 BEAR CREEK DAM 999900000000 0034235400087590600021367670819690200000430201750116000010115N000000000A19505000008000008500143001170000000008500969999N0000N0000007677N1489129265N596 061824N N N NA NA NA NA NA 0N2 0000002600050 Y50006002037 NA Y67 F60000112320
# 14 01400000B LN0BR0010 1880000000107100000 TOWN CREEK BELLEFONTE ROAD Bellefonte 999900000000 003443470008554570001136767091974020000025020184012800000N0N50000000000A15119000002000006400030000064000000000000009999N0000N000000NNN76337332886NN577 031924N Y60 N NA 716 NA 0.000000e 00 NA 0N2 000000N6NN010 Y50005002038 NA Y67 F60000008192
# 15 01400000T 721700150 1680000000203300000 CANE CREEK Access Road Colbert Steam Plant 999900000000 003444120008750540019936767091953020000010020184006703001NN050000000000A15104000005000007300162000808006006007300919999N0000N0000006666N1464127966N598 061824N Y60 N NA 819 NA NA NA 0N2 0000001000100 Y50001002038 NA Y67 F60000073528
# 16 014000012 1410009100107109328 SOUTHERNLNRAILROAD JC91 1001E US 72 999900000100 000CO00910 0034563691086433691002304040919000200000495200710073000000002N000000000A12101101003000206100101000363000000006100699999R0678R320000566NN043503276345N6 351000455011924N N Y12 NA NA 120 6.860001e 14 2020 0 _ 0N2 0000001000150 YN0005002039 0 N01 F50000025047
# 17 014000013 771700120 1600000000203300000 BUS BAR TUNNEL- REMOVED TVA RES. ROAD NEAR NAT. FERT. DEV. CTR. 999900000000 0034465400087390000005367671919580200003620201860073000011115N000000000A10302000001000007900177000189006006007900989999N0000N000000677NN1551133073N5N7 331000189061824N N N NA NA NA 1.800000e 14 NA 0N2 0000001600070 YN0045002038 NA Y67 F60000018522
# 18 014000019 1410043800108900000 RODGERS BRANCH JACKS RD 1 MI SE NEW MARKET 999900000100 000CO04380 00345334170862519520053020209190002000002501992200790450000050000000000A15319000002000007600073000146000000000000769999N0000N000000NNN55135712885NN555 311000214081824N N N NA NA NA 4.070000e 14 2020 0 _ 0N2 000000NNNN010 Y80002902035 0 N01 F50000011096
# 19 014000022 1410004500104900000 LAUREL CREEK CO RD 313 1.3MI S GILBERT XRDT8SR7E 999900000100 000CO00450 00342101790855522060053020209190201000000872017000400000000050000000000K15302000001000099900085000085000000003400379999N0000N0000006655N1000100003N048 311000152111724N N Y12 NA NA 1117 1.250000e 14 2020 0 _ 0N3 0000008700000 Y50001292037 0 N01 F50000003145
# 20 014000024 1510000000311707000 LITTLE CAHABA RIVER CAHABA BEACH ROAD 1.8 MILES NORTH OF US 280 47500005000 000MU01630 00332623190864157360063040419190401000001001996000490000000030000000000K15310000001000003500335000335000000003500359999N0000N0000005254N2000200003N066 311000423101924Y24 N Y03 990 NA 996 3.190000e 14 2020 0 _ 0N3 0000008700010 Y60001382019 0 N01 P20000011725
There could easily be more to do to break this up into the requisite columns. For instance, V9
looks suspiciously long and has some pattern to it, so it could easily be several columns concatenated.
FYI, I created the numbers for my widths
above by counting characters. I think there are tools out there that provide good hints at how to start with this, but with FWF files (where some columns have no space separating it from the next) it's never deterministic. The only way to know for sure is to know the data.
CodePudding user response:
We can use add fill = TRUE
and there is also some duplicate row names which the data.frame
(output structure) wouldn't allow. So, specify the rownames as NULL
df1 <- read.table("https://www.fhwa.dot.gov/bridge/nbi/2020/AL20.txt",
fill = TRUE, row.names = NULL)
-output
> dim(df1)
[1] 19044 22