Python USACO Bronze Level Algorithm Problem: Logic is Sound in Theory, but do not Know how to Implem-CodePudding

I am practicing for the upcoming USACO competition in December, and am attempting to solve this problem: http://www.usaco.org/index.php?page=viewproblem2&cpid=1131

Bessie the cow has enrolled in a computer science PhD program, driven by her love of computer science and also the allure of one day becoming "Dr. Bessie". Having worked for some time on her academic research, she has now published N papers (1≤N≤100000), and her i-th paper has accumulated ci citations (0≤ci≤100000) from other papers in the research literature. Bessie has heard that an academic's success can be measured by their h-index. The h-index is the largest number h such that the researcher has at least h papers each with at least h citations. For example, a researcher with 4 papers and respective citation counts (1,100,2,3) has an h-index of 2, whereas if the citation counts were (1,100,3,3) then the h-index would be 3.

To up her h-index, Bessie is planning to write a survey article citing several of her past papers. Due to page limits, she can include at most L citations in this survey (0≤L≤100000), and of course she can cite each of her papers at most once.

Help Bessie determine the maximum h-index she may achieve after writing this survey.

Note that Bessie's research advisor should probably inform her at some point that writing a survey solely to increase one's h index is ethically dubious; other academics are not recommended to follow Bessie's example here.

INPUT FORMAT (input arrives from the terminal / stdin):

The first line of input contains N and L. The second line contains N space-separated integers c1,…,cN.

OUTPUT FORMAT (print output to the terminal / stdout):

The maximum h-index Bessie may achieve after writing the survey.

SAMPLE INPUT:

4 0
1 100 2 3

SAMPLE OUTPUT:

Bessie cannot cite any of her past papers. As mentioned above, the h-index for (1,100,2,3) is 2.

SAMPLE INPUT:

4 1
1 100 2 3

SAMPLE OUTPUT:

If Bessie cites her third paper, then the citation counts become (1,100,3,3). As mentioned above, the h-index for these counts is 3.

SCORING:

Test cases 1-7 satisfy N≤100.
Test cases 8-10 satisfy N≤1000.
Test cases 11-17 satisfy N≤100000.

My Python 3 code for this problem:

# read data and sort list
line = input().strip().split()
n, l = int(line[0]), int(line[1])
cite = [int(i) for i in input().strip().split()]
cite.sort()

# initiate the integer that will keep track of the largest number that satisfies the conditions and the set we will use to check whether or not a number has already been checked
max_num = 0
nums = set()

# iterate over every number
for i in range(n):
    if cite[i] in nums:
        # skip the iteration if the number has already been checked
        continue
    # add the number to the set
    nums.add(cite[i])
    # if any numbers are 1 less than the current number, keep track of them to add later on in the program. set extra to l if there are more numbers that are less than the current number that citations to be used.
    if cite.count(cite[i]-1) <= l:
        extra = cite.count(cite[i]-1)
    else:
        extra = l
    # check if the h-index is greater than the currently largest h-index and replace it if it is.
    if len(cite[i:])   extra >= cite[i] and cite[i] >= max_num:
        max_num = cite[i]

# for a list of citations that only have 1 integer in it, check if l is > 0. if it is, the max number is that number   1.
if n == 1 and l > 0:
    max_num = cite[0]   1

print(max_num)

I believe the logic is correct, but when the input is

1000 251
500 500 500 500 500 502 500 502 500 502 500 500 500 500 500 500 500 500 502 502 500 500 500 500 500 500 500 500 500 500 500 500 502 500 500 500 500 502 500 500 500 500 500 500 500 500 500 500 500 500 500 502 500 502 500 500 500 500 500 502 500 500 500 500 500 502 502 500 502 500 500 500 500 502 502 502 500 500 500 500 500 502 502 502 500 500 500 500 502 500 500 500 500 500 500 500 500 502 500 500 500 500 500 500 500 500 500 500 502 500 500 502 500 500 500 500 500 500 500 500 502 502 500 500 502 500 500 500 502 500 500 502 502 500 500 500 502 500 500 502 500 500 500 502 502 500 500 500 500 500 500 502 502 500 500 500 500 500 500 500 500 500 500 500 500 500 500 502 500 500 500 500 500 502 502 500 500 500 500 500 502 502 502 500 500 502 500 500 500 500 500 500 500 500 502 500 500 500 500 500 502 502 500 502 500 502 500 500 500 500 500 502 502 500 500 502 500 502 502 500 500 502 500 500 502 500 500 502 500 500 502 500 500 502 500 500 500 500 500 500 500 500 502 500 502 502 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 502 500 500 500 500 502 502 502 500 500 500 502 500 500 500 500 502 500 500 500 502 502 502 502 500 500 500 500 500 500 500 500 500 500 502 500 502 500 502 502 500 502 500 500 500 500 500 502 500 500 500 500 500 502 500 500 500 500 502 500 500 500 500 500 500 500 500 500 500 500 502 500 500 500 500 502 500 500 500 500 502 502 500 502 502 500 500 500 502 502 500 500 500 502 502 500 500 500 500 500 500 500 500 500 500 500 500 502 502 500 502 502 500 500 500 500 500 502 500 502 500 500 500 500 500 502 500 500 500 500 500 502 502 502 500 500 500 500 500 500 502 502 500 500 500 502 500 500 502 500 500 502 500 500 500 502 502 500 502 500 500 500 500 500 500 500 500 500 500 500 500 500 502 502 500 502 500 502 502 500 500 500 500 500 500 500 500 502 500 502 500 500 502 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 502 500 502 502 500 500 500 500 502 500 500 502 502 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 502 502 500 502 500 502 502 500 500 502 500 502 502 500 502 500 500 500 500 500 502 500 502 500 500 500 500 500 500 500 500 500 502 500 500 500 500 500 502 502 502 502 502 500 500 502 500 502 500 500 502 500 500 500 500 500 500 502 502 500 500 502 500 500 500 500 500 500 500 500 500 500 500 502 500 500 502 500 500 502 500 500 500 500 500 502 500 500 500 500 500 500 502 500 502 500 500 500 500 500 500 502 502 500 502 502 500 502 500 500 500 500 500 500 500 502 500 502 500 502 502 500 500 500 500 500 500 502 502 502 500 500 500 500 500 500 500 502 502 500 500 500 500 500 500 500 500 500 500 500 500 500 502 500 502 500 500 502 502 502 500 500 500 500 500 502 500 502 500 500 500 500 500 500 500 500 500 500 500 500 500 502 500 500 500 502 500 502 500 502 502 500 500 500 500 500 500 502 502 500 500 502 500 500 500 500 500 500 500 500 500 502 500 500 500 502 502 500 502 500 502 502 500 500 500 500 502 502 502 502 500 500 500 502 500 500 500 500 500 500 500 500 502 500 502 500 500 502 500 502 500 500 500 500 500 502 500 500 500 500 502 500 500 502 500 502 502 500 502 500 500 500 500 500 500 500 502 500 500 502 500 500 502 500 500 500 500 500 502 502 500 500 500 500 500 500 500 502 500 500 500 500 500 502 502 500 500 502 502 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 500 502 500 502 502 500 500 500 500 502 500 500 500 502 500 500 500 500 500 500 500 500 500 500 500 500 502 500 500 500 500 500 500 500 502 500 500 500 502 500 500 500 500 500 500 500 500 500 500 502 500 500 500 500 500 500 500 502 502 502 502 500 500 500 500 502 500 502 500 500 500 500 500 500 502 502 500 502 500 500 500 502 500 500 500 500 500 500 500 500 500 502 500 500 500 500 500 500 500 500 502 500 500 502 502 500 500 502 500 500 500 502 502 500 500 500 502 500 502 500 500 502 500 500 500 502 500 500 500 500 500 500 502 500 500 500 500 502 502 500 502 500 500 502 500 500 500 500 500 500 500 502 500 502 500 502 500 500 502 500 500 500 500 502 502 500 500

it returns the wrong answer. It prints 500 but should be 501

Is there a error in my implementation, or is it my logic that is the problem?

Note: I am also having problems with time complexity, as when I am faced with a problem where n = 100000 and l = 50000.

CodePudding user response：

There are several errors in the implementation, e.g.

if n == 1 and l > 0:
    max_num = cite[0]   1

which is incorrect, since h-index is at most the number of papers with nonzero citations, so it would be 1 if l > 0 or cite[0] > 0, and 0 otherwise.

The specific problem leading to that input error is the assumption that the h-index must occur as a citation count; consider [4, 4, 4] which has h-index 3.

The other main issue is using list.count() in the main loop, rather than using a counter or other hash map. With a Counter, you don't need to even sort your array; just track how many papers remain unseen.

import collections
line = input().strip().split()
n, l = int(line[0]), int(line[1])
cite = [int(i) for i in input().strip().split()]

max_num = 0
citation_counts = collections.Counter(cite)
remaining = n


for i in range(n 1):

    extra = min(citation_counts[i - 1], l)
    if extra   remaining >= i:
        max_num = i

    remaining -= citation_counts[i]

print(max_num)