Capitalize specific indices of string using awk or python

Time:04-21

I have an input file where each line contains 99 lowercase letters:

bccdddcdccddddddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbcbaabbbccbb 
bccdddcdcddddcddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbcbaabbbccbb 
bccdddcdcddddccdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbccacbbbccbb 
bccdddcdccdddccdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbccaaabbccbb 

I have a list of positions, for example p = [10, 14, 89, 99].

I'd like to capitalize the letters at these positions in my input file.

Desired output:

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB 
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB 

I'm using this awk command:

awk -vFS= -vOFS= '{$10=toupper($10)}1' input > output

But I'm not sure how to loop this over all the positions.

CodePudding user response:

You can use a generator expression with .upper() and enumerate() to capitalize only the specified indices:

p = [10, 14, 89, 99] # or use set([10, 14, 89, 99]) for faster lookup
with open('in.txt') as file:
    for line in file:
        line = line.rstrip()
        result = ''.join(c.upper() if i + 1 in p else c for i, c in enumerate(line))
        print(result)

This outputs:

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB
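
If the list of positions is short compared with the line length, a variation is to touch only the listed positions instead of testing every character. This is a minimal sketch, assuming 1-based positions as in the question. The helper name capitalize_positions is made up for illustration, and silently skipping positions past the end of a line is an assumption, since the question does not say what should happen there:

```python
def capitalize_positions(line, positions):
    """Return line with the characters at the given 1-based positions uppercased."""
    chars = list(line)  # strings are immutable, so edit a list copy
    for pos in positions:
        idx = pos - 1  # convert the 1-based position to a 0-based index
        if 0 <= idx < len(chars):  # assumption: ignore out-of-range positions
            chars[idx] = chars[idx].upper()
    return ''.join(chars)


p = [10, 14, 89, 99]
sample = 'bccdddcdccddddddabcd'  # shortened sample, not a full 99-letter line
print(capitalize_positions(sample, p))
```

With the 20-letter sample only positions 10 and 14 fall inside the string, so 89 and 99 are skipped. On the real input you would call the function on each line after rstrip(), as in the loop above.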

CodePudding user response:

One awk idea:

awk -v p="10,14,89,99" '
BEGIN { split(p,arr,",") }
      { for (i in arr)
            $0=substr($0,1,arr[i]-1) toupper(substr($0,arr[i],1)) substr($0,arr[i]+1)
        print
      }
' input

This generates:

bccdddcdcCdddDddabcdabcabdbacbdcaaccbbcabacbccabcacbcdcccbdbacdcbbcbcbcccacadaaccababadbCbaabbbccbB
bccdddcdcDdddCddabcdabcabdbacbddaacdbbcabacbcdbbcacbcccccbdbacdbbbcbcbacbacacaacccbabadbCbaabbbccbB
bccdddcdcDdddCcdabcdabcabdbacbddaaddbbcabacbcdbbcacbcccccbdbacdbbbcbcbaccacadaaccbbabadbCcacbbbccbB
bccdddcdcCdddCcdabcdabcdbdbacbdcaaddcbcabacbccabcacbcdcccbdbacdbbbcbcbbccacadaaccbbabadbCcaaabbccbB
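
For readers more at home in Python, the same prefix / uppercased character / suffix splice can be written with slices. This is only a sketch to show how the 1-based substr() arithmetic maps onto 0-based slicing. The name splice_upper is hypothetical, and the code assumes every position is at least 1:

```python
def splice_upper(line, pos):
    """Uppercase the character at 1-based position pos by rebuilding the string."""
    # Mirrors: substr(s, 1, pos-1)  toupper(substr(s, pos, 1))  substr(s, pos+1)
    return line[:pos - 1] + line[pos - 1:pos].upper() + line[pos:]


line = 'bccdddcdccddddddabcd'  # shortened sample, not a full 99-letter line
for pos in (10, 14):
    line = splice_upper(line, pos)  # each pass rewrites the whole line, like $0=... in awk
print(line)
```

A position beyond the end of the string leaves the line unchanged, because the middle and trailing slices are then empty.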