Published 2020. 2. 16. 16:47

[Python/test]Alien dictonary. 단어모음을 보고 알파벳 순서 예측하기

#Problem_eng
While you were traveling in a spaceship, you visited an alien planet. Surprisingly in alien languages, they also use lowercase letters in English, but perhaps in a different order. Looking at the set of words you got from studying alien languages, you try to make an alien dictionary. You first want to define what order their alphabet is in. What kind of methods should you use?

#문제_한국어
당신은 우주선을 타고 여행하던 중 어떤 외계인 행성을 방문하게 되었다. 외계 언어에서 놀랍게도 그들은 또한 영어 소문자를 사용하지만, 아마도 다른 순서로 쓰이는 것으로 보인다. 외계어를 공부하며 얻은 단어 집합을 보고 너는 외계어 사전을 만드려고 한다. 당신은 우선 그들의 알파벳은 어떤 순서로 되어 있는지 정의하고자 한다. 어떤 방법을 사용하면 좋을까?

위 문제를 해결하기 위해서 몇 가지 조건이 필요합니다.

1. 단어 모음에 정렬된 단어의 순서는 중요하지 않지만 단어의 배열은 그들의 알파벳 순서를 따라야 합니다.

-> 예를 들어 b c a 라는 순서로 구성되어 있다면, [bc, a, ba, ca, bca]는 가능하지만 [ac, bc, bca] 는 불가능합니다.

2. 단어 중 각 알파벳이 회귀되는 경우 문제가 발생할 수 있습니다.

-> 가령 abcdba와 같이 알파벳을 배열할 때 사이클이 생기는 경우 에러가 발생하거나 올바른 답이 도출되지 않습니다.

import random
import string
import collections

def remove_overlab(list_input):
    result = []

    for idx in range(len(list_input)) :
        word = list(list_input[idx])
        word_not_overlab = []
        word_overlab = set()

        for alphabet in word :
            if alphabet not in word_overlab :
                word_not_overlab.append(alphabet)
                word_overlab.add(alphabet)

        remove_overlab = ''.join(word_not_overlab)
        result.append(remove_overlab)

    return resultPYTHON

remove overlab 함수는 단어들 내에 중복된 알파벳을 제거해주는 함수입니다. 알파벳을 하나씩 가져와 wor__overlab 리스트에 존재하는지 파악합니다. 존재하지 않는 경우 word_not_overlab과 word_overlab에 각각 저장한 뒤 다음 순서의 알파벳을 검사합니다.

이러한 검사를 반복하여 최종적으로 중복이 제거된 단어 모음의 리스트가 출력됩니다.

def ailen_wordset_dic(wordset):
    #wordset에 중복된 단어를 제거
    wordset_remove_overlab = list(set(remove_overlab(wordset)))
    sort_wordset = sorted(wordset_remove_overlab, key = lambda x: len(x))
    #중복이 제거되고 정렬된 외계인 단어를 dictionary형태로 저장
    sort_wordset_dic = collections.OrderedDict()

    for i in range(len(sort_wordset)):
        value = list(sort_wordset[i])
        key = value[0]
        if len(value) <= 1 :
            #key로 사용할 알파벳이 중복된 경우 알파벳을 하나 더 추가하여 구분시켜줌
            if key in sort_wordset_dic.keys():
                key = key + key
                sort_wordset_dic[key] = []
            else : 
                sort_wordset_dic[key] = []

        else :
            #key로 사용할 알파벳이 중복된 경우 알파벳을 하나 더 추가하여 구분시켜줌
            if key in sort_wordset_dic.keys():
                key = key + key
                sort_wordset_dic[key] = value[1:]
            else : 
                sort_wordset_dic[key] = value[1:]
        
    return sort_wordset, sort_wordset_dicPYTHON

단어 모음에서 중복을 제거한 단어를 가져와 단어의 첫 알파벳은 key로, 나머지를 value로 갖는 dictonary를 형성합니다.

가령 'abc' 는 {a : [b, c]} 로 변환되어 저장됩니다.

이때 key로 사용되는 단어의 첫 알파벳이 이미 등장했던 경우 알파벳을 하나 더 늘려주는 식으로 계속 저장해줍니다.

def find_order(list_in):
    answer = ''
    order_list = sorted(list_in, key = lambda x: len(x))
    
    for i in range(len(order_list)-1):
        w = order_list[i]
        nxt_w = order_list[i+1]

        #현재 순서도가 다음 순서도에 포함되는 경우 다음 순서도를 가져옴
        if nxt_w.find(w) != -1 :
            answer = nxt_w
        #현재 순서도가 다음 순서도에 포함되지 않는 경우 포함되지 않는 부분을 가져와서 붙혀줌
        else :
            for j in range(len(nxt_w)) :
                if nxt_w[j::].find(w) != -1 :
                    answer = nxt_w[:j+1] + w 
    
    return answerPYTHON

find_order 함수는 정렬된 리스트를 입력받아 구성된 각 단어들 간의 관계를 확인합니다.

현재 단어와 다음 단어를 비교하여 순서가 일치하는 경우 가져오고, 그렇지 않은 경우 일치하지 않는 부분을 추가하여 출력해줍니다.

<실행 예시>

# 초기 조건
1. 외계인은 (a,b,c,d,e,f,g,h,i) 만 사용한다.
2. 수집된 단어 집합은 무작위로 섞여 있다.

# 실험 조건
1. 테스트 워드셋은 10개의 단어로 구성된다.
2. 테스트의 알파벳 순서는 i h g f e d c b a 입니다.

def main():
    wordset = ['a', 'ba','ccba', 'dca', 'edba', 'feca', 'gfedcba','hgf','ih', 'hgec']
    sort_word, sort_word_dic = ailen_wordset_dic(wordset)

    box = []
    for v in sort_word_dic.keys():

        each_order = draw_ordermap(sort_word_dic, v)
        each_order = ''.join(each_order)
        box.append(each_order)

    order = find_order(box)

    print('주어진 단어 모음은', sort_word, '입니다')
    print('주어진 단어모음으로 추측한 알파벳 순서는', order,'입니다.')
PYTHON

https://github.com/Leo-bb/algorithm_problem/tree/master/Sample_problem

Leo-bb/algorithm_problem

Contribute to Leo-bb/algorithm_problem development by creating an account on GitHub.

github.com

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

[Python/test]Alien dictonary. 단어모음을 보고 알파벳 순서 예측하기

티스토리툴바

개인정보

단축키

내 블로그

블로그 게시글

모든 영역