From words to corpora: recognizing translation
by: Noah A Smith
(2002), pp. 95-102.
Abstract
This paper presents a technique for discovering translationally equivalent texts. It consists of a matching algorithm applied at two different levels of analysis, combined with a well-founded similarity score. The approach can be applied to any multilingual corpus using any kind of translation lexicon, so it adapts to varying levels of multilingual resource availability. Experimental results are shown on two tasks: a search for matching thirty-word segments in a corpus where some segments are mutual translations, and classification of candidate pairs of web pages that may or may not be translations of each other. The latter results are competitive with previous, document-structure-based approaches to the same problem.
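The abstract does not spell out the matching algorithm or the score, so the following is only a minimal sketch of how "one matching algorithm at two levels" could look, not the paper's method. It assumes: the translation lexicon is a set of (source word, target word) pairs; the matcher is a plain maximum-cardinality bipartite matching (augmenting paths); the score is matched links divided by all links, where unmatched items count as one-sided links; and two segments are linked at the document level when their word-level score reaches a threshold. The function names and the 0.5 threshold are hypothetical.

```python
from typing import Callable, Sequence, Set, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Assumption: a lexicon is just a set of (source_word, target_word) pairs.
Lexicon = Set[Tuple[str, str]]


def max_matching(left: Sequence[A], right: Sequence[B],
                 linked: Callable[[A, B], bool]) -> int:
    """Size of a maximum-cardinality bipartite matching (augmenting paths)."""
    adj = [[j for j, r in enumerate(right) if linked(l, r)] for l in left]
    owner = [-1] * len(right)  # owner[j] = index in `left` matched to right[j]

    def augment(i: int, seen: Set[int]) -> bool:
        for j in adj[i]:
            if j not in seen:
                seen.add(j)
                # Take a free slot, or re-route its current owner elsewhere.
                if owner[j] == -1 or augment(owner[j], seen):
                    owner[j] = i
                    return True
        return False

    return sum(augment(i, set()) for i in range(len(left)))


def link_ratio(n_left: int, n_right: int, matched: int) -> float:
    """Hypothetical score: matched pairs / (matched pairs + unmatched items)."""
    total = n_left + n_right - matched
    return matched / total if total else 0.0


def word_level_score(src: Sequence[str], tgt: Sequence[str],
                     lexicon: Lexicon) -> float:
    """Level 1: match words that the lexicon says may translate each other."""
    m = max_matching(src, tgt, lambda s, t: (s, t) in lexicon)
    return link_ratio(len(src), len(tgt), m)


def document_level_score(src_segs: Sequence[Sequence[str]],
                         tgt_segs: Sequence[Sequence[str]],
                         lexicon: Lexicon, threshold: float = 0.5) -> float:
    """Level 2: reuse the same matcher on segments, linking similar ones."""
    def linked(a: Sequence[str], b: Sequence[str]) -> bool:
        return word_level_score(a, b, lexicon) >= threshold

    m = max_matching(src_segs, tgt_segs, linked)
    return link_ratio(len(src_segs), len(tgt_segs), m)
```

For example, with a toy English-German lexicon, `word_level_score(["the", "cat"], ["das", "haus"], lex)` matches only "the"/"das", leaving one unmatched word on each side, so the score is 1/3. Reusing one generic matcher for both words and segments mirrors the abstract's "two levels of analysis" idea, and a richer lexicon or a weighted matcher could be swapped in without changing the document-level code.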